FastFilter / fastfilter_java

Fast Approximate Membership Filters (Java)
Apache License 2.0

Fuse Filters #21

Closed thomasmueller closed 3 years ago

thomasmueller commented 4 years ago

Fuse filters should be implemented.

See also https://github.com/FastFilter/xorfilter/issues/8

thomasmueller commented 4 years ago

Here is the graph of the required load where the mapping success probability is > 0.9:

[Plot: fuseLoadWithP90]

The load of the regular xor filter is at 0.813, so the fuse filter needs less space than the xor filter once the set size is larger than around 76'000. Comparison against xor+ is a bit more complicated. Next we need to evaluate lookup speed (best with the C / C++ versions).

lemire commented 4 years ago

@thomasmueller I benchmarked the Go version for large files, and my preliminary conclusion is that the fused approach is slower at first but becomes about as fast as the regular xor filter as you grow the data size. There may also be room for optimization.

thomasmueller commented 4 years ago

Here is a graph of the required load where the mapping success probability is > 0.6. Sizes below 100 are also listed; the behavior there is really weird.

[Plot: required load for mapping success probability > 0.6]

lemire commented 4 years ago

These are nice plots.

My own observation is that 4096 + size / 0.879 bytes should work pretty well.

Suppose that the filter is large, so that the constant (4096) goes away.

We have that Xor+ filters require 1.0824 k + 0.5125 bits per entry, Xor filters require 1.23 k bits per entry, and fuse filters require 1.14 k bits per entry. For k = 8, roughly speaking, the Xor+ filters and the fuse filters will use the same amount of memory: 9.10 bits for fuse versus 9.17 for Xor+ (less than 1% difference). When going to 16 bits per fingerprint, the story shifts and you get 17.8 bits for Xor+ filters versus 18.2 for fuse filters (about 2% in favor of Xor+).
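As a quick sanity check of these figures, here is a small back-of-the-envelope computation (a sketch, not code from this repository; the fuse factor 1/0.879 ≈ 1.1377 is taken from the size / 0.879 sizing formula above, with 1.14 being its rounded value):

    // Back-of-the-envelope check of the bits-per-entry figures quoted above.
    public class SpaceComparison {
        static double xorBits(int k)     { return 1.23 * k; }
        static double xorPlusBits(int k) { return 1.0824 * k + 0.5125; }
        static double fuseBits(int k)    { return k / 0.879; } // 1/0.879 ≈ 1.1377 ≈ 1.14

        public static void main(String[] args) {
            // prints 9.84 / 9.17 / 9.10 for k=8 and 19.68 / 17.83 / 18.20 for k=16
            for (int k : new int[] {8, 16}) {
                System.out.printf("k=%d: xor=%.2f, xor+=%.2f, fuse=%.2f bits/entry%n",
                        k, xorBits(k), xorPlusBits(k), fuseBits(k));
            }
        }
    }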

lemire commented 4 years ago

Note: it may look like 1024+size/0.879 would do, but for some small inputs, you get very high rates of failure. So it is clearly not sufficient.

thomasmueller commented 4 years ago

For small sets, by default, I would let construction be slow. With a 20% probability and a set of 100, it doesn't take terribly long to retry construction with a different seed until it works. Sure, construction is no longer expected to succeed very quickly, but for many use cases I think that's fine. I think it's often better than increasing the size so much that construction is fast.
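For illustration, the retry strategy could look roughly like this (a hypothetical sketch; tryConstruct stands in for one mapping attempt and is not an actual method of this repository):

    import java.util.SplittableRandom;

    public class RetryConstruction {

        // Hypothetical stand-in for a single construction (mapping) attempt;
        // returns null if the attempt with this seed fails.
        static byte[] tryConstruct(long[] keys, long seed) {
            // ... the fuse filter mapping step would go here ...
            return null;
        }

        // Retry with a fresh seed until the mapping succeeds. Even with a low
        // per-attempt success probability (say 20%), a small set only needs a
        // handful of attempts on average.
        static byte[] construct(long[] keys) {
            SplittableRandom random = new SplittableRandom();
            while (true) {
                byte[] fingerprints = tryConstruct(keys, random.nextLong());
                if (fingerprints != null) {
                    return fingerprints;
                }
            }
        }
    }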

lemire commented 4 years ago

With a small constant, I saw some unlikely cases where the failure rate was 98%. In some of the code, I stop after 64 iterations. With a 98% failure rate, there is a fairly good chance (0.98^64 ≈ 27%) that you are going to need more than 64 iterations.

Also, we don't have perfect randomness... if you put the capacity too low, you could, in principle, get a scenario where, for some keys, it is just not possible to construct the filter (say after 256 iterations). Now, I don't know that this is actually possible, but if we think it is not possible, then we need to prove it somehow. It might be hard.

So I think you don't want to play with very low success rates. Not in production.

A 50% or 60% probability is fine, I agree... but you don't want to go too far below that.

thomasmueller commented 4 years ago

Hm, currently we have a hardcoded segment count and a variable segment length... shouldn't we rather have a hardcoded segment length and a variable segment count? It should also speed up lookup a tiny bit, as we would have fewer multiplications if we set the segment length to a power of 2. Right now we have:

    long hash = Hash.hash64(key, seed);
    int f = fingerprint(hash);
    // derive one 32-bit hash for segment selection and three for the slot positions
    int r0 = (int) ((0xBF58476D1CE4E5B9L * hash) >> 32);
    int r1 = (int) hash;
    int r2 = (int) Long.rotateLeft(hash, 21);
    int r3 = (int) Long.rotateLeft(hash, 42);
    // pick the starting segment, then one slot in each of three consecutive segments
    int seg = Hash.reduce(r0, FUSE_SEGMENT_COUNT);
    int h0 = (seg + 0) * segmentLength + Hash.reduce(r1, segmentLength);
    int h1 = (seg + 1) * segmentLength + Hash.reduce(r2, segmentLength);
    int h2 = (seg + 2) * segmentLength + Hash.reduce(r3, segmentLength);
    // the key is (probably) in the set if the fingerprint matches the xor of the three slots
    f ^= fingerprints[h0] ^ fingerprints[h1] ^ fingerprints[h2];
    return (f & 0xff) == 0;

We could use a bit mask instead of Hash.reduce(rX, segmentLength).
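For example, with segmentLength a power of 2 and a precomputed segmentLengthMask = segmentLength - 1 (both assumptions of this sketch), the lookup could drop the last three Hash.reduce calls:

    long hash = Hash.hash64(key, seed);
    int f = fingerprint(hash);
    int r0 = (int) ((0xBF58476D1CE4E5B9L * hash) >> 32);
    int r1 = (int) hash;
    int r2 = (int) Long.rotateLeft(hash, 21);
    int r3 = (int) Long.rotateLeft(hash, 42);
    int seg = Hash.reduce(r0, FUSE_SEGMENT_COUNT);
    // segmentLength is a power of 2, so a mask replaces Hash.reduce(rX, segmentLength)
    int h0 = (seg + 0) * segmentLength + (r1 & segmentLengthMask);
    int h1 = (seg + 1) * segmentLength + (r2 & segmentLengthMask);
    int h2 = (seg + 2) * segmentLength + (r3 & segmentLengthMask);
    f ^= fingerprints[h0] ^ fingerprints[h1] ^ fingerprints[h2];
    return (f & 0xff) == 0;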

thomasmueller commented 4 years ago

Sorry, a hardcoded segment length doesn't work... but we can have both segment length and segment count be variable (parameters), with segmentLength a power of 2.

lemire commented 4 years ago

@thomasmueller It sounds promising. If we could reduce the instruction count and latency of the "contain" check, it might go a long way toward getting a really clean result.

thomasmueller commented 4 years ago

Also, for small sets, using a segment count lower than 100 seems to work well (p is the probability of finding a solution on the first try):

| size | segmentCount | p | arrayLength | bits/key |
|------|--------------|---|-------------|----------|
| 100  | 8  | 0.95245 | 240 | 19 |
| 100  | 16 | 0.79015 | 204 | 16 |
| 792  | 16 | 0.919531126247426 | 1152 | 11 |
| 1694 | 32 | 0.5163476198543113 | 2176 | 10 |

It looks promising to me (more tests are needed).
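For reference, figures like the p values above can be estimated by brute force: attempt construction with many random seeds and count the successes. A hypothetical sketch (tryConstruct is a stand-in for the mapping step with the given segment count, not an actual method of this repository):

    import java.util.SplittableRandom;

    public class SuccessProbabilityEstimate {

        // Hypothetical stand-in: returns true if the mapping step succeeds for
        // the given keys, seed, and segment count. Not part of this repository.
        static boolean tryConstruct(long[] keys, long seed, int segmentCount) {
            throw new UnsupportedOperationException("plug in the real mapping step");
        }

        // Estimate the first-try success probability p by sampling random seeds.
        static double estimateP(long[] keys, int segmentCount, int trials) {
            SplittableRandom random = new SplittableRandom(42);
            int successes = 0;
            for (int i = 0; i < trials; i++) {
                if (tryConstruct(keys, random.nextLong(), segmentCount)) {
                    successes++;
                }
            }
            return (double) successes / trials;
        }
    }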

prakol16 commented 4 years ago

On this topic, we did something related to this as our final project for a class: https://prakol16.github.io/CS166-final-project-bloomier-filters/ (see part 2 especially). I don't know if it will help, but I thought I might as well share.

lemire commented 4 years ago

@prakol16 Thank you for the pointer.

thomasmueller commented 3 years ago

I did some work on this here: https://github.com/FastFilter/fastfilter_java/commit/9e95adba4af3c628ad63191d9f48385e20ee8dfe

I think we can close the issue for now. At some point we should probably add a 16-bit variant as well.