FastFilter / fastfilter_java

Fast Approximate Membership Filters (Java)
Apache License 2.0
238 stars 27 forks source link

Infinite loop in XOR filter construction #29

Closed steve-scalyr closed 2 years ago

steve-scalyr commented 2 years ago

The following invocation leads to an infinite loop:

    XorPlus8.construct(new long[]{0xef9bddc5166c081cL, 0x33bf87adaa46dcfcL});

It appears that with these inputs, getHash(key, seed, index) always returns index. As a result, the two values always collide on all three of their table slots (regardless of the seed), and hence there are no single-occupied table slots and the construction algorithm never terminates. I am scratching my head at this; it seems impossible that the hash algorithm could behave in such a fashion regardless of seed. But this is the behavior I've observed. Any thoughts on why this might be?

Regardless of exactly what is happening here, for safety purposes, it would seem like a good idea to implement a maximum retry count for table construction (after which an exception would be thrown).

I will try to dig into this further. If I'm able to determine what is happening and implement a fix, would you be open to a pull request?

lemire commented 2 years ago

I did not write this code so I will let Thomas answer but the issue might be with the set size.

thomasmueller commented 2 years ago

Oh, this is embarrassing... I forgot to add the 32 entries offset that is necessary for xor filters! It is fixed now.

steve-scalyr commented 2 years ago

Ah, of course: blockLength was just 1, thus forcing the observed behavior from getHash. Thanks for the quick turnaround!

lemire commented 2 years ago

You’d still prefer if infinite loops were provably impossible.

steve-scalyr commented 2 years ago

As a better-than-nothing fallback, what would you think of throwing an exception after some number of attempts to populate the table? Then the client could at least take some action to compensate (e.g. don't create an XOR filter for this particular data set, and instead assume that all possible values might be present). It would also be an opportunity to provide a diagnostic message.

I can put together a pull request, if you like.

lemire commented 2 years ago

It is not my code but I would think it wise to do so.

steve-scalyr commented 2 years ago

Done in https://github.com/FastFilter/fastfilter_java/pull/30.