Closed lemire closed 4 years ago
cc @thomasmueller @solardiz
Your analysis looks correct to me.
My suggestion that "using non-overlapping bitmasks instead of the reduce trick would have avoided this issue" was only for the case when 32 bits are still enough (as I had mentioned, for filters of up to 4 gigabits in size).
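For reference, a minimal sketch of what non-overlapping bitmasks could look like while 32 bits are still enough. It assumes a power-of-two number of 64-bit words; `BitLocation` and `locate` are hypothetical names used only for illustration, not code from the repository.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: which 64-bit word, and which bit inside it.
struct BitLocation {
    uint32_t word;  // index of the 64-bit word
    uint32_t bit;   // bit position inside that word, 0..63
};

// Splits a 32-bit hash into two disjoint fields:
//   low 6 bits         -> bit position inside the word
//   next log2Words bits -> word index
// The fields cannot overlap as long as log2Words + 6 <= 32.
inline BitLocation locate(uint32_t hash, unsigned log2Words) {
    assert(log2Words <= 26);  // 2^26 words * 64 bits = 2^32 bits
    const uint32_t wordMask = (uint32_t{1} << log2Words) - 1u;
    return BitLocation{(hash >> 6) & wordMask, hash & 63u};
}
```

The limit of 26 bits for the word index is where the 4-gigabit ceiling comes from: 2^26 words of 64 bits each is 2^32 bits.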
I have an easy fix coming up.
Fixed by moving to 64-bit:
https://github.com/FastFilter/fastfilter_cpp/commit/f9c32cfa7726d5206ead33be5de1df43ef254fc5
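As a rough illustration of why a 64-bit hash leaves enough room, here is one possible way to split it. This is only a sketch under my own assumptions and not necessarily what the commit does: `wordIndex64` and `bitMask64` are hypothetical helpers, and `reduce` is assumed to be the usual multiply-shift range reduction.

```cpp
#include <cassert>
#include <cstdint>

// Multiply-shift range reduction: maps a 32-bit value to [0, n).
inline uint32_t reduce(uint32_t hash, uint32_t n) {
    return static_cast<uint32_t>((static_cast<uint64_t>(hash) * n) >> 32);
}

// Hypothetical: the word index is derived from the upper 32 bits only.
inline uint32_t wordIndex64(uint64_t hash, uint32_t arrayLength) {
    assert(arrayLength > 0);
    return reduce(static_cast<uint32_t>(hash >> 32), arrayLength);
}

// Hypothetical: the bit inside the word is derived from the low 6 bits only.
inline uint64_t bitMask64(uint64_t hash) {
    return uint64_t{1} << (hash & 63u);
}
```

Because the two helpers read disjoint halves of the hash, the word index and the bit position no longer share any input bits, whatever the array length.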
For large inputs, the false positive rate increases... At 500M we can consider it broken. As far as I can tell, it is not an overflow issue but rather a hashing issue.
I expect that the problem is explained by this comment...
We have that a is a 32-bit value. If we have 500M inputs, then arrayLength should be about 100M, or roughly 2**27. So, roughly speaking, reduce(a, this->arrayLength) uses the top 27 bits... while getBit(a) needs 6 bits. 27 + 6 = 33... We are exceeding the 32-bit input.
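The bit budget can be checked with a small sketch. It assumes that reduce is the usual multiply-shift range reduction and that the bit position is taken from the low 6 bits of the hash; `bitIndex` and `overlappingBits` are hypothetical helper names, not functions from the repository.

```cpp
#include <cassert>
#include <cstdint>

// Multiply-shift range reduction: maps a 32-bit hash to [0, n).
// For n == 2^k this is simply the top k bits of the hash.
inline uint32_t reduce(uint32_t hash, uint32_t n) {
    return static_cast<uint32_t>((static_cast<uint64_t>(hash) * n) >> 32);
}

// Assumed: bit position inside a 64-bit word comes from the low 6 bits.
inline uint32_t bitIndex(uint32_t hash) {
    return hash & 63u;
}

// Number of hash bits used by both the word index and the bit position
// when arrayLength == 2^log2ArrayLength and the hash has 32 bits.
inline int overlappingBits(int log2ArrayLength) {
    assert(log2ArrayLength >= 0 && log2ArrayLength <= 32);
    const int needed = log2ArrayLength + 6;
    return needed > 32 ? needed - 32 : 0;
}
```

Under these assumptions, with arrayLength = 2**27, reduce(a, arrayLength) equals a >> 5, so bit 5 of a feeds both the word index and the bit position: even-numbered words only ever see bit positions 0 to 31 and odd-numbered words only 32 to 63.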