DaveAKing / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0
0 stars 0 forks source link

bloom filter negative expectedFpp and premature saturation of filter #1656

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi folks,

We've observed that for very large bloom filters the expected false positive 
probability returns a negative value and the observed false positive 
probability exceeds the desired false positive probability  during conditions 
when the number of performed insertions is relatively small as compared to the 
number of expected insertions.

I've been able to reproduce this behavior in the attached SSCCE. The bloom 
filter is constructed with an expected 150M insertions and a desired fpp of 
0.0001. After 75M insertions the expected false positive probability is a 
negative value. The observed false positive probability is around 0.0002. 
Informally we have been able to reproduce the negative expected false positive 
probability with larger  desired false positive probabilities by increasing the 
expected number of insertions.

Original issue reported on code.google.com by spie...@addthis.com on 5 Feb 2014 at 4:07

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 5 Feb 2014 at 5:30

GoogleCodeExporter commented 9 years ago
This seems to be related to 1119 (if not the same problem). Quiet some things 
still use int which gladly overflows, e.g., 
BloomFilterStrategies.BitArray.bitSize().

Original comment by Maaarti...@gmail.com on 7 Feb 2014 at 8:32

GoogleCodeExporter commented 9 years ago
The overflows should be fixed by 
https://code.google.com/p/guava-libraries/source/detail?r=6de004aebed935dfa060e7
6e1c2db23041ae3df5

The other issues (wrong FPP) is being addressed in Issue #1119.

Thanks for the report, and especially thanks for the GuavaBloomTest.java...it 
was helpful in confirming our fix.

Original comment by kak@google.com on 28 Mar 2014 at 7:43

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<issue id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:10

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:07