addthis / stream-lib

Stream summarizer and cardinality estimator.
Apache License 2.0
2.26k stars 558 forks source link

Add intersect method to bloom filter #136

Closed ghost closed 7 years ago

ghost commented 7 years ago

Per https://en.wikipedia.org/wiki/Bloom_filter#Interesting_properties , it is valid to construct a new bloom filter from a set of constituent bloom filters by applying an 'AND' operation to the constituent bits sets.

"... The intersect operation satisfies a weaker property: the false positive probability in the resulting Bloom filter is at most the false-positive probability in one of the constituent Bloom filters, but may be larger than the false positive probability in the Bloom filter created from scratch using the intersection of the two sets."

This PR adds a method to support this operation, and a unit test to verify.

abramsm commented 7 years ago

@alstaplesmoore thanks. this looks good to me. Can you add another test that tests a larger intersection to ensure it works with non-trivial amounts of data?

abramsm commented 7 years ago

PR https://github.com/addthis/stream-lib/pull/134 may have more tests that we can use.