efficient / cuckoofilter

Other
970 stars 169 forks source link

[Question]false positive rate #36

Open MainHanzo opened 6 years ago

MainHanzo commented 6 years ago

Hello everyone, i am doing some research on the performance of the bloom filter, cuckoo filter and simd-block etc. And when i tried to compare the performance i found that the false positive rate of the cuckoo filter is not mentioned. So I would like to ask how much is the false positive rate of the cuckoo filter, how is the fp rate defined?

dnbaker commented 6 years ago

I recommend looking at the manuscript.

The key difference regarding false positive rates is that bloom filters' false positive rate approaches 1 as a function of the number of elements, while a cuckoo filter's error rate caps out at a relatively small rate (often 1-5%) regardless of how full the structure becomes. This comes at the cost of insertion increasing as the structure becomes more full.

The other disadvantage is that sketches cannot be merged or compared as concise representations of sets the way a bloom filter can, which eliminates a major use case for them.

jbapple-cloudera commented 6 years ago

@MainHanzo , when you run bulk-insert-and-query.exe, the false positive rate is the column labeled "ε". When you run conext-table3.exe, the false positive rate is in the row labeled "false positive rate".

MainHanzo commented 6 years ago

@jbapple-cloudera @dnbaker Thank you for all your help. @dnbaker However, I didn't get the meaning of"the sketches", do they refer to the cuckoo filters? and "concise representation of sets the way a bloom filter can". May i ask you explain more of that? @jbapple-cloudera I have another question: If the fp rate are changing when the elements inserted into the filter, what is the man-set fp rate for? Is that just the fp rate for the very beginning? I'm sorry, this is from a very beginner. Thank you very much for your patience.

dnbaker commented 6 years ago

@MainHanzo Correct. Both cuckoo and bloom filters are sketch data structures, and a filled sketch data structure is often referred to as a sketch.

Both bloom filters and cuckoo filters provide inexact set membership queries. However, bloom filters can approximate set operations by using bitwise & and |s, while merging cuckoo filters is much more expensive.