Closed patelprateek closed 2 years ago
No, binary fuse and xor filters are not the same: See https://arxiv.org/abs/2201.01174
what would be the fastest way to construct the binary_fuse filter
I'm afraid I don't understand the question, as there is only one way to construct a binary_fuse filter.
what would be the fastest filter
binary_fuse8.
@thomasmueller Thanks for the quick response. When i run benchmarking on some machines i oobserve
XorBinaryFuse8
XorBinaryFuse16
XorBinaryFuse8-4wise
XorBinaryFuse16-4wise
Could you please please point me to some documentation on what is the difference.
Also from benchmarking results I am quite impressed to see that blockbloom is almost 10x faster (6 ns vs 60 ns for construction and 1.4x faster for membership query) for constructing a filter when adding 10 millions of keys but at a higher false positive rate (0.9% vs 0.39%) but uses almost same number of bits (10 bits per value) , is that expected ? It seems like blockedbloom may be the optimal choice since filter construction and search are bottleneck and we can tolerate a bit higher false positive ?
add remove 0% 25% 50% 75% 100% 3*find ε% bits/item bits/item space% keys
Bloom12-addAll 26.21 0.00 27.84 31.43 29.98 29.72 28.87 114.91 0.3145 12.00 8.31 44.4 10.000
Bloom16-addAll 39.68 0.00 30.26 41.40 43.04 46.29 49.47 165.95 0.0457 16.00 11.10 44.2 10.000
BlockedBloom 6.11 0.00 7.76 7.41 7.40 7.61 8.10 29.08 0.9373 10.67 6.74 58.3 10.000
XorBinaryFuse8 59.27 0.00 10.85 10.51 10.43 10.55 10.36 90.90 0.3898 9.02 8.00 12.7 10.000
XorBinaryFuse16 59.60 0.00 16.98 16.82 16.86 16.96 16.81 110.25 0.0015 18.04 16.03 12.5 10.000
XorBinaryFuse8-4wise 55.14 0.00 12.75 12.47 12.46 12.43 13.53 93.33 0.3932 8.61 7.99 7.8 10.000
XorBinaryFuse16-4wise 55.16 0.00 20.24 20.08 20.43 19.64 21.83 116.48 0.0015 17.22 16.05 7.3 10.000
See https://arxiv.org/abs/2201.01174
The -4wise implementations are 4wise, the others are 3wise.
I was going through some benchmarking results and for intel did not find binary_fuse8 and binary_fuse16 variants . Are they same as Xor8 and Xor16 ? One of my use case is adding 1Mill - 20-30 mill keys to the filter , what would be the fastest way to construct the binary_fuse filter and what would be the fastest filter recommended if the bottleneck seems to be filter creation ?