FastFilter / fastfilter_cpp

Fast Approximate Membership Filters (C++)
Apache License 2.0
261 stars 24 forks source link

binary_fuse benchmarks (questions) #30

Closed patelprateek closed 2 years ago

patelprateek commented 2 years ago

I was going through some benchmarking results and for intel did not find binary_fuse8 and binary_fuse16 variants . Are they same as Xor8 and Xor16 ? One of my use case is adding 1Mill - 20-30 mill keys to the filter , what would be the fastest way to construct the binary_fuse filter and what would be the fastest filter recommended if the bottleneck seems to be filter creation ?

thomasmueller commented 2 years ago

No, binary fuse and xor filters are not the same: See https://arxiv.org/abs/2201.01174

what would be the fastest way to construct the binary_fuse filter

I'm afraid I don't understand the question, as there is only one way to construct a binary_fuse filter.

what would be the fastest filter

binary_fuse8.

patelprateek commented 2 years ago

@thomasmueller Thanks for the quick response. When i run benchmarking on some machines i oobserve XorBinaryFuse8
XorBinaryFuse16
XorBinaryFuse8-4wise
XorBinaryFuse16-4wise Could you please please point me to some documentation on what is the difference.

Also from benchmarking results I am quite impressed to see that blockbloom is almost 10x faster (6 ns vs 60 ns for construction and 1.4x faster for membership query) for constructing a filter when adding 10 millions of keys but at a higher false positive rate (0.9% vs 0.39%) but uses almost same number of bits (10 bits per value) , is that expected ? It seems like blockedbloom may be the optimal choice since filter construction and search are bottleneck and we can tolerate a bit higher false positive ?

                                     add  remove      0%     25%     50%     75%    100%  3*find      ε%  bits/item  bits/item  space%    keys
                  Bloom12-addAll   26.21    0.00   27.84   31.43   29.98   29.72   28.87  114.91  0.3145      12.00       8.31    44.4  10.000

                  Bloom16-addAll   39.68    0.00   30.26   41.40   43.04   46.29   49.47  165.95  0.0457      16.00      11.10    44.2  10.000

                    BlockedBloom    6.11    0.00    7.76    7.41    7.40    7.61    8.10   29.08  0.9373      10.67       6.74    58.3  10.000

                  XorBinaryFuse8   59.27    0.00   10.85   10.51   10.43   10.55   10.36   90.90  0.3898       9.02       8.00    12.7  10.000

                 XorBinaryFuse16   59.60    0.00   16.98   16.82   16.86   16.96   16.81  110.25  0.0015      18.04      16.03    12.5  10.000

            XorBinaryFuse8-4wise   55.14    0.00   12.75   12.47   12.46   12.43   13.53   93.33  0.3932       8.61       7.99     7.8  10.000

           XorBinaryFuse16-4wise   55.16    0.00   20.24   20.08   20.43   19.64   21.83  116.48  0.0015      17.22      16.05     7.3  10.000
thomasmueller commented 2 years ago

See https://arxiv.org/abs/2201.01174

The -4wise implementations are 4wise, the others are 3wise.