FastFilter / fastfilter_cpp

Fast Approximate Membership Filters (C++)
Apache License 2.0
261 stars 24 forks source link

Simple binary fuse filter creation #28

Closed BaiqingL closed 2 years ago

BaiqingL commented 2 years ago

I am new to C++ and want to use the implementation of binary fuse filters in this repo for a simple project.

I am having trouble understanding how to properly create and query a binary fuse 4 wise low mem filter.

Can someone please provide a code snippet example? For example, I have a vector containing some int64_t values and I want to create a xorbinaryfusefilter_lowmem4wise::XorBinaryFuseFilter. What should I do in this case?

Thanks for all the help

lemire commented 2 years ago

It is a research/demonstration library and thus it is not optimized for ease-of-use. You might prefer our single-header C library, as stated in our README. Nevertheless, you can use our fastfilter_cpp code in your projects if you'd like. The usage might go like this...

vector<uint64_t> myinput; // your keys
uint64_t key =... // some query key

// construct a 3-wise filter with the desired size (size is the number of entries, it cannot be changed afterward)
xorbinaryfusefilter_lowmem::XorBinaryFuseFilter<uint64_t, uint8_t> filter(size);
// could also use 4-wise by replacing the above with
// xorbinaryfusefilter_lowmem4wise::XorBinaryFuseFilter<uint64_t, uint8_t> filter(size);

// add your data (do this once, must provide 'size' values)
filter.AddAll(myinput,  0, size); 
// the calling convention of AddAll is a bit awkward, it expects a std::vector with a beginning and end index

// then you can test by calling 'Contain'. If the result is 0, then the key probably present.
// again, that's a bit awkward.
if(filter.Contain(key) == 0) {
  // the key is probably present
}
BaiqingL commented 2 years ago

Ah, thank you so much for the example. I was a bit caught up on the AddAll function since I didn't know what start and end was. This helps a lot!

BaiqingL commented 2 years ago

For the C library, are the optimizations contained in lowmem version also included in the single-header library?

lemire commented 2 years ago

The C library should have similar performance. If the performance is not similar, then that would be a bug.

BaiqingL commented 2 years ago

Awesome, I was trying to run your provided example and ran into some problems, as the code is stuck on the AddAll operation, I have two clarification questions:

  1. Should table.Contain(key) be filter.Contain(key)?
  2. Is the size variable the size of the vector, (i.e. how many element exists), or the sizeof the vector?
BaiqingL commented 2 years ago

My example looks like this:

  // Test the construction and usage of 4wise_xor_binary_fuse_filter_lowmem
  std::vector<uint64_t> key_space = {2000000000000000000, 3000000000000000000, 4000000000000000000};

  uint64_t exist_key = 2000000000000000000;
  uint64_t not_exist_key = 1000000000000000000;

  xorbinaryfusefilter_lowmem4wise::XorBinaryFuseFilter<uint64_t, uint8_t> filter(sizeof(key_space));
  cout << "Filter constructed" << endl;

  filter.AddAll(key_space,  0, key_space.size()); 
  cout << "Filter populated" << endl;

  // If contain returns 0, key exists, or else it doesn't exist
  if (filter.Contain(exist_key) == 0) {
  // the key is probably present
  cout << "key is present" << endl;
  }

  if (filter.Contain(not_exist_key) == 1) {
      // the key is probably not present
      cout << "key is not present" << endl;
  }
lemire commented 2 years ago

Try:

  // Test the construction and usage of 4wise_xor_binary_fuse_filter_lowmem
  std::vector<uint64_t> key_space = {2000000000000000000, 3000000000000000000, 4000000000000000000};

  uint64_t exist_key = 2000000000000000000;
  uint64_t not_exist_key = 1000000000000000000;

  xorbinaryfusefilter_lowmem4wise::XorBinaryFuseFilter<uint64_t, uint8_t> filter(key_space.size());
  cout << "Filter constructed" << endl;

  filter.AddAll(key_space,  0, key_space.size()); 
  cout << "Filter populated" << endl;

  // If contain returns 0, key exists, or else it doesn't exist
  if (filter.Contain(exist_key) == 0) {
  // the key is probably present
  cout << "key is present" << endl;
  }

  if (filter.Contain(not_exist_key) == 1) {
      // the key is probably not present
      cout << "key is not present" << endl;
  }