FastFilter / fastfilter_cpp

Fast Approximate Membership Filters (C++)
Apache License 2.0
261 stars 24 forks source link

bloomfilter::BloomFilter has false positives for large inputs #7

Closed lemire closed 4 years ago

lemire commented 4 years ago

The Bloom filter implementation is buggy. For large inputs, there are false positives.

#include <sstream>

#include "bloom.h"

int main(int argc, char **argv) {
  size_t array_size = 500 * 1000 * 1000;
  uint64_t *array = new uint64_t[array_size];
  for (size_t i = 0; i < array_size; i++)
    array[i] = i;
  using Table = bloomfilter::BloomFilter<uint64_t, 12, false, SimpleMixSplit>;
  Table table(array_size);
  table.AddAll(array, 0, array_size);
  printf("Checking for false negatives\n");
  for (size_t i = 0; i < array_size; i++) {
    if (table.Contain(array[i]) != bloomfilter::Ok) {
      printf("Detected a false negative. You probably have a bug. Aborting.\n");
      return EXIT_FAILURE;
    }
  }
  printf("Verified with success: no false negatives\n");
}
lemire commented 4 years ago

The issue was an overflow in AddAll.

I am not adding a check because checking for such an overflow is quite expensive.