Bloom filter error rate is inaccurate

RedisBloom / RedisBloom

Probabilistic Datatypes Module for Redis

https://redis.io/docs/stack/bloom/

Other

1.61k stars 252 forks source link

Bloom filter error rate is inaccurate #717

Closed llh12345 closed 3 months ago

llh12345 commented 6 months ago

I used bf.reserve to create a bloom filter and found that the error rate was half the error rate I specified. When the capacity of the Bloom filter is large, memory consumption will increase due to the above reasons.

llh12345 commented 6 months ago

I think the real error rate of the bloom filter actually created is the product of the user input error rate and ERROR_TIGHTENING_RATIO

LiorKogan commented 6 months ago

See explanation of why it is so: https://github.com/RedisBloom/RedisBloom/issues/716#issuecomment-1870198481

@llh12345 to claim that ERROR_TIGHTENING_RATIO can be increased while maintaining the desired probability of false positives (error_rate argument of BF.RESERVE) you will have to prove it (either theoretically or empirically).

ashtul commented 5 months ago

@llh12345 the reason you see half the error rate is due to the fact RedisBloom is a Scalable Bloom Filter. In order to allow the filter to grow in size without compromising the error rate, it is initilized with half the error rate specified. If you don't expect your filter to grow you can use NONSCALING when you call BF.RESERVE. You can read more about it here.