RedisBloom / redisbloom-py

Python client for Redisbloom
https://redisbloom.io
BSD 3-Clause "New" or "Revised" License

Why 'cfAddNX' so many misjudgement? #50

Open scallionshen opened 2 years ago

scallionshen commented 2 years ago

I am running some tests on the Cuckoo Filter with a data set of range(1, 1e8):

redis_cuckoo.cfCreate(CK_FID, 100000000, bucket_size=10)
cnt = 0
duplicate_cnt = 0
for i in range(100000000):
    exist = not bool(redis_cuckoo.cfAddNX(CK_FID, i))
    cnt += 1
    if cnt % 10000 == 0:
        print(cnt, duplicate_cnt)
    if exist:
        duplicate_cnt += 1

After nearly 1000k elements were added, the error count was about 22k, which makes no sense to me. Can anybody give me some guidance? TYVM!

ashtul commented 2 years ago

Hi @scallionshen, I have tested with the same parameters and got 230 duplicates at 1M, 23k at 10M and 93k at 20M. Please note that, for your bucket size, the error rate of the filter when it is almost full will be almost 10%. The fingerprint has 255 different values, and each element has 2 locations * 10 entries per bucket = 20 stored fingerprints that it could falsely match.
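The arithmetic behind that estimate can be sketched as follows. This is a rough model and not RedisBloom's actual code: it assumes fingerprints are uniformly distributed over 255 values, that slots fill evenly, and that the table holds exactly the requested 100000000 slots (the server may allocate more, which would lower the estimates). The function names are hypothetical.

```python
def false_positive_rate(bucket_size, load, fingerprint_values=255, buckets_per_item=2):
    """Approximate chance that a new item matches some stored fingerprint.

    load is the fraction of slots in use (0.0 to 1.0). The item is compared
    against roughly buckets_per_item * bucket_size * load occupied slots.
    """
    compared = buckets_per_item * bucket_size * load
    return 1.0 - (1.0 - 1.0 / fingerprint_values) ** compared


def expected_false_duplicates(n_items, total_slots, bucket_size, steps=1000):
    """Estimate how many of n_items distinct inserts are reported as duplicates.

    The load grows as items are inserted, so the per-insert rate is summed
    over small chunks using the load at the middle of each chunk.
    """
    chunk = n_items / steps
    total = 0.0
    for s in range(steps):
        load = min(1.0, (s + 0.5) * chunk / total_slots)
        total += chunk * false_positive_rate(bucket_size, load)
    return total


# Assumed table size: the capacity passed to cfCreate in the test above.
ASSUMED_SLOTS = 100_000_000

print("rate when full:", false_positive_rate(bucket_size=10, load=1.0))
for n in (1_000_000, 10_000_000, 20_000_000):
    print(n, round(expected_false_duplicates(n, ASSUMED_SLOTS, bucket_size=10)))
```

Under these assumptions the model gives a few hundred false duplicates at 1M and a few tens of thousands at 10M, the same order of magnitude as the counts reported above. A smaller bucket_size reduces the number of fingerprints each item is compared against, and with it the error rate.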

BTW, why not use our Bloom Filter, since you are not looking to delete values?
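For comparison, the same duplicate-counting test could be run against a Bloom filter. The sketch below assumes the redisbloom-py client's bfCreate(key, errorRate, capacity) and bfAdd(key, item) methods, with bfAdd returning 1 for a newly added item and 0 when the filter reports the item may already exist. The key name and the 0.001 error rate are illustrative choices, not values from this thread.

```python
def count_reported_duplicates(client, key, items):
    """Add each item to the Bloom filter and count those reported as already present.

    client is expected to expose bfAdd(key, item), as the redisbloom-py
    Client does (assumption stated in the text above).
    """
    duplicates = 0
    for item in items:
        # A falsy return value means the filter thinks the item may already exist.
        if not client.bfAdd(key, item):
            duplicates += 1
    return duplicates


# Usage against a live Redis server with the RedisBloom module loaded
# (left commented out because it needs a running server):
#
# from redisbloom.client import Client
# rb = Client()
# rb.bfCreate("bf_test", 0.001, 100000000)  # key, errorRate, capacity
# print(count_reported_duplicates(rb, "bf_test", range(1000000)))
```

With a Bloom filter the target error rate is set explicitly at creation time, so the expected number of false duplicates is easier to reason about than with a fixed-width cuckoo fingerprint.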