DiceDB / dice

DiceDB is a redis-compliant, reactive, scalable, highly-available, unified cache optimized for modern hardware.
https://dicedb.io/
Other
6.83k stars 1.08k forks source link

Inconsistent `BF.ADD`: modify the implementation of bloom filters to achieve similar result as redis #1287

Open shashi-sah2003 opened 1 week ago

shashi-sah2003 commented 1 week ago

Steps to reproduce

  1. bf.add bf1 item1
  2. bf.info bf1

Expected output

The expected output when the above set of commands when run on Redis

image

Observed output

The observed output when the above set of commands when run on DiceDB

image

The steps to run the test cases are mentioned in the README of the dice-tests repository.

Expectations for resolution

This issue will be considered resolved when the following things are done

  1. changes in the dice code to meet the expected behavior
  2. Successful run of the tcl test behavior

You can find the tests under the tests directory of the dice repository and the steps to run are in the README file. Refer to the following links to set up DiceDB and Redis 7.2.5 locally

shashi-sah2003 commented 1 week ago

@JyotinderSingh @apoorvyadav1111 I am going to work on this. pls assign thanks

shashi-sah2003 commented 1 week ago

image image

@JyotinderSingh @apoorvyadav1111 from above you can see in bloom.go the value for capacity has been set to 1024 and number of filters has been set to number of hash function explicitly. I believe in redis implementation they use only on filter but in dicedb implementation number of filters are decided according to number of bits per element. Should I go ahead and change this configuration to use only one filter with 100 capacity?

apoorvyadav1111 commented 3 days ago

@shashi-sah2003 , yes please go ahead. Please consider that while we aim at being redis-compliant, finer values can differ based on the implementation of the data structure. That said, bloom filters do need an upgrade especially in expansion (currently its fixed). Thank you for working on this and lets stay connected on discord for this issue.

shashi-sah2003 commented 2 days ago

hey @JyotinderSingh @apoorvyadav1111 reducing the number of filters will lead to increase in false positive rate which we dont want right? but I wonder why redis uses only one filter?

JyotinderSingh commented 13 minutes ago

hey @JyotinderSingh @apoorvyadav1111 reducing the number of filters will lead to increase in false positive rate which we dont want right? but I wonder why redis uses only one filter?

We would need to investigate why they went with it. I'm not aware of the reasoning for that.