IntersectMBO / lsm-tree

A Haskell library for on-disk tables based on LSM-trees
Apache License 2.0
24 stars 7 forks source link

Implement Monkey-style allocation of bits to bloom filters #285

Closed jorisdral closed 1 month ago

jorisdral commented 1 month ago

In the Optimal Bloom FIlters and Adapative Merging for LSM-Trees paper, section 4, the authors observe that the lookup cost for an LSM-Tree is proportional to the sum of false positive rates of all the bloom filters. The authors provide equations/algorithms to determine the optimal FPR for each level, which we can use to initialise bloom filters.

We will implement this "Monkey"-style allocation of bits to bloom filters. Note that:

In general, I think the algorithm should be parameterised with: (i) the total memory budget for bloom filters, and (ii) the current levels. It's not clear to me that the paper provides an algorithm of this shape (maybe appendix B?), but you'd probably be able to rearrange the existing equations/algorithms to fit our needs.

Previous discussion, might be interesting to look at: https://github.com/IntersectMBO/lsm-tree/pull/210#discussion_r1598660727. Not everything I've mentioned there is relevant or correct, but it shows a dump of thoughts we had before on the topic.