go-bond / bond

Review bloom filter code #67

Closed marino39 closed 1 year ago

marino39 commented 1 year ago

Please review the bloom filter code, and consider moving the app-level filters from the sequence indexer down into go-bond.

At the moment we are using our own bloom filters on top of the pebble ones. This poses a number of problems, such as:

  1. How to keep bloom filter in sync with the database after process restart?

    Performance degrades too much if we try to persist the filter with each pebble data batch. For now we have settled on persisting the filter when the process closes. However, that can fail if the process is force-closed, and then the filter needs to be regenerated, which takes ~25 mins.

  2. How to split filter efficiently, so we can persist it in smaller chunks?

    At the moment the problem is that all chunks are touched, even if we split the filter into 1 million chunks. So on each batch we would need to persist 1-2 GB of data.
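One possible direction for problem 2, sketched below, is a partitioned filter in which each key hashes to exactly one chunk and all of its probe bits stay inside that chunk, so a batch dirties at most as many chunks as it has keys. This is only an illustration under assumptions: `ChunkedFilter`, `Flush`, and the `persist` callback are hypothetical names, not part of go-bond or pebble, and the hashing scheme (FNV with double hashing) is chosen for brevity rather than quality. Keeping every probe inside one chunk also changes the false positive rate compared with a single large filter, so the sizing would need to be measured.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ChunkedFilter is a hypothetical partitioned Bloom filter: every key maps to
// exactly one chunk, and all of its probe bits live inside that chunk.
type ChunkedFilter struct {
	chunks    [][]uint64       // one bit array per chunk
	dirty     map[int]struct{} // chunks modified since the last flush
	chunkBits uint64           // number of bits in each chunk
	k         int              // probes per key
}

func NewChunkedFilter(numChunks int, chunkBits uint64, k int) *ChunkedFilter {
	chunks := make([][]uint64, numChunks)
	for i := range chunks {
		chunks[i] = make([]uint64, (chunkBits+63)/64)
	}
	return &ChunkedFilter{chunks: chunks, dirty: map[int]struct{}{}, chunkBits: chunkBits, k: k}
}

// hashes derives two 64-bit values from the key.
func hashes(key []byte) (uint64, uint64) {
	h := fnv.New64a()
	h.Write(key)
	h1 := h.Sum64()
	h.Write([]byte{0xff}) // extend the input to get a second value
	return h1, h.Sum64()
}

// locate returns the chunk index, the first probe position, and the probe step.
func (f *ChunkedFilter) locate(key []byte) (int, uint64, uint64) {
	h1, h2 := hashes(key)
	return int(h1 % uint64(len(f.chunks))), h2, h1>>32 | 1
}

// Add sets the key's probe bits and marks its single chunk as dirty.
func (f *ChunkedFilter) Add(key []byte) {
	c, base, step := f.locate(key)
	for i := 0; i < f.k; i++ {
		bit := (base + uint64(i)*step) % f.chunkBits
		f.chunks[c][bit/64] |= 1 << (bit % 64)
	}
	f.dirty[c] = struct{}{}
}

// MayContain reports false only if the key was definitely never added.
func (f *ChunkedFilter) MayContain(key []byte) bool {
	c, base, step := f.locate(key)
	for i := 0; i < f.k; i++ {
		bit := (base + uint64(i)*step) % f.chunkBits
		if f.chunks[c][bit/64]&(1<<(bit%64)) == 0 {
			return false
		}
	}
	return true
}

// Flush hands only the dirty chunks to persist and returns how many were written.
func (f *ChunkedFilter) Flush(persist func(idx int, words []uint64) error) (int, error) {
	written := 0
	for idx := range f.dirty {
		if err := persist(idx, f.chunks[idx]); err != nil {
			return written, err
		}
		delete(f.dirty, idx)
		written++
	}
	return written, nil
}

func main() {
	f := NewChunkedFilter(1024, 8192, 7)
	for i := 0; i < 10; i++ {
		f.Add([]byte(fmt.Sprintf("key-%d", i)))
	}
	// A real persist callback could write each chunk into the same batch as the data.
	n, _ := f.Flush(func(idx int, words []uint64) error { return nil })
	fmt.Printf("chunks written: %d of %d\n", n, 1024)
}
```

If the dirty chunks were written in the same pebble batch as the data they describe, the filter and the database would stay consistent across a forced shutdown, which also bears on problem 1. Whether the per-batch write volume is acceptable depends on batch size relative to the chunk count.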

It would be perfect if we could fine-tune the built-in bloom filters in pebble to fit our purposes. Unfortunately, the pebble bloom filters do not give us enough of a performance boost in the current configuration.
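As a rough guide to what tuning could buy, the sketch below evaluates the textbook false positive estimate for a standard Bloom filter, (1 - e^(-k/b))^k, for a few bits-per-key settings. It assumes an idealized filter with independent hash functions; pebble's actual filter implementation may behave differently, and the function names here are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// falsePositiveRate returns the textbook estimate (1 - e^(-k/b))^k for a
// standard Bloom filter with b bits per key and k probes per key.
func falsePositiveRate(bitsPerKey float64, k int) float64 {
	return math.Pow(1-math.Exp(-float64(k)/bitsPerKey), float64(k))
}

// optimalProbes returns the probe count nearest to b*ln(2), with a minimum of 1.
func optimalProbes(bitsPerKey float64) int {
	k := int(math.Round(bitsPerKey * math.Ln2))
	if k < 1 {
		k = 1
	}
	return k
}

func main() {
	for _, b := range []float64{5, 10, 15, 20} {
		k := optimalProbes(b)
		fmt.Printf("bits/key=%2.0f probes=%2d estimated FPR=%.5f\n", b, k, falsePositiveRate(b, k))
	}
}
```

The estimate only covers the false positive rate. It says nothing about the CPU and I/O cost of consulting the filter, which may be what limits the performance boost in practice.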