Open alexandrnikitin opened 8 years ago
Here are some alternative JVM based implementations as inspirations ;)
Looking at these, seems like we could use the same approach as example A and use UnsafeBitArray
instead of byte[]
@kutchar Thanks for the info. BTW, @alexander-chervony is a colleague of mine and yes, they are all essentially ports of one lib. The problem with all of them is that there's no math from the paper. They all expect calculated values. One more crucial thing is that they use int
to refresh an element which is huge waste of memory. While the recommendation is to pick the Max
value as small as possible. "In practice, we limit our choice of Max to 1, 3 and 7 (if higher time cost can be tolerated, larger values of Max can be tried similarly)." Which is just few bits not 4 bytes.
I pushed some code ripped out from our private repo. https://github.com/alexandrnikitin/bloom-filter-scala/commit/867596602870d562061e936340e61da88c8da857 I really feel ashamed for doing that but I don't want to protract anymore. It's Java, not polished and not finished yet. The math is implemented in straightforward way from the original paper and very slooow, it needs some math love. I put an excerpt from the paper in the comments. The only way to use it at the moment is to adjust parameters by hand for your input.
Stable Bloom filter