Closed blackmius closed 4 months ago
Well, so what's going on is that the usual case for these probabilistic false-positive but no false negative data structures is to fill them up & saturate them. As you saturate a Bloom filter by overfilling it, it degrades more gracefully just always returning "present", but this variant doesn't do that - instead it runs out of slots.
So, you have a couple of options - you could make that 1000 bigger, like 1500 say, or you could put a try-except
around some operations. I might ask what you are actually doing to understand if this is even an appropriate choice. Also, there is a test and some related discussion in comments in tests/bl.nim
.
Also, I am closing this issue as I believe it's more|less an expected behavior. But feel free to continue to ask questions here!
I should also say that there are a lot of variables in the module copied over from adix/lptabz.nim
that are more TODO/future work with a possibly "auto-resizing BLTab". I never really got around to that. In the area this module plays in, the competition is highly focused around "bounded memory". So, at a minimum it needs callers to specify new API things like "false positive probability ranges" that control such auto-resize. And, really, some query calls like what are the false+ probs might also make sense. In a lot of ways adix/bltab is still at the proof of concept stage. I am very happy to entertain very seriously any PRs along those improved usability lines, though. (And I remain curious about your application!)
i need sort of bloom filter for int values, i have an array of ids and need quickly understand if some id there or not. I found a forum thread where you discussed about bloom filters and implementation https://forum.nim-lang.org/t/10232
and i give a try to bltab. i have a peek in the sources and think it is not quite what i need (i am also wishing use as small as possible space for storing bits but if i understand bltabs uses as much as many bytes as it has)
Well, Bloom filters will also use as many bytes as it was set up with, but erase/distribute the keys more thoroughly than bltab.
So, if you are willing to start a little more wasteful in sizes of the small integers stored (at the benefit of extremely low initial false+ probability), but not table slots, then you really might be best off adding some kind of auto-resize policy to bltab.nim
. Bloom doesn't really give you that option as everything is bit-wise.
what right behaviour on getting this error?
reproduces with such scenario