c-blake / adix

An Adaptive Index Library for Nim
ISC License
38 stars 2 forks source link

BLTab saturated #10

Closed blackmius closed 2 months ago

blackmius commented 2 months ago

what right behaviour on getting this error?

reproduces with such scenario

import pkg/adix/[bltab, althash]
const mask = 1 shl 32 - 1
var a = initBLTab(1000, mask)
for i in 1..1000:
  discard a.containsOrIncl(hashNASAM(i) and mask)
c-blake commented 2 months ago

Well, so what's going on is that the usual case for these probabilistic false-positive but no false negative data structures is to fill them up & saturate them. As you saturate a Bloom filter by overfilling it, it degrades more gracefully just always returning "present", but this variant doesn't do that - instead it runs out of slots.

So, you have a couple of options - you could make that 1000 bigger, like 1500 say, or you could put a try-except around some operations. I might ask what you are actually doing to understand if this is even an appropriate choice. Also, there is a test and some related discussion in comments in tests/bl.nim.

c-blake commented 2 months ago

Also, I am closing this issue as I believe it's more|less an expected behavior. But feel free to continue to ask questions here!

c-blake commented 2 months ago

I should also say that there are a lot of variables in the module copied over from adix/lptabz.nim that are more TODO/future work with a possibly "auto-resizing BLTab". I never really got around to that. In the area this module plays in, the competition is highly focused around "bounded memory". So, at a minimum it needs callers to specify new API things like "false positive probability ranges" that control such auto-resize. And, really, some query calls like what are the false+ probs might also make sense. In a lot of ways adix/bltab is still at the proof of concept stage. I am very happy to entertain very seriously any PRs along those improved usability lines, though. (And I remain curious about your application!)

blackmius commented 2 months ago

i need sort of bloom filter for int values, i have an array of ids and need quickly understand if some id there or not. I found a forum thread where you discussed about bloom filters and implementation https://forum.nim-lang.org/t/10232

and i give a try to bltab. i have a peek in the sources and think it is not quite what i need (i am also wishing use as small as possible space for storing bits but if i understand bltabs uses as much as many bytes as it has)

c-blake commented 2 months ago

Well, Bloom filters will also use as many bytes as it was set up with, but erase/distribute the keys more thoroughly than bltab.

So, if you are willing to start a little more wasteful in sizes of the small integers stored (at the benefit of extremely low initial false+ probability), but not table slots, then you really might be best off adding some kind of auto-resize policy to bltab.nim. Bloom doesn't really give you that option as everything is bit-wise.