bits-and-blooms / bloom

Go package implementing Bloom filters, used by Milvus and Beego.
BSD 2-Clause "Simplified" License
2.44k stars 232 forks source link

Question for hashing strings and order inside it #64

Closed robsztal closed 2 years ago

robsztal commented 3 years ago

Hey,

I have a question if it is possible to hash strings not depending on the order of strings? I mean

hash("black cat") = hash("cat black")

this generates false negatives because changing order of sentence results in totally different hash. Is there any option for it? Maybe splitting sentences and combining hashed strings?

Thank you for any response.

robsztal commented 3 years ago

If it is possible I would like to implement that functionality as AddStringsNotInOrder function that uses splitting and summing hashes

willf commented 3 years ago

I'm not the maintainer any more, but I suspect this is beyond the scope of this library, and it would be better if the calling program would do the collation before the hash call.

lemire commented 2 years ago

@willf is correct, it is outside the scope of the library to handle such an issue.

You can hash each word separately, and define the total hash function as a sum (for example):

totalhash("black cat") = hash("black") + hash("cat")

It is an interesting issue, but unrelated to bloom filters... it is a generic challenge.