FastFilter / fastfilter_java

Fast Approximate Membership Filters (Java)
Apache License 2.0
238 stars 27 forks source link

fix array OOBE in blocked bloom filter when top 4 bits of hash are se… #14

Closed richardstartin closed 4 years ago

richardstartin commented 4 years ago

…t (seed dependent behaviour)

richardstartin commented 4 years ago

In the blocked bloom filters was it intended that the same key would never be mapped to the same word (they now do whenever the upper 4 bits of the hash are zero), hence the plus 1? Should the solution be to increase (double?) the size of the bitset instead?

richardstartin commented 4 years ago

@thomasmueller I sensed this wasn't the correct fix for the blocked bloom filters so will address that.

richardstartin commented 4 years ago

@lemire do you have any time to look at this?

lemire commented 4 years ago

@richardstartin See my proposal. I think that only one change you are making is somewhat controversial in that it might be hiding a real bug instead of exposing it. I recommend you omit this change and then I think that the whole PR could be safely merged without any controversy.

thomasmueller commented 4 years ago

I would love to give you more feedback, but I'm afraid I have very little time now... Next year should be better.

richardstartin commented 4 years ago

I will leave the branch like this - there are several false negative bugs which aren't trivial to fix, and I've reverted the change to XorSimple. SimpleFuzzer will stop crashing on XorSimple when/if it gets fixed.

lemire commented 4 years ago

@thomasmueller

I recommend merging this PR. Do you agree?

thomasmueller commented 4 years ago

I think it would break the build, see https://travis-ci.com/FastFilter/fastfilter_java/builds/142464328

What about disabling the tests for now, so the build passes, and create issues for the problems? That way I think merging is fine.

richardstartin commented 4 years ago

You shouldn't merge the PR, I've left it to break to highlight the existence of various bugs which need fixing or otherwise removing.

lemire commented 4 years ago

Ok.

My view was that it does not really break anything, it merely exposes existing problems and thus is safe to merge. But I see that Richard has opened distinct bug issues.

richardstartin commented 4 years ago

@lemire I don't think there is a rush to merge it. I was keen to help get the project to the point where it could be released and used, but there are several nondeterministic bugs I don't have enough context to be able to fix, so my goal may have been a little premature. The fixes which are made in this PR are trivial, and merging the branch will just break master. It's worth leaving around for now to verify that the harder fixes have been made.

lemire commented 4 years ago

Ok.

thomasmueller commented 4 years ago

I merged the patch, as it improves the code a lot. I will try to resolve the remaining issues.

One option is to move the buggy implementations (specially MPHF, GCS2) to the "test" source folder, until they are fixed (kind of like "staging").

richardstartin commented 4 years ago

@thomasmueller I think moving them to test makes sense, whilst there are completely new filter implementations, the availability of some filters for evaluation purposes only could be confusing for users.

lemire commented 4 years ago

+1