armon / bloomd

C network daemon for bloom filters
http://armon.github.io/bloomd

Higher false positive rate than expected #35

Closed sezginriggs closed 8 years ago

sezginriggs commented 8 years ago

Hello,

After setting up and running bloomd without a problem, I see a higher false positive rate than expected. I'm using bloom-python-driver and creating a filter with initial_capacity=1000000 and a maximum false positive probability of 0.0001. I then insert and check the same URL set several times. After adding almost 110,000 URLs to the filter (in one loop), I run the script again, and this time it adds almost 40,000 of them again.

While the script is running, it sometimes returns a "Client Error: Command not supported" error. I suppose this is caused by malformed URLs, so I just pass (in try..except) and continue the loop. Could this be the reason for the subsequent false positives?

When I check a key over a telnet connection, it returns "yes". What could be causing this problem? Do you have any ideas?

Thank you very much for your help in advance...

sezginriggs commented 8 years ago

If a URL includes a newline (\n), bloomd or bloom-python-driver starts to return wrong values. I removed these URLs and the problem was solved, but checking for this in bloomd might be helpful.
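
A likely explanation for the wrong values can be sketched without a server. The snippet below assumes bloomd's text protocol is newline-terminated and that a set command has the form `s <filter> <key>`; the function names are hypothetical and no real socket or driver call is involved. It only shows how a line-oriented reader would see a key that contains a newline.

```python
def frame_set_command(filter_name, key):
    # Assumed wire format for a set command: "s <filter> <key>\n".
    return "s {} {}\n".format(filter_name, key)


def split_into_commands(wire_data):
    # A line-oriented server treats every newline-terminated line
    # as one complete command.
    return [line for line in wire_data.split("\n") if line]


clean = frame_set_command("urls", "http://example.com/a")
dirty = frame_set_command("urls", "http://example.com/a\nhttp://example.com/b")

print(split_into_commands(clean))
print(split_into_commands(dirty))
```

Under these assumptions the clean key yields one command, while the key with an embedded newline yields two lines, and the second is not a valid command. That would be consistent with the "Command not supported" errors, and a client that expects one reply per request could then read replies that belong to an earlier command.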

armon commented 8 years ago

@sezginriggs It sounds like a bug in the client library: it should not allow keys containing newlines to be sent to the server. From the server's perspective, the client is just violating the protocol.
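
Until the client library rejects such keys itself, a caller could guard against them before handing keys to the driver. This is a minimal sketch, not part of bloom-python-driver: `validate_key` and `partition_keys` are hypothetical helpers, and rejecting carriage returns as well as newlines is an assumption made for safety.

```python
def validate_key(key):
    # Reject keys that would break a newline-delimited protocol.
    # Treating "\r" as unsafe too is an assumption, not a documented rule.
    if "\n" in key or "\r" in key:
        raise ValueError("key contains a line break: {!r}".format(key))
    return key


def partition_keys(keys):
    # Split keys into those safe to send and those to skip or log.
    safe, rejected = [], []
    for key in keys:
        try:
            safe.append(validate_key(key))
        except ValueError:
            rejected.append(key)
    return safe, rejected


urls = ["http://example.com/a", "http://example.com/b\nhttp://example.com/c"]
safe, rejected = partition_keys(urls)
print(len(safe), len(rejected))
```

Only the keys in `safe` would then be passed to the filter. Rejecting rather than stripping the line break keeps the stored key identical to the one that is later checked.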