axiomhq / rust-cuckoofilter

Cuckoo Filter: Practically Better Than Bloom (In Rust)
MIT License

Implement import/export features #29

Closed Libbum closed 6 years ago

Libbum commented 6 years ago

Closes #28 by introducing export, which collects all fingerprint data into a simple Vec<u8> for easy storage into a file or database.

Using this data, along with the length of the filter (from len), we can later reconstruct the filter by importing the stored data through recover.

(Some minor changes were also introduced via rustfmt.)
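To make the round trip concrete, here is a minimal sketch of the export/recover idea. It is not this crate's code: `ToyFilter`, `insert_at`, the bucket size, and the use of 0 as an empty-slot marker are all assumptions made for illustration, and bucket indices are chosen by hand instead of by hashing.

```rust
// Illustrative sketch only: `ToyFilter` is a hypothetical stand-in,
// not the cuckoofilter crate's actual type or memory layout.

const BUCKET_SIZE: usize = 4; // fingerprint slots per bucket (assumed)
const EMPTY: u8 = 0; // 0 marks an empty slot in this sketch (assumed)

#[derive(Debug, Clone, PartialEq)]
struct ToyFilter {
    buckets: Vec<[u8; BUCKET_SIZE]>,
    len: usize, // number of stored fingerprints
}

impl ToyFilter {
    fn with_buckets(n: usize) -> Self {
        ToyFilter { buckets: vec![[EMPTY; BUCKET_SIZE]; n], len: 0 }
    }

    /// Store a non-zero fingerprint in the first free slot of a bucket.
    /// Returns false if the bucket is full (a real filter would relocate).
    fn insert_at(&mut self, bucket: usize, fingerprint: u8) -> bool {
        assert_ne!(fingerprint, EMPTY, "0 is reserved for empty slots");
        for slot in self.buckets[bucket].iter_mut() {
            if *slot == EMPTY {
                *slot = fingerprint;
                self.len += 1;
                return true;
            }
        }
        false
    }

    /// Flatten every bucket into one byte vector, in bucket order.
    fn export(&self) -> Vec<u8> {
        self.buckets.iter().flat_map(|b| b.iter().copied()).collect()
    }

    /// Rebuild a filter from exported bytes plus the stored length.
    /// Returns None if the byte count is not a whole number of buckets.
    fn recover(bytes: &[u8], len: usize) -> Option<Self> {
        if bytes.len() % BUCKET_SIZE != 0 {
            return None;
        }
        let buckets = bytes
            .chunks_exact(BUCKET_SIZE)
            .map(|chunk| {
                let mut bucket = [EMPTY; BUCKET_SIZE];
                bucket.copy_from_slice(chunk);
                bucket
            })
            .collect();
        Some(ToyFilter { buckets, len })
    }
}

fn main() {
    let mut filter = ToyFilter::with_buckets(8);
    filter.insert_at(3, 0xAB);
    filter.insert_at(3, 0x17);

    // The bytes and the length are what would go into a file or database.
    let bytes = filter.export();
    let restored = ToyFilter::recover(&bytes, filter.len).expect("whole buckets");
    assert_eq!(restored, filter);
    println!("exported {} bytes, {} fingerprints", bytes.len(), restored.len);
}
```

The point of the sketch is that the fingerprints alone are not enough: the length has to be stored next to the bytes, because it cannot be read back from the flattened data without rescanning every slot.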

florianjacob commented 6 years ago

I don't see a reason why this functionality shouldn't be part of cuckoofilter, so I'll take this over if @seiflotfy doesn't veto. 😉

florianjacob commented 6 years ago

Somewhat off topic: Oration looks quite interesting! Could you explain how and for what you (want to) use cuckoo filters there? From reading the linked issue, is it somehow for likes and dislikes?

Libbum commented 6 years ago

Excellent points @florianjacob, none of which should be an issue to implement. I may not get to it until after Christmas/New Year though.


As for Oration: it's a commenting system which allows users to up/downvote comments. On the frontend, a user votes, which sends a POST request to the backend. That request identifies the sender's IP and also sends a hash of user details. At the moment, I'm using a Bloom filter to store IP details as a way to rate limit continuous voting. The issue with that is I can only really tell that a user has voted, not which way. Since you can't really delete values from a Bloom filter, this isn't quite what I'd like. So the plan is to have two cuckoo filters, one for upvotes and one for downvotes, which can add and delete user details as needed.

That's what I hope to do at least. I haven't worked with these probabilistic data structures before, but they look very impressive.
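A rough sketch of that two-filter idea, for illustration only. `VoteTracker`, `Vote`, and `cast` are hypothetical names, and `HashSet<u64>` stands in for a cuckoo filter so the example runs without any dependency. A real cuckoo filter would offer the same add / contains / delete operations, but its membership test can return false positives.

```rust
use std::collections::HashSet;

// Sketch only: `HashSet<u64>` is an exact stand-in for a cuckoo filter.
// A real cuckoo filter answers `contains` approximately.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Vote {
    Up,
    Down,
}

#[derive(Default)]
struct VoteTracker {
    upvoters: HashSet<u64>,
    downvoters: HashSet<u64>,
}

impl VoteTracker {
    /// Which way has this user (identified by a hash) voted, if at all?
    fn current(&self, user: u64) -> Option<Vote> {
        if self.upvoters.contains(&user) {
            Some(Vote::Up)
        } else if self.downvoters.contains(&user) {
            Some(Vote::Down)
        } else {
            None
        }
    }

    /// Record a vote. Returns false if the same vote was already recorded.
    fn cast(&mut self, user: u64, vote: Vote) -> bool {
        if self.current(user) == Some(vote) {
            return false; // repeated vote in the same direction: reject
        }
        // Removing the user from the opposite set is the step that a
        // Bloom filter cannot do and a cuckoo filter can.
        match vote {
            Vote::Up => {
                self.downvoters.remove(&user);
                self.upvoters.insert(user);
            }
            Vote::Down => {
                self.upvoters.remove(&user);
                self.downvoters.insert(user);
            }
        }
        true
    }
}

fn main() {
    let mut tracker = VoteTracker::default();
    assert!(tracker.cast(42, Vote::Up)); // first vote is accepted
    assert!(!tracker.cast(42, Vote::Up)); // same vote again is rejected
    assert!(tracker.cast(42, Vote::Down)); // switching sides is allowed
    assert_eq!(tracker.current(42), Some(Vote::Down));
}
```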

florianjacob commented 6 years ago

> May not get to it until after Christmas/New Year though.

Take your time! 🎄

> Also, I wasn't sure about design choices of the crate - sometimes additional dependencies are not that welcomed without lengthy discussions

Not sure what design choices @seiflotfy has in mind, but I'd say serialization is quite an essential feature for a crate that provides a data structure, and serde is the generally accepted way to go. If anybody really has a problem with the dependency on serde for whatever reason, we can still make it optional behind a feature gate, but I guess most users will either find it useful or not care. That's in contrast to my C interface project, which would pull in cbindgen for something most users won't need, so I'm trying to hide it by default.
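For reference, a feature gate for serde could look roughly like the sketch below. This is an assumption about how it might be wired up, not what the PR does: `ExportedFilter` and its fields are hypothetical, and the Cargo.toml lines in the comment are illustrative.

```rust
// Sketch of an optional serde dependency. Assumed Cargo.toml (hypothetical):
//
//   [dependencies]
//   serde = { version = "1", features = ["derive"], optional = true }
//
// An optional dependency implicitly defines a feature of the same name,
// so users opt in with `cargo build --features serde`.

/// Hypothetical container for exported filter state.
#[derive(Debug, Clone, PartialEq)]
// The serde derives are only applied when the `serde` feature is enabled;
// without it, this attribute disappears and serde is never referenced.
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct ExportedFilter {
    pub values: Vec<u8>,
    pub length: usize,
}

fn main() {
    // The type is usable either way; only the serialization impls are gated.
    let exported = ExportedFilter { values: vec![1, 2, 3], length: 3 };
    assert_eq!(exported.values.len(), exported.length);
    println!("{:?}", exported);
}
```

With this layout, users who never serialize a filter don't compile serde at all, while everyone else enables a single feature.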


Oration has an interesting use case. So you're essentially saving database memory by not storing a full hashset of all user identification characteristics, but only a Bloom / cuckoo filter, at the cost of some false positives, where users are denied a vote even though they haven't voted yet. Sadly, this is one of the cases where one would probably prefer false negatives over false positives. But in return, dropping old values if the filter gets too small isn't that much of an issue in your case.

seiflotfy commented 6 years ago

Can't wait for this to land. Great job!

Libbum commented 6 years ago

Thanks! I plan to implement Florian's suggestions some time this week.

Libbum commented 6 years ago

Hi guys, sorry for the delay - life got in the way.

I've just adjusted this based on Florian's comments.

I've used serde_derive to generate the serialisation on the ExportCuckooFilter, since that's the most straightforward method. If you want to drop that dependency, I can implement the traits directly fairly easily.

Talanor commented 6 years ago

Hello, any update on this PR?

Libbum commented 6 years ago

It should be ready, just waiting for a second round review.

seiflotfy commented 6 years ago

Will take a look

On Mon, Mar 26, 2018, 11:26 AM Tim DuBois notifications@github.com wrote:

> It should be ready, just waiting for a second round review.


florianjacob commented 6 years ago

This now has a Changelog entry and seems sufficiently documented. 👍

seiflotfy commented 6 years ago

I think this should be part of 0.4.0. WDYT, @florianjacob?

florianjacob commented 6 years ago

@seiflotfy I agree. Unless you want anything else changed here, I'll merge this and my #27 and release master as 0.4.0 tomorrow morning (in ~24 hours).

seiflotfy commented 6 years ago

go ahead :D