Libbum closed this 6 years ago.
I don't see a reason why this functionality shouldn't be part of cuckoofilter, so I'll take this over if @seiflotfy doesn't veto. 😉
Somewhat off topic: Oration looks quite interesting! Could you explain how and for what you (want to) use cuckoo filters there? From reading the linked issue, it seems to be for likes and dislikes?
Excellent points, @florianjacob; none of them should be hard to implement. I may not get to it until after Christmas/New Year, though.
`repeat(Bucket::new()).take(capacity)` was, in turn, something I hadn't seen before: very cool!

As for Oration: it's a commenting system that allows users to up/downvote comments. On the frontend, a user votes, which sends a POST request to the backend. That request identifies the sender's IP and also sends a hash of user details. At the moment, I'm using a bloom filter to store IP details as a way to rate-limit voting. The issue with that is that I can only tell that a user has voted, not which way. Since you can't really delete values from a bloom filter, this isn't quite what I'd like. So I'd like to have two cuckoo filters, one for upvotes and one for downvotes, which can add and delete user details as needed.
That's what I hope to do, at least. I haven't worked with these probabilistic data structures before, but they look very impressive.
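For readers new to these structures, here is a minimal, std-only sketch of the two-filter idea described above. It is not the crate's implementation: `Bucket`, `CuckooFilter`, `VoteTracker`, the one-byte fingerprints, the 4-slot buckets, `DefaultHasher` and the deterministic eviction choice are all assumptions made for illustration. It allocates its buckets with the same `repeat(Bucket::new()).take(capacity)` idiom.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::iter::repeat;

const BUCKET_SIZE: usize = 4; // assumed: fingerprint slots per bucket
const MAX_KICKS: usize = 500; // assumed: evictions tried before giving up

#[derive(Clone)]
struct Bucket { slots: [u8; BUCKET_SIZE] } // 0 marks an empty slot
impl Bucket {
    fn new() -> Self { Bucket { slots: [0; BUCKET_SIZE] } }
}

fn hash_of<T: Hash + ?Sized>(item: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    item.hash(&mut hasher);
    hasher.finish()
}

struct CuckooFilter { buckets: Vec<Bucket> }
impl CuckooFilter {
    // The bucket count must be a power of two so XOR-ed indices stay in range.
    fn with_capacity(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        CuckooFilter { buckets: repeat(Bucket::new()).take(capacity).collect() }
    }

    // One-byte fingerprint (never 0) plus the item's two candidate buckets.
    fn locate<T: Hash + ?Sized>(&self, item: &T) -> (u8, usize, usize) {
        let h = hash_of(item);
        let fp = ((h >> 32) as u8).max(1);
        let i1 = h as usize & (self.buckets.len() - 1);
        (fp, i1, self.alt_index(i1, fp))
    }

    // Partial-key cuckoo hashing: alt_index(alt_index(i, fp), fp) == i.
    fn alt_index(&self, index: usize, fp: u8) -> usize {
        (index ^ hash_of(&fp) as usize) & (self.buckets.len() - 1)
    }

    fn replace_slot(&mut self, i: usize, from: u8, to: u8) -> bool {
        match self.buckets[i].slots.iter_mut().find(|s| **s == from) {
            Some(slot) => { *slot = to; true }
            None => false,
        }
    }

    fn insert<T: Hash + ?Sized>(&mut self, item: &T) -> bool {
        let (mut fp, i1, i2) = self.locate(item);
        if self.replace_slot(i1, 0, fp) || self.replace_slot(i2, 0, fp) { return true; }
        // Both buckets full: evict a resident fingerprint to its alternate bucket.
        let mut i = i2;
        for kick in 0..MAX_KICKS {
            std::mem::swap(&mut fp, &mut self.buckets[i].slots[kick % BUCKET_SIZE]);
            i = self.alt_index(i, fp);
            if self.replace_slot(i, 0, fp) { return true; }
        }
        false // too full; this sketch simply drops the last evicted fingerprint
    }

    fn contains<T: Hash + ?Sized>(&self, item: &T) -> bool {
        let (fp, i1, i2) = self.locate(item);
        self.buckets[i1].slots.contains(&fp) || self.buckets[i2].slots.contains(&fp)
    }

    // Only safe for items that were actually inserted.
    fn delete<T: Hash + ?Sized>(&mut self, item: &T) -> bool {
        let (fp, i1, i2) = self.locate(item);
        self.replace_slot(i1, fp, 0) || self.replace_slot(i2, fp, 0)
    }
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Vote { Up, Down }
// Hypothetical vote store: one filter per direction, keyed by user hash or IP.
struct VoteTracker { upvotes: CuckooFilter, downvotes: CuckooFilter }
impl VoteTracker {
    // Move the user out of the opposite filter, then record the new vote.
    fn vote(&mut self, user: &str, vote: Vote) {
        let (add, remove) = match vote {
            Vote::Up => (&mut self.upvotes, &mut self.downvotes),
            Vote::Down => (&mut self.downvotes, &mut self.upvotes),
        };
        if remove.contains(user) { remove.delete(user); }
        if !add.contains(user) { add.insert(user); }
    }

    fn current_vote(&self, user: &str) -> Option<Vote> {
        if self.upvotes.contains(user) { return Some(Vote::Up); }
        if self.downvotes.contains(user) { Some(Vote::Down) } else { None }
    }
}

fn main() {
    let mut votes = VoteTracker { upvotes: CuckooFilter::with_capacity(1024), downvotes: CuckooFilter::with_capacity(1024) };
    votes.vote("203.0.113.7", Vote::Up);
    votes.vote("203.0.113.7", Vote::Down); // the user changes their mind
    println!("{:?}", votes.current_vote("203.0.113.7")); // prints Some(Down)
}
```

Unlike a bloom filter, each filter here supports `delete`, which is what lets a vote move from one filter to the other. The catch is that a false positive in `contains` can make `delete` remove another user's matching fingerprint, so deletions are only fully safe for entries known to have been inserted.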
> May not get to it until after Christmas/New Year, though.
Take your time! 🎄
> Also, I wasn't sure about the design choices of the crate; sometimes additional dependencies aren't welcome without lengthy discussions.
Not sure what design choices @seiflotfy has in mind, but I'd say serialization is quite an essential feature for a crate that provides a data structure, and serde is the generally accepted way to go. If anybody really has a problem with the dependency on serde for whatever reason, we can still hide it behind an optional feature gate, but I guess most users will either find it useful or not care. That's in contrast to my C interface project, which would pull in cbindgen for something most users won't need, so I'm trying to hide it by default.
Oration has an interesting use case: you're essentially saving database memory by storing only a bloom/cuckoo filter instead of a full hash set of all user identification characteristics, at the cost of some false positives, where users are denied a vote even though they haven't voted. Sadly, this is one of the cases where one would probably prefer false negatives over false positives. In return, dropping old values if the filter gets too small isn't much of an issue in your case.
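To put a rough number on those false positives: the cuckoo filter paper (Fan et al., 2014) bounds the false-positive rate by 1 - (1 - 1/2^f)^(2b), approximately 2b/2^f, for buckets of b slots and f-bit fingerprints. The sketch below evaluates that bound. The parameters b = 4 and f = 8 are assumptions chosen for illustration (ignoring any fingerprint value reserved to mean "empty"), so check the crate's own constants before relying on the numbers.

```rust
// Upper bound on a cuckoo filter's false-positive rate with `b` slots per
// bucket and `f`-bit fingerprints: a lookup compares against at most 2b
// stored fingerprints, each matching by chance with probability 1/2^f.
fn fp_rate_bound(b: i32, f: i32) -> f64 {
    let no_match: f64 = 1.0_f64 - 1.0 / 2f64.powi(f);
    1.0 - no_match.powi(2 * b)
}

// The usual approximation of the same bound: 2b / 2^f.
fn fp_rate_approx(b: i32, f: i32) -> f64 {
    2.0 * b as f64 / 2f64.powi(f)
}

fn main() {
    // Assumed parameters: 4 slots per bucket, 8-bit (one-byte) fingerprints.
    println!("bound: {:.5}, approx: {:.5}", fp_rate_bound(4, 8), fp_rate_approx(4, 8));
    // Doubling the fingerprint to 16 bits shrinks the bound by a factor of about 256.
    println!("16-bit approx: {:.7}", fp_rate_approx(4, 16));
}
```

With those assumed parameters the bound is about 3%, the worst case for a never-seen user being wrongly treated as having voted once the relevant buckets are full; 16-bit fingerprints would bring it down to roughly 0.012% at twice the fingerprint memory.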
Can't wait for this to land. Great job!
Thanks! I plan to implement Florian's suggestions some time this week.
Hi guys, sorry for the delay - life got in the way.
I've just adjusted this based on Florian's comments.
- `ExportedCuckooFilter`, which derives serialisation traits via serde.
- `ExportedCuckooFilter`, `CuckooFilter` and `Bucket`. No need to do this for `Fingerprint`, as the slice copy needs to be done by reference; we'd be creating extra `Fingerprint::empty()` instances if we did this.
- `recover` functions and privatise methods that no longer need to be public.

I've used `serde_derive` to generate the serialisation on the `ExportedCuckooFilter`, since that's the most straightforward method. If you want to drop that dependency, I can implement those directly fairly easily.
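A std-only sketch of how an `ExportedCuckooFilter` and the `From` conversions could fit together. The field names (`values`, `length`, `slots`, `len`), the one-byte fingerprints stored as raw `u8` and the 4-slot buckets are assumptions, not necessarily what this PR contains. Because serde is not available to a std-only example, the derive is wrapped in `cfg_attr` behind a hypothetical `serde` feature, in the spirit of the feature-gate idea raised earlier in the thread.

```rust
const BUCKET_SIZE: usize = 4; // assumed: fingerprint slots per bucket

#[derive(Clone, Copy, Debug, PartialEq)]
struct Bucket {
    slots: [u8; BUCKET_SIZE], // one-byte fingerprints, 0 = empty (assumed)
}

// Build a bucket by copying a slice of exported fingerprint bytes directly.
// Panics if the slice is not exactly BUCKET_SIZE bytes long.
impl<'a> From<&'a [u8]> for Bucket {
    fn from(bytes: &'a [u8]) -> Self {
        let mut slots = [0u8; BUCKET_SIZE];
        slots.copy_from_slice(bytes);
        Bucket { slots }
    }
}

#[derive(Debug, PartialEq)]
struct CuckooFilter {
    buckets: Vec<Bucket>,
    len: usize, // number of stored items
}

// Plain-data snapshot of a filter. With serde as an optional dependency
// behind a "serde" feature, the derive is generated only when that feature
// is enabled; otherwise the attribute is compiled out.
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
#[derive(Debug, PartialEq)]
struct ExportedCuckooFilter {
    values: Vec<u8>, // every fingerprint byte, bucket by bucket
    length: usize,   // item count, needed to restore `len`
}

impl<'a> From<&'a CuckooFilter> for ExportedCuckooFilter {
    fn from(filter: &'a CuckooFilter) -> Self {
        ExportedCuckooFilter {
            values: filter.buckets.iter().flat_map(|b| b.slots.iter().cloned()).collect(),
            length: filter.len,
        }
    }
}

impl From<ExportedCuckooFilter> for CuckooFilter {
    fn from(exported: ExportedCuckooFilter) -> Self {
        CuckooFilter {
            buckets: exported.values.chunks(BUCKET_SIZE).map(Bucket::from).collect(),
            len: exported.length,
        }
    }
}

fn main() {
    let filter = CuckooFilter {
        buckets: vec![Bucket { slots: [7, 0, 0, 0] }, Bucket { slots: [0, 42, 9, 0] }],
        len: 3,
    };
    let exported = ExportedCuckooFilter::from(&filter);
    println!("{:?}", exported.values); // prints [7, 0, 0, 0, 0, 42, 9, 0]
    assert_eq!(CuckooFilter::from(exported), filter);
}
```

Storing fingerprints as plain bytes means each bucket can be rebuilt with a single `copy_from_slice`, with no per-fingerprint conversion step.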
Hello, any update on this PR?
It should be ready, just waiting for a second round review.
Will take a look
On Mon, Mar 26, 2018, 11:26 AM Tim DuBois notifications@github.com wrote:
It should be ready, just waiting for a second round review.
This now has a Changelog entry and seems sufficiently documented. :+1:
I think this should be part of 0.4.0. WDYT, @florianjacob?
@seiflotfy I agree. If you don't want something to be changed here, too, I'll merge this and my #27 and release master as 0.4.0 tomorrow morning (in ~24 hours).
go ahead :D
Closes #28 by introducing `export`, which collects all fingerprint data into a simple `Vec<u8>` for easy storage in a file or database. Using such data, along with the length of the filter (from `len`), we can later reconstruct the filter by importing the stored data through `recover`.

(Some minor changes were also introduced via rustfmt.)
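The `export`/`len`/`recover` round trip described above can be sketched with std only. The `CuckooFilter` here is a simplified stand-in rather than the crate's type, and the file layout (an 8-byte little-endian length header followed by the fingerprint bytes) is an illustrative assumption showing one way to keep the length next to the exported data.

```rust
use std::fs;
use std::io;
use std::path::Path;

const BUCKET_SIZE: usize = 4; // assumed: one-byte fingerprints, 4 per bucket

// Simplified stand-in for the crate's filter, for illustration only.
struct CuckooFilter {
    buckets: Vec<[u8; BUCKET_SIZE]>,
    len: usize,
}

impl CuckooFilter {
    fn len(&self) -> usize {
        self.len
    }

    // Collect every fingerprint byte, bucket by bucket, into one Vec<u8>.
    fn export(&self) -> Vec<u8> {
        self.buckets.iter().flat_map(|b| b.iter().cloned()).collect()
    }

    // Rebuild the buckets from exported bytes plus the separately stored length.
    fn recover(data: &[u8], len: usize) -> CuckooFilter {
        let buckets = data
            .chunks(BUCKET_SIZE)
            .map(|chunk| {
                let mut bucket = [0u8; BUCKET_SIZE];
                bucket.copy_from_slice(chunk); // panics on a truncated final chunk
                bucket
            })
            .collect();
        CuckooFilter { buckets, len }
    }
}

// Hypothetical file layout: 8-byte little-endian length, then the fingerprints.
fn save(path: &Path, filter: &CuckooFilter) -> io::Result<()> {
    let mut bytes = (filter.len() as u64).to_le_bytes().to_vec();
    bytes.extend(filter.export());
    fs::write(path, bytes)
}

fn load(path: &Path) -> io::Result<CuckooFilter> {
    let bytes = fs::read(path)?;
    if bytes.len() < 8 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "missing length header"));
    }
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&bytes[..8]);
    let len = u64::from_le_bytes(len_bytes) as usize;
    Ok(CuckooFilter::recover(&bytes[8..], len))
}

fn main() -> io::Result<()> {
    let filter = CuckooFilter { buckets: vec![[3, 0, 0, 0], [0, 0, 200, 0]], len: 2 };
    let path = std::env::temp_dir().join("cuckoo_export_demo.bin");
    save(&path, &filter)?;
    let restored = load(&path)?;
    assert_eq!(restored.export(), filter.export());
    assert_eq!(restored.len(), 2);
    fs::remove_file(&path)
}
```

The same bytes could just as well go into a database blob column; the only requirement is that the length travels with them, since the fingerprint bytes alone don't record how many items were inserted.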