alecmocatta / streaming_algorithms

Performant implementations of various streaming algorithms, including Count–min sketch, Top k, HyperLogLog, Reservoir sampling.
Apache License 2.0

Added mem-dbg as optional feature #19

Open LucaCappelletti94 opened 1 month ago

LucaCappelletti94 commented 1 month ago

Mem-dbg is a crate that lets you compute the memory size of a struct. I have added its derives throughout the crate behind an optional feature, so that this implementation can be easily compared with others.

Cheers!
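For readers unfamiliar with the pattern, here is a minimal sketch of gating a derive behind an optional Cargo feature. The feature name `mem_dbg` and the `Registers` struct are hypothetical and not taken from this PR. The `mem_dbg::MemDbg` and `mem_dbg::MemSize` derive paths are assumed from that crate's public API. The sketch also assumes `mem_dbg` is declared as an optional dependency in `Cargo.toml`. With the feature disabled, the attribute expands to nothing and the code builds without the dependency.

```rust
// Hypothetical example type; it is NOT a struct from streaming_algorithms.
// The derive is only applied when the (assumed) `mem_dbg` feature is enabled.
#[cfg_attr(feature = "mem_dbg", derive(mem_dbg::MemDbg, mem_dbg::MemSize))]
#[derive(Clone, Debug, Default)]
pub struct Registers {
    buckets: Vec<u8>,
}

impl Registers {
    /// Creates `n` zeroed one-byte registers.
    pub fn new(n: usize) -> Self {
        Self { buckets: vec![0; n] }
    }

    /// Hand-written size estimate (stack size plus heap capacity), useful as a
    /// sanity check when the feature is off.
    pub fn approx_size(&self) -> usize {
        std::mem::size_of::<Self>() + self.buckets.capacity() * std::mem::size_of::<u8>()
    }
}

// With the feature enabled, a call along these lines is assumed to work:
//     use mem_dbg::{MemSize, SizeFlags};
//     let bytes = Registers::new(16).mem_size(SizeFlags::default());
```

Keeping the derive behind `cfg_attr` means users who do not need size reporting pay no extra compile time or dependency cost.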

LucaCappelletti94 commented 1 month ago

FYI, the errors are caused by the CI configuration being old. Consider updating it.

alecmocatta commented 1 month ago

Thanks for this! Let me know when you're done and I'll merge despite the outdated CI.

LucaCappelletti94 commented 1 month ago

Hi @alecmocatta, sure - I am running some benchmarks so I may add some other small edits.

LucaCappelletti94 commented 1 month ago

Why are you checking whether the estimates are three-sorted? I see that you run a binary search, but that requires the estimates to be fully sorted, and I recently discovered that the estimates provided in the HLL++ paper are not sorted (though they nearly are). I will provide a fix for that shortly.

https://github.com/alecmocatta/streaming_algorithms/blob/99522db25ab4f7a7ba91c793b7568cc1c62afa56/src/distinct/consts.rs#L33-L45
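To illustrate why a nearly sorted table is a problem for binary search, here is a small self-contained sketch. The sample values are made up and are not taken from the HLL++ tables, and the function names are hypothetical rather than the crate's own. It compares a lower-bound binary search against a linear scan, then shows one possible remedy, sorting a copy before searching.

```rust
/// Returns true if every element is <= its successor.
fn is_nondecreasing(xs: &[f64]) -> bool {
    xs.windows(2).all(|w| w[0] <= w[1])
}

/// Index of the first element >= `x`, found by binary search.
/// The result is only meaningful when `xs` is sorted.
fn lower_bound_binary(xs: &[f64], x: f64) -> usize {
    let (mut lo, mut hi) = (0, xs.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if xs[mid] < x {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    lo
}

/// Index of the first element >= `x`, found by linear scan.
/// Works on any input, sorted or not.
fn lower_bound_linear(xs: &[f64], x: f64) -> usize {
    xs.iter().position(|&v| v >= x).unwrap_or(xs.len())
}

/// Returns a sorted copy, using a total order on f64.
fn sorted_copy(xs: &[f64]) -> Vec<f64> {
    let mut v = xs.to_vec();
    v.sort_by(f64::total_cmp);
    v
}
```

On the made-up input `[1.0, 2.0, 5.0, 4.0, 6.0, 7.0, 8.0]` with `x = 4.5`, the binary search returns index 4 while the linear scan returns index 2. After `sorted_copy`, both return index 3. Whether sorting the table or correcting the affected entries is the right fix for the crate depends on how the paired bias values are indexed, which this sketch does not model.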