addthis / stream-lib

Stream summarizer and cardinality estimator.
Apache License 2.0
2.26k stars 556 forks source link

Bugfixes to CountMinSketchTest. #72

Closed mspiegel closed 10 years ago

mspiegel commented 10 years ago

In response to https://github.com/addthis/stream-lib/issues/59

This fixes the CountMinSketch string key test. Previously the test was attempting to mimic the behavior of the integer key test. The set of a Strings of size |N| is much larger then the set of all integers from 0 to N so a different strategy is necessary. Generate some strings to insert into the sketch and then generate some test strings that are likely not in the sketch.

Bugfix for the tolerance calculations. The guarantee is that (estimate <= actual + epison * cardinality) with some probability, or rewriting the terms (estimate - actual) / cardinality <= epsilon.