-
The [wiki page](https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch) for CMS describes a variant with a different query function called count-mean-min-sketch; this tries to reduce bias in the estim…
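As we understand the count-mean-min idea, the table is identical to a plain CMS and only the query changes. For each row, the noise from colliding keys is estimated as the average of the other `width - 1` cells, `(N - C[row][h(x)]) / (width - 1)`. That noise is subtracted, and the median of the corrected values is taken. The sketch below shows both queries side by side. The class name and the SHA-256-derived row hashes are assumptions for illustration, and clamping the result to the plain min estimate is this sketch's own choice.

```python
import hashlib
from statistics import median


class CountMeanMinSketch:
    """Count-min table with both the classic min query and a count-mean-min query.

    Hypothetical sketch: row hashes come from SHA-256 for reproducibility; a real
    implementation would use a pairwise-independent hash family.
    """

    def __init__(self, width: int, depth: int) -> None:
        if width < 2:
            raise ValueError("width must be >= 2 for the noise estimate")
        self.width = width
        self.depth = depth
        self.total = 0  # N: sum of all counts added so far
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def query_min(self, key: str) -> int:
        # Classic CMS estimate: collisions only inflate cells, so take the minimum.
        return min(self.table[row][self._index(row, key)] for row in range(self.depth))

    def query_mean_min(self, key: str) -> float:
        residues = []
        for row in range(self.depth):
            cell = self.table[row][self._index(row, key)]
            # Expected contribution of other keys, assuming they spread evenly
            # over the remaining width - 1 cells of this row.
            noise = (self.total - cell) / (self.width - 1)
            residues.append(cell - noise)
        # Median of bias-corrected cells, never above the plain CMS upper bound.
        return min(median(residues), self.query_min(key))
```

The median makes the query robust to a single row with a heavy collision. The price is that, unlike `query_min`, it can underestimate.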
-
Steps:
1) DONE use the same TEvStatisticsRequest / TEvStatisticsResponse interface for datashard / columnshard
https://github.com/ydb-platform/ydb/pull/5820
2) implement Count-Min Sketch with al…
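For step 2, a minimal Count-Min Sketch could look like the following. It is a sketch under stated assumptions, not the YDB implementation: the class and method names are hypothetical, and rows are hashed with SHA-256 for reproducibility. The `merge` method is included on the assumption that per-shard sketches (datashard / columnshard) may need to be combined. Element-wise addition is only valid when both sketches share dimensions and hash functions.

```python
import hashlib


class CountMinSketch:
    """Minimal count-min sketch: `depth` rows of `width` counters."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.total = 0
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # Per-row hash: SHA-256 of "row:key" (an assumption for this sketch).
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key: str) -> int:
        # Collisions only inflate counters, so the minimum over rows is the
        # tightest upper bound on the true count.
        return min(self.table[row][self._index(row, key)] for row in range(self.depth))

    def merge(self, other: "CountMinSketch") -> None:
        # Sketches with the same dimensions and hashes combine cell by cell.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("cannot merge sketches with different dimensions")
        self.total += other.total
        for mine, theirs in zip(self.table, other.table):
            for i, value in enumerate(theirs):
                mine[i] += value
```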
-
We can reduce the size of the CMS to 1/4 if we use the logic behind https://github.com/seiflotfy/count-min-log
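The 1/4 figure matches replacing 32-bit counters with 8-bit registers, assuming the current counters are 32-bit. Each register stores an approximate logarithm of the count and is incremented probabilistically, as in a Morris counter. The sketch below illustrates that idea with conservative update. The `base`, the hashing, and the saturation handling are assumptions for illustration and are not taken from count-min-log itself.

```python
import hashlib
import random


class CountMinLogSketch:
    """Count-min sketch with 8-bit logarithmic (Morris-style) counters.

    Hypothetical sketch: `base`, SHA-256 row hashes and the saturation policy
    are illustrative assumptions.
    """

    MAX_REGISTER = 255  # largest value a one-byte cell can hold

    def __init__(self, width: int, depth: int, base: float = 1.08, seed: int = 0) -> None:
        self.width, self.depth, self.base = width, depth, base
        self.rng = random.Random(seed)
        # bytearray cells take 1 byte each instead of a 4-byte counter.
        self.table = [bytearray(width) for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def _decode(self, register: int) -> float:
        # Register value c stands for (base**c - 1) / (base - 1) events.
        return (self.base ** register - 1) / (self.base - 1)

    def add(self, key: str) -> None:
        cells = [(row, self._index(row, key)) for row in range(self.depth)]
        low = min(self.table[row][col] for row, col in cells)
        if low == self.MAX_REGISTER:
            return  # saturated; a real implementation needs a policy here
        # Going from c to c + 1 adds base**c to the decoded value, so an
        # increment with probability base**-c keeps a single counter unbiased.
        if self.rng.random() < self.base ** -low:
            for row, col in cells:
                # Conservative update: only bump the cells holding the minimum.
                if self.table[row][col] == low:
                    self.table[row][col] += 1

    def estimate(self, key: str) -> float:
        low = min(self.table[row][self._index(row, key)] for row in range(self.depth))
        return self._decode(low)
```

With `base = 1.08`, a register value of 255 decodes to roughly 3 × 10^8. The trade-off is that estimates become noisy: the relative error grows with `base`.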
-
When adding two CountMinSketch objects, is it intentional that the heavy hitters are duplicated when they're summed using the monoid? The `heavyHitters` method on CMS is correct because it filters out…
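One way to avoid duplicated heavy hitters when summing is to keep the candidates in a set, union the two sets, and re-filter against the combined total using the merged table. The sketch below shows that approach. The class name, the `hh_fraction` parameter and the pruning strategy are hypothetical and are not the library's API. The union cannot miss a true heavy hitter: if a key holds at least a fraction φ of `N1 + N2`, it must hold at least φ of `N1` or of `N2`.

```python
import hashlib


class HeavyHitterCMS:
    """Count-min sketch that tracks heavy-hitter candidates (hypothetical)."""

    def __init__(self, width: int, depth: int, hh_fraction: float) -> None:
        self.width, self.depth, self.hh_fraction = width, depth, hh_fraction
        self.total = 0
        self.table = [[0] * width for _ in range(depth)]
        self.candidates: set[str] = set()  # a set, so each key is stored once

    def _index(self, row: int, key: str) -> int:
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def estimate(self, key: str) -> int:
        return min(self.table[row][self._index(row, key)] for row in range(self.depth))

    def _prune(self) -> None:
        # Drop candidates whose estimate fell below the current threshold.
        threshold = self.hh_fraction * self.total
        self.candidates = {k for k in self.candidates if self.estimate(k) >= threshold}

    def add(self, key: str, count: int = 1) -> None:
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count
        if self.estimate(key) >= self.hh_fraction * self.total:
            self.candidates.add(key)
        self._prune()

    def heavy_hitters(self) -> dict[str, int]:
        threshold = self.hh_fraction * self.total
        return {k: self.estimate(k) for k in self.candidates if self.estimate(k) >= threshold}

    def __add__(self, other: "HeavyHitterCMS") -> "HeavyHitterCMS":
        # Monoid plus: tables and totals add; candidate sets are unioned
        # (no duplicates), then re-filtered against the combined total.
        if (self.width, self.depth, self.hh_fraction) != (other.width, other.depth, other.hh_fraction):
            raise ValueError("incompatible sketches")
        merged = HeavyHitterCMS(self.width, self.depth, self.hh_fraction)
        merged.total = self.total + other.total
        merged.table = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(self.table, other.table)]
        merged.candidates = self.candidates | other.candidates
        merged._prune()
        return merged
```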
-
The width of the sketch [according to the paper](http://www.cse.unsw.edu.au/~cs9314/07s1/lectures/Lin_CS9314_References/cm-latin.pdf) should be set to `ceil(e/epsilon)` where e is Euler's number. How…
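Following the paper, the width is `ceil(e/epsilon)` and the depth is `ceil(ln(1/delta))`. With these dimensions, a point query overestimates the true count by at most `epsilon * N` with probability at least `1 - delta`, where `N` is the total count. The helper name below is hypothetical.

```python
import math


def cms_dimensions(epsilon: float, delta: float) -> tuple[int, int]:
    """Return (width, depth) for error bound epsilon and failure probability delta."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must be in (0, 1)")
    width = math.ceil(math.e / epsilon)      # e is Euler's number
    depth = math.ceil(math.log(1 / delta))   # natural logarithm
    return width, depth
```

For example, `epsilon = 0.001` and `delta = 0.01` give a 2719 × 5 table.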
-
Right now we have the option to `show column totals`. This applies to _all columns_, even though it's not really applicable to everything. (e.g., if my column is full of `ids`, the total is useless and …
-
Wrap the low-level implementations (count-min sketch, hash tables for key iteration...) in a Counter interface. The goal is to make switching from `Counter` to Bounter trivial for users.
Methods that cannot be…
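A wrapper like this could be expressed as one abstract interface with interchangeable backends. The sketch below is hypothetical: `BaseCounter`, `HashTableCounter` and `CountMinCounter` are not Bounter's actual classes. The method names `update`, `__getitem__` and `total` mirror `collections.Counter`. Raising `NotImplementedError` for key iteration on the sketch backend is this example's own choice.

```python
import hashlib
from abc import ABC, abstractmethod
from collections.abc import Iterable, Iterator


class BaseCounter(ABC):
    """Hypothetical Counter-like interface shared by all backends."""

    def update(self, items: Iterable[str]) -> None:
        for item in items:
            self.increment(item)

    @abstractmethod
    def increment(self, key: str, count: int = 1) -> None: ...

    @abstractmethod
    def __getitem__(self, key: str) -> int: ...

    @abstractmethod
    def total(self) -> int: ...

    def __iter__(self) -> Iterator[str]:
        # Backends that cannot enumerate keys keep this default.
        raise NotImplementedError(f"{type(self).__name__} does not support key iteration")


class HashTableCounter(BaseCounter):
    """Exact backend: a dict, so key iteration is supported."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = {}

    def increment(self, key: str, count: int = 1) -> None:
        self._counts[key] = self._counts.get(key, 0) + count

    def __getitem__(self, key: str) -> int:
        return self._counts.get(key, 0)

    def total(self) -> int:
        return sum(self._counts.values())

    def __iter__(self) -> Iterator[str]:
        return iter(self._counts)


class CountMinCounter(BaseCounter):
    """Approximate backend: fixed memory, counts may be overestimated."""

    def __init__(self, width: int = 2048, depth: int = 4) -> None:
        self._width, self._depth = width, depth
        self._total = 0
        self._table = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self._width

    def increment(self, key: str, count: int = 1) -> None:
        self._total += count
        for row in range(self._depth):
            self._table[row][self._index(row, key)] += count

    def __getitem__(self, key: str) -> int:
        return min(self._table[row][self._index(row, key)] for row in range(self._depth))

    def total(self) -> int:
        return self._total
```

Code written against `BaseCounter` can then swap backends with a single constructor change.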
-
Aside from being reusable and useful on its own, it is specifically going to be useful for some upcoming algorithms, such as count-min sketch-based sparse approx nearest neighbors and for the approxim…
-
When the optimizer calculates the selectivity of an equality filter (e.g. `a = 1`) with a value that is contained in the range of a histogram bucket, it assumes that the values in the histogram bucket…
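To make the problem concrete, the sketch below assumes the common form of the bucket assumption: every distinct value in the bucket is equally frequent. It compares that estimate with a count-min point query on a skewed column. The function names, the single bucket `[1, 10]` and the data are illustrative assumptions, not YDB's optimizer code.

```python
import hashlib
from collections import Counter


def uniform_bucket_selectivity(bucket_rows: int, bucket_ndv: int, total_rows: int) -> float:
    # Assumption: each distinct value in the bucket accounts for an equal
    # share of the bucket's rows.
    return (bucket_rows / bucket_ndv) / total_rows


class CountMinSketch:
    def __init__(self, width: int = 64, depth: int = 4) -> None:
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: object) -> int:
        digest = hashlib.sha256(f"{row}:{key!r}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: object) -> None:
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += 1

    def estimate(self, key: object) -> int:
        return min(self.table[row][self._index(row, key)] for row in range(self.depth))


# Skewed column: value 1 appears 900 times, values 2..10 appear 10 times each,
# all falling into one histogram bucket [1, 10] with 10 distinct values.
column = [1] * 900 + [v for v in range(2, 11) for _ in range(10)]
sketch = CountMinSketch()
for value in column:
    sketch.add(value)

total_rows = len(column)                                            # 990
uniform_sel = uniform_bucket_selectivity(total_rows, 10, total_rows)  # 0.1
cms_sel = sketch.estimate(1) / total_rows   # never below the true selectivity
true_sel = Counter(column)[1] / total_rows  # about 0.909
```

For `a = 1`, the uniform estimate is off by roughly 9x, while the sketch can only err upward.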
-
### Presto aggregate function: APPROX_HEAVY_HITTERS(A, min_percent_share, ε, δ) -> MAP(K, V)
A = column of the table, i.e. the entire array of values.
n = total number of values (rows) in A
min…
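A single-pass sketch of the aggregate's semantics follows. It assumes `min_percent_share` is a percentage in (0, 100], sizes the CMS from ε and δ as `ceil(e/ε)` × `ceil(ln(1/δ))`, and assumes the map values V are estimated counts. All of these are assumptions, as is the function name; this is not Presto's implementation. A value is recorded as a candidate whenever its running estimate reaches the threshold for the rows seen so far. Every true heavy hitter is caught at its last occurrence, because its estimate is then at least its final count. Candidates are filtered against `n` at the end.

```python
import hashlib
import math
from collections.abc import Hashable, Iterable


def approx_heavy_hitters(values: Iterable[Hashable], min_percent_share: float,
                         epsilon: float, delta: float) -> dict:
    """Hypothetical APPROX_HEAVY_HITTERS: value -> estimated count."""
    if not 0 < min_percent_share <= 100:
        raise ValueError("min_percent_share must be in (0, 100]")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must be in (0, 1)")
    width = math.ceil(math.e / epsilon)
    depth = math.ceil(math.log(1 / delta))
    table = [[0] * width for _ in range(depth)]

    def index(row: int, value: Hashable) -> int:
        digest = hashlib.sha256(f"{row}:{value!r}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % width

    def estimate(value: Hashable) -> int:
        return min(table[row][index(row, value)] for row in range(depth))

    phi = min_percent_share / 100
    n = 0
    candidates = set()  # never pruned during the pass in this sketch
    for value in values:
        n += 1
        cells = [(row, index(row, value)) for row in range(depth)]
        for row, col in cells:
            table[row][col] += 1
        if min(table[row][col] for row, col in cells) >= phi * n:
            candidates.add(value)

    # Final filter against the complete row count n.
    return {v: estimate(v) for v in candidates if estimate(v) >= phi * n}
```

Because CMS estimates only err upward, values just below the share threshold can be reported. The overestimate is bounded by ε·n with probability at least 1 − δ.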