gakhov / pdsa

Probabilistic Data Structures and Algorithms in Python
http://pdsa.readthedocs.io
MIT License
121 stars 19 forks source link

Implement frequency algorithms such as Count Sketch and Count-Min Sketch #15

Closed gakhov closed 5 years ago

gakhov commented 5 years ago

Count Sketch and Count–Min Sketch are simple space-efficient probabilistic data structures that are used to estimate frequencies of elements in data streams and can address the Heavy hitters problem.

Count Sketch was proposed by Moses Charikar, Kevin Chen, and Martin Farach-Colton in 2002. Count–Min Sketch was presented in 2003 by Graham Cormode and Shan Muthukrishnan and published in 2005.

In the current implementation, we support up to 2^{32} -1 counters (due to 32-bit hash functions) each of 32 bits.