gakhov / pdsa

Probabilistic Data Structures and Algorithms in Python
http://pdsa.readthedocs.io
MIT License
121 stars 19 forks source link

Some problems about Count Sketch #29

Open wenwang00 opened 1 year ago

wenwang00 commented 1 year ago

Dear Andrii, I had a puzzle when I learn Count Sketch, for those small frequency data, if they collide with the high frequency data, and the hash is exactly 1 and -1, then the frequency estimate of the low frequency data will give a negative estimate, I want to ask how to deal with this situation, I read a lot of explanations did not mention this, thank you

gakhov commented 1 year ago

Sorry, didn't get the exact point where you get a negative estimate due to the collision to 1/-1. Maybe you can refer to it in the book?

However, since the frequency by definition cannot be negative, the most obvious way is to compute max(0, f_estimate) as the result.