dialtr / libcount

C/C++ Implementation of the HyperLogLog++ cardinality estimation algorithm.
Apache License 2.0

Address various issues with the empirical bias correction. #10

Closed dialtr closed 8 years ago

dialtr commented 8 years ago

Upon inspection, I learned that the raw estimates in the empirical bias correction data set do not increase monotonically, contrary to an assumption I made in the old implementation of EMP_bias(). After confirming this with Marc Nunkesser, I rewrote EMP_bias() to select the two nearest neighbors for interpolation. The bias is still computed by linear interpolation, but I also wrote a general function for finding the k nearest neighbors, which will make it easier to use kNN regression for the bias value in the future if that is desirable.
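
For readers who want a concrete picture of the approach, here is a minimal sketch of nearest-neighbor selection followed by linear interpolation. It is not the code from this change: the names `NearestNeighbors` and `InterpolateBias`, the use of `std::vector<double>`, and the tie handling are all assumptions made for illustration. The search makes no assumption that the raw estimates are sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not the library's API): returns the indices of the
// k entries of `raw` closest to `x`, nearest first. `raw` need not be sorted.
std::vector<std::size_t> NearestNeighbors(const std::vector<double>& raw,
                                          double x, std::size_t k) {
  std::vector<std::size_t> idx(raw.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  k = std::min(k, idx.size());
  // Order only the first k indices by distance to x.
  std::partial_sort(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k),
                    idx.end(), [&](std::size_t a, std::size_t b) {
                      return std::fabs(raw[a] - x) < std::fabs(raw[b] - x);
                    });
  idx.resize(k);
  return idx;
}

// Hypothetical sketch: linearly interpolates the bias at `x` using the two
// nearest raw estimates. If both neighbors lie on the same side of `x`,
// this extrapolates along the line through them.
double InterpolateBias(const std::vector<double>& raw,
                       const std::vector<double>& bias, double x) {
  assert(raw.size() == bias.size() && raw.size() >= 2);
  const std::vector<std::size_t> nn = NearestNeighbors(raw, x, 2);
  const double x0 = raw[nn[0]], y0 = bias[nn[0]];
  const double x1 = raw[nn[1]], y1 = bias[nn[1]];
  if (x0 == x1) return 0.5 * (y0 + y1);  // Degenerate case: average the two.
  return y0 + (y1 - y0) * (x - x0) / (x1 - x0);
}
```

Because `NearestNeighbors` takes `k` as a parameter, a kNN regression variant could reuse it unchanged and, for example, average the bias values of the returned indices instead of interpolating between two of them.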