dialtr / libcount

C/C++ Implementation of the HyperLogLog++ cardinality estimation algorithm.
Apache License 2.0
26 stars 10 forks source link

Simplified interpolation method per astolman's suggestion. #15

Closed dialtr closed 7 years ago

dialtr commented 7 years ago

The previous interpolation method for finding bias values given a raw estimate was flawed. The new, simpler method does the following:

Given raw_estimate, find the adjacent values in the corresponding estimate table that straddle it, and then use linear interpolation to calculate the bias based on the bias values located at the corresponding indices in the bias tables.

If the raw_estimate is out of range of the estimates in the table, the lowest/highest bias value is used if the raw estimate is too small/large respectively.

Other notes related to this checkin: Renamed EMP_xxx functions to EmpiricalXxx to comply with the coding standard. Removed the nearest neighbor code and tests since it is defunct. Updated the README to mention the SSL package needed on Ubuntu to make examples.