Open homiak opened 5 years ago
The whole magic of this algorithm is that, instead of calculating exact percentiles, it allows an error (called epsilon) so that the summaries can be recorded more efficiently. The default in this library is 0.01, so a requested percentile of 0.01 is actually somewhere between zero and 0.02. You can request an epsilon of zero, but I suspect the cost of that would eventually become too high.
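To make that error band concrete, here is a minimal sketch in plain Go. It is not this library's code: `rankWindow` is a hypothetical helper, and it assumes epsilon is an absolute error on the rank, as described above. It sorts the samples exactly and returns the smallest and largest values that an estimator with error `eps` would still be allowed to report for quantile `q`.

```go
package main

import (
	"math"
	"sort"
)

// rankWindow is a hypothetical helper (not part of any library).
// Assumption: eps is an absolute rank error, so a request for quantile q
// may be answered with any sample whose rank lies in [q-eps, q+eps].
// It returns the values at both ends of that window.
func rankWindow(samples []float64, q, eps float64) (lo, hi float64) {
	n := len(samples)
	if n == 0 {
		return 0, 0 // nothing to report for an empty input
	}

	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	// index maps a quantile p to a position in sorted, clamped to [0, n-1].
	index := func(p float64) int {
		i := int(math.Ceil(p*float64(n))) - 1
		if i < 0 {
			i = 0
		}
		if i > n-1 {
			i = n - 1
		}
		return i
	}

	return sorted[index(q-eps)], sorted[index(q+eps)]
}
```

Calling `rankWindow(samples, 0.001, 0.01)` on your own data shows how wide the acceptable range is for a very low percentile: with these assumptions the window covers roughly the bottom 1.1% of the samples, so anything from the minimum up to about the 1.1th percentile counts as a valid answer.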
Here are the percentiles calculated using this library (first column) and the percentiles calculated using the github.com/montanaflynn/stats library (second column):
Target percentiles:
calcQuantiles = []float64{0.001, 0.01, 0.03, 0.05, 0.1, 0.2, 0.3, 0.4, 0.50, 0.90}
Method used to log the above
Please note the big difference in the low percentile numbers between the 3 printouts above. One would expect some differences at, say, the 0.1th percentile, because the 3 printouts contained different numbers of samples. Still, the variability is unexpectedly large for such a limited change in the number of samples. Also notice that the percentiles calculated with github.com/montanaflynn/stats do not display the same variability, which rules out the possibility that the added samples were somehow very different from the previous lot.
So overall I think this library calculates low percentiles incorrectly, or the method is not really suitable for this.
I have the data that was used for calculating the above and am happy to share it. Just let me know if you want it (the file is less than 1 MB).