apache / datasketches-java

A software library of stochastic streaming algorithms, a.k.a. sketches.
https://datasketches.apache.org
Apache License 2.0

feat: add vectorized update API to KllDoublesSketch #496

Status: Closed (fanyang01 closed 6 months ago)

leerho commented 6 months ago

@fanyang01 Thank you for this contribution. Because there was a lot of intervening work on the quantiles sketches since you made this request, there were many conflicts that were too messy to try to fix from this PR. So I ended up pulling your files from your repo myself so I could look at them in my IDE.

I did implement much of what you proposed but with a number of modifications to improve performance. I also moved much of the "vector" code together to make it easier to read and debug.

Nonetheless, I was impressed that you examined the KLL code very deeply and clearly understood where to make the changes.

I ran some characterization tests:

This chart shows the improvement of vector input vs. non-vector input. That is a 28% improvement at the high end. Not bad!

[Chart: KllHeapDoublesVectorVsNonVectorSpeed_28Mar2024]

Vector updates for KllDoublesSketch have now been checked into master with PR #539 and will become available in the next release. This also closes Issue #492.
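For readers skimming this thread, here is a minimal, library-independent sketch of why an array-based update can beat a per-item loop: the vector path does one capacity check and one bulk copy per chunk instead of one per value. Everything in it is an assumption made for illustration. `ToyBuffer`, its methods, and its deterministic single-level `compact()` are hypothetical and are not the datasketches-java API or the code merged in PR #539. The real KLL sketch uses randomized, multi-level compaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy class for illustration only; NOT the datasketches-java API.
class ToyBuffer {
  private final double[] buf;   // fixed-capacity input buffer
  private int count = 0;        // number of items currently in buf
  private long n = 0;           // total number of items seen
  private final List<Double> retained = new ArrayList<>(); // survivors of compaction

  ToyBuffer(final int capacity) {
    if (capacity < 1) { throw new IllegalArgumentException("capacity must be >= 1"); }
    buf = new double[capacity];
  }

  // Per-item path: one method call and one capacity check per value.
  void update(final double item) {
    if (count == buf.length) { compact(); }
    buf[count++] = item;
    n++;
  }

  // Vector path: one capacity check and one bulk copy per chunk.
  void update(final double[] items, final int offset, final int length) {
    if (offset < 0 || length < 0 || offset > items.length - length) {
      throw new IllegalArgumentException("offset/length out of range");
    }
    int pos = offset;
    final int end = offset + length;
    while (pos < end) {
      if (count == buf.length) { compact(); }
      // Copy as much as fits in the free space of the buffer.
      final int chunk = Math.min(buf.length - count, end - pos);
      System.arraycopy(items, pos, buf, count, chunk);
      count += chunk;
      n += chunk;
      pos += chunk;
    }
  }

  // Toy compaction: sort the buffer and keep every other item.
  // Deterministic on purpose, so both update paths can be compared exactly.
  private void compact() {
    Arrays.sort(buf, 0, count);
    for (int i = 0; i < count; i += 2) { retained.add(buf[i]); }
    count = 0;
  }

  long getN() { return n; }

  int getBuffered() { return count; }

  List<Double> getRetained() { return retained; }

  public static void main(final String[] args) {
    final double[] data = new double[1000];
    for (int i = 0; i < data.length; i++) { data[i] = (i * 37) % 1000; }

    final ToyBuffer perItem = new ToyBuffer(64);
    for (final double d : data) { perItem.update(d); }

    final ToyBuffer vector = new ToyBuffer(64);
    vector.update(data, 0, data.length);

    // Both paths should leave the toy structure in the same state.
    System.out.println("same n: " + (perItem.getN() == vector.getN()));
    System.out.println("same retained: " + perItem.getRetained().equals(vector.getRetained()));
  }
}
```

In this toy the two paths compact at exactly the same points, so they end in identical states. The only difference is how much per-item overhead is paid along the way.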

I look forward to other contributions you might consider.

leerho commented 6 months ago

This PR is now closed as it has been superseded by PR #539.