@fanyang01
Thank you for this contribution. Because there has been a lot of intervening work on the quantiles sketches since you opened this request, the resulting conflicts were too messy to fix from this PR. So I ended up pulling your files from your repo myself so I could look at them in my IDE.
I did implement much of what you proposed, but with a number of modifications to improve performance. I also moved much of the "vector" code together to make it easier to read and debug.
Nonetheless, I was impressed that you examined the KLL code very deeply and clearly understood where to make the changes.
I ran some characterization tests using the above config file. To run them yourself, you need to run `Job` as a Java application, passing the fully qualified name of the config file as a string argument to its `main` method. Your dependencies must also point to the current datasketches-java master branch (at least until it is released; after that you could point to a jar). This is trivial to do in Eclipse but much harder from the command line.
Nonetheless, this chart shows the improvement of vector input over non-vector input: a 28% improvement at the high end. Not bad!
Vector updates for KllDoublesSketch have now been checked into master with PR #539 and will become available in the next release. This also closes Issue #492.
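For readers who want the intuition behind the speedup, here is a minimal toy sketch. It is hypothetical and is NOT the code merged in PR #539 or the datasketches-java implementation. The class `VectorUpdateSketch` and its `update(double[] items, int offset, int length)` method are illustrative names, and the compaction is a deterministic stand-in (sort level 0, promote every other item). Real KLL uses a random offset and many levels. The idea it shows is this: a vector update copies as many items as fit into the level-0 buffer with one `System.arraycopy` per chunk, instead of paying a method call and a capacity check for every single item. It still ends up in exactly the same state as the scalar path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical toy two-level buffer (NOT the datasketches-java implementation)
 * contrasting per-item updates with a chunked "vector" update.
 */
public class VectorUpdateSketch {
  private final double[] level0;                          // fixed-capacity level-0 buffer
  private int size = 0;                                   // items currently in level0
  private final List<Double> level1 = new ArrayList<>();  // items promoted by compaction
  private long n = 0;                                     // total items seen

  public VectorUpdateSketch(int capacity) {
    if (capacity < 2) { throw new IllegalArgumentException("capacity must be >= 2"); }
    level0 = new double[capacity];
  }

  /** Scalar path: one call and one capacity check per item. */
  public void update(double item) {
    if (size == level0.length) { compact(); }
    level0[size++] = item;
    n++;
  }

  /** Vector path: bulk-copy as many items as fit, compact only when full, repeat. */
  public void update(double[] items, int offset, int length) {
    int pos = offset;
    int remaining = length;
    while (remaining > 0) {
      if (size == level0.length) { compact(); }
      int chunk = Math.min(remaining, level0.length - size);
      System.arraycopy(items, pos, level0, size, chunk);
      size += chunk;
      pos += chunk;
      remaining -= chunk;
    }
    n += length;
  }

  /** Toy deterministic compaction: sort level 0 and promote every other item. */
  private void compact() {
    Arrays.sort(level0, 0, size);
    for (int i = 1; i < size; i += 2) { level1.add(level0[i]); }
    size = 0;
  }

  public long getN() { return n; }
  public int getLevel0Size() { return size; }
  public List<Double> getLevel1() { return level1; }

  public static void main(String[] args) {
    double[] data = new double[1000];
    for (int i = 0; i < data.length; i++) { data[i] = (i * 7919) % 1000; }

    VectorUpdateSketch scalar = new VectorUpdateSketch(64);
    for (double d : data) { scalar.update(d); }

    VectorUpdateSketch vector = new VectorUpdateSketch(64);
    vector.update(data, 0, data.length);

    // Both paths compact at the same points, so their states match: prints true
    System.out.println(scalar.getN() == vector.getN()
        && scalar.getLevel0Size() == vector.getLevel0Size()
        && scalar.getLevel1().equals(vector.getLevel1()));
  }
}
```

The design point is that compaction stays lazy in both paths. It only happens when another item needs room, so the batch path changes how items get into the buffer but not what the sketch ends up holding.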
I look forward to other contributions you might consider.