apache / datasketches-java

A software library of stochastic streaming algorithms, a.k.a. sketches.
https://datasketches.apache.org
Apache License 2.0
898 stars 209 forks source link

Optimise HLL union for mergeHLLtoHLL #572

Closed freakyzoidberg closed 5 months ago

freakyzoidberg commented 5 months ago

This PR introduce JMH benchmark - happy to remove it and leave that to characterisation if thats an issue.

This also only shows in a particular place that some optimisation can be gained by avoiding a call to Math.max + array mutation with a simple comparaison.

About 8% gain for HLL heap union

Before

Benchmark                                       Mode  Cnt    Score    Error  Units
BenchmarkUnion.benchHllUnionHLLToHLLMerge  avgt    5  511.411 ± 12.468  ms/op

After

Benchmark                                       Mode  Cnt    Score   Error  Units
BenchmarkUnion.benchHllUnionHLLToHLLMerge  avgt    5  471.329 ± 9.712  ms/op
AlexanderSaydakov commented 5 months ago

I vaguely remember when I worked on similar improvements I found that max is faster than comparison. My understanding is that branch prediction misses are expensive, and max can be implemented with no branches.