Closed filipecosta90 closed 2 years ago
Merging #48 (8dc0092) into master (7a2c58a) will increase coverage by 0.50%. The diff coverage is 100.00%.
@@ Coverage Diff @@
## master #48 +/- ##
==========================================
+ Coverage 76.91% 77.41% +0.50%
==========================================
Files 6 6
Lines 693 704 +11
==========================================
+ Hits 533 545 +12
+ Misses 95 94 -1
Partials 65 65
Impacted Files | Coverage Δ | |
---|---|---|
hdr.go | 90.63% <100.00%> (+0.70%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Summary of optimizations
Detail of analysis/changes
Looking at the baseline CPU time by function in the following manner:
We can observe that the iterator's `nextCountAtIdx()` is responsible for the majority of the CPU time (even after the improvements to the percentile calculation in the latest release, as showcased in #46).

Doing the same analysis by line of code, as follows:
We can observe that the lines of code consuming the most CPU time are:

- The condition on hdr.go#L626, taking ~11% of CPU time: `if i.countToIdx >= i.h.totalCount`. Notice that we're already doing a more restrictive check at hdr.go#L340, `if total >= countAtPercentile {`, meaning we can completely avoid this condition check.
- The condition on hdr.go#L631, taking ~11% of CPU time: `if i.subBucketIdx >= i.h.subBucketCount {`
- The condition on hdr.go#L636, taking ~7% of CPU time: `if i.bucketIdx >= i.h.bucketCount {`. Given that at most (percentile 100) we will be at the limit of `bucketCount`, we can completely avoid this duplicate check.
- The return of `getCountAtIdx` on hdr.go#L643, taking ~19% of CPU time: `return true`. Notice that the function is not inlined. We can move from up to O(N+M) calls to `getCountAtIdx` to a single O(1) call of the new optimized method we've introduced, named `getValueFromIdxUpToCount`.

Looking further at hotspots, we can also see that
`getCountAtIndex` is also a good candidate for optimization (on hdr.go#L640 it takes 13% of CPU time). Even though we can't remove this call, we can reduce the amount of duplicate computation within it, specifically the inner calculation of `bucketBaseIdx`, which doesn't change while we iterate over the sub-buckets of a given bucket. With that in mind, we've introduced `getCountAtIndexGivenBucketBaseIdx` and only calculate `bucketBaseIdx` on the iterations that change bucket (meaning no wasted computation on sub-bucket iterations).

Impact of the above optimizations
Following up on all of the above, we've moved from a baseline of:
to the new optimized ValueAtPercentile / ValueAtPercentileGivenPercentileSlice:
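As an aside on the `bucketBaseIdx` change described in the analysis, the idea can be illustrated with a small, self-contained sketch. This is not the code from hdr.go: `toyHist`, `scanRecompute`, and `scanHoisted` are hypothetical names, and the index layout (`(bucketIdx+1) << magnitude` plus an offset) is a simplified assumption about how an HDR-style histogram maps bucket and sub-bucket pairs to a flat counts array. The first scan recomputes the bucket base on every step, while the second computes it once per bucket and relies on the loop bounds instead of separate per-step range checks.

```go
package main

import "fmt"

// toyHist is a hypothetical, simplified stand-in for an HDR-style histogram.
// It is NOT the real Histogram type from hdr.go.
type toyHist struct {
	halfMagnitude  uint  // log2 of subBucketHalf
	subBucketHalf  int32 // half the number of sub-buckets per bucket
	subBucketCount int32 // number of sub-buckets per bucket
	bucketCount    int32
	counts         []int64 // flat counts array
}

func newToyHist(halfMagnitude uint, bucketCount int32) *toyHist {
	half := int32(1) << halfMagnitude
	return &toyHist{
		halfMagnitude:  halfMagnitude,
		subBucketHalf:  half,
		subBucketCount: half * 2,
		bucketCount:    bucketCount,
		counts:         make([]int64, (bucketCount+1)*half),
	}
}

// bucketBaseIdx depends only on the bucket, not on the sub-bucket.
func (h *toyHist) bucketBaseIdx(bucketIdx int32) int32 {
	return (bucketIdx + 1) << h.halfMagnitude
}

// countAt recomputes the bucket base on every call.
func (h *toyHist) countAt(bucketIdx, subBucketIdx int32) int64 {
	return h.counts[h.bucketBaseIdx(bucketIdx)+subBucketIdx-h.subBucketHalf]
}

// scanRecompute returns the flat index at which the running total first
// reaches target, or -1. The base index is recomputed on every step.
func scanRecompute(h *toyHist, target int64) int32 {
	var total int64
	b, s := int32(0), int32(0)
	for b < h.bucketCount {
		total += h.countAt(b, s)
		if total >= target {
			return h.bucketBaseIdx(b) + s - h.subBucketHalf
		}
		s++
		if s >= h.subBucketCount {
			s = h.subBucketHalf // later buckets start at the upper half
			b++
		}
	}
	return -1
}

// scanHoisted performs the same scan but computes the base index once per
// bucket and reuses it for every sub-bucket of that bucket.
func scanHoisted(h *toyHist, target int64) int32 {
	var total int64
	s := int32(0)
	for b := int32(0); b < h.bucketCount; b++ {
		base := h.bucketBaseIdx(b) - h.subBucketHalf // once per bucket
		for ; s < h.subBucketCount; s++ {
			total += h.counts[base+s]
			if total >= target {
				return base + s
			}
		}
		s = h.subBucketHalf
	}
	return -1
}

func main() {
	h := newToyHist(2, 3)
	for i := range h.counts {
		h.counts[i] = 1
	}
	fmt.Println(scanRecompute(h, 10), scanHoisted(h, 10))
}
```

Both scans visit the same indices in the same order, so they should agree for any target. Whether the hoisted form is measurably faster in a given program depends on inlining and the compiler, so it is worth confirming with a profile or benchmark.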