Open mdogan opened 2 years ago
Hi there @mdogan, during the weekend I'll work on extending the tests to try to reach this state (edge case). I will keep you posted.
Just a sanity check:
"After upgrading to v1.1.2"
Was the version you had before v1.1.1?
Thanks @filipecosta90.
Actually no. It was v1.1.0 before v1.1.2.
We also hit this panic. We use v1.1.2.
panic: runtime error: index out of range [25600] with length 25600
goroutine 212 [running]:
github.com/HdrHistogram/hdrhistogram-go.(*Histogram).getCountAtIndexGivenBucketBaseIdx(...)
/root/gopath/pkg/mod/github.com/!hdr!histogram/hdrhistogram-go@v1.1.2/hdr.go:599
github.com/HdrHistogram/hdrhistogram-go.(*Histogram).getValueFromIdxUpToCount(0x1d?, 0xc08fbaa200?)
/root/gopath/pkg/mod/github.com/!hdr!histogram/hdrhistogram-go@v1.1.2/hdr.go:361 +0xb7
github.com/HdrHistogram/hdrhistogram-go.(*Histogram).ValueAtPercentile(0xc05b2fe200, 0x9?)
/root/gopath/pkg/mod/github.com/!hdr!histogram/hdrhistogram-go@v1.1.2/hdr.go:335 +0x65
@stefanv5 do you have an easy reproduction of the bug?
Thanks for your reply. In our case, we run 4 goroutines that store monitoring metrics. Every 10 seconds we calculate the metrics' P99 values and then reset the histogram; we ensure all of these operations are atomic. The panic occurred after roughly 1 hour. Hope this information is helpful.
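For anyone trying to build a reproduction, the usage pattern described above can be sketched roughly as follows. This is only an illustration using the standard library: `guardedSamples`, `Record`, and `P99AndReset` are hypothetical stand-ins, not the hdrhistogram-go API, and the exact percentile is computed by sorting rather than by bucketing. The point it shows is that the percentile read and the reset happen inside one critical section.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// guardedSamples is a hypothetical stand-in for a histogram shared by
// several goroutines; it is not the hdrhistogram-go API.
type guardedSamples struct {
	mu     sync.Mutex
	values []int64
}

// Record adds one sample under the lock.
func (g *guardedSamples) Record(v int64) {
	g.mu.Lock()
	g.values = append(g.values, v)
	g.mu.Unlock()
}

// P99AndReset computes the 99th percentile and clears the samples in a
// single critical section, so no writer can run between the read and
// the reset. The boolean is false when there were no samples.
func (g *guardedSamples) P99AndReset() (int64, bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if len(g.values) == 0 {
		return 0, false
	}
	sort.Slice(g.values, func(i, j int) bool { return g.values[i] < g.values[j] })
	// Nearest-rank percentile: rank = ceil(0.99 * n), 1-based.
	rank := (99*len(g.values) + 99) / 100
	p99 := g.values[rank-1]
	g.values = g.values[:0]
	return p99, true
}

func main() {
	var g guardedSamples
	var wg sync.WaitGroup
	for w := 0; w < 4; w++ { // four writer goroutines, as in the report
		wg.Add(1)
		go func(offset int64) {
			defer wg.Done()
			for i := int64(1); i <= 25; i++ {
				g.Record(offset + i)
			}
		}(int64(w) * 25)
	}
	wg.Wait()
	p99, ok := g.P99AndReset()
	fmt.Println(p99, ok) // prints "99 true" for the samples 1..100
}
```

If a real reproduction records, reads, or resets outside such a lock, that would be worth ruling out first, since the histogram type itself is not documented as safe for concurrent use.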
We have the same problem, likely related to getting the 99.9999th percentile. I think a workaround is to use ValueAtPercentiles, as it iterates over all the buckets and checks the range (if i.bucketIdx >= i.h.bucketCount), whereas getValueFromIdxUpToCount relies on arithmetic to prevent range overflow, and that is what looks to be broken.
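To make the distinction in the comment above concrete, here is a toy sketch of the two styles of walk over a counts array. It is not the library's code: `uncheckedIndex`, `checkedIndex`, and `panics` are hypothetical names, and the assumption that the target count can exceed the sum of the counts is only one possible way for the unchecked walk to overrun.

```go
package main

import "fmt"

// uncheckedIndex walks counts until the running total reaches target.
// It trusts that target <= sum(counts); if that does not hold, the index
// runs one past the end of the slice and the runtime panics.
func uncheckedIndex(counts []int64, target int64) int {
	var seen int64
	for i := 0; ; i++ {
		seen += counts[i]
		if seen >= target {
			return i
		}
	}
}

// checkedIndex does the same walk but never leaves the slice: if the
// target is not reached, it returns the last index (-1 for an empty slice).
func checkedIndex(counts []int64, target int64) int {
	var seen int64
	for i := range counts {
		seen += counts[i]
		if seen >= target {
			return i
		}
	}
	return len(counts) - 1
}

// panics reports whether calling f panicked.
func panics(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	counts := []int64{2, 3, 5} // total count is 10

	fmt.Println(uncheckedIndex(counts, 5)) // 1: target within the total
	fmt.Println(checkedIndex(counts, 11))  // 2: clamped to the last slot
	fmt.Println(panics(func() {
		uncheckedIndex(counts, 11) // target above the total: index out of range [3] with length 3
	})) // true
}
```

The overrun in the toy version has the same shape as the reported message (`index out of range [25600] with length 25600`), an index exactly equal to the slice length, though that alone does not establish the root cause in the library.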
After upgrading to v1.1.2, occasionally we are observing the following panic:
Histogram is created with:
WindowedHistogram is being rotated every 10 minutes and Histogram.ValueAtQuantile() is called on a Histogram produced by WindowedHistogram.Merge().
This panic happened twice in the last ten days, but I'm not able to reproduce it in a local environment. This application has been running in production for more than a year and we haven't seen this issue with earlier versions.