HdrHistogram / hdrhistogram-go

A pure Go implementation of Gil Tene's HDR Histogram.
MIT License

panic: runtime error: index out of range #49

Open mdogan opened 2 years ago

mdogan commented 2 years ago

After upgrading to v1.1.2, we occasionally observe the following panic:

panic: runtime error: index out of range [25600] with length 25600
github.com/HdrHistogram/hdrhistogram-go.(*Histogram).getCountAtIndexGivenBucketBaseIdx(...)
    /go/pkg/mod/github.com/!hdr!histogram/hdrhistogram-go@v1.1.2/hdr.go:599
github.com/HdrHistogram/hdrhistogram-go.(*Histogram).getValueFromIdxUpToCount(0xb7, 0x3ff800000)
    /go/pkg/mod/github.com/!hdr!histogram/hdrhistogram-go@v1.1.2/hdr.go:361 +0xb7
github.com/HdrHistogram/hdrhistogram-go.(*Histogram).ValueAtPercentile(0xc0000b6480, 0x0)
    /go/pkg/mod/github.com/!hdr!histogram/hdrhistogram-go@v1.1.2/hdr.go:335 +0x65
github.com/HdrHistogram/hdrhistogram-go.(*Histogram).ValueAtQuantile(...)
    /go/pkg/mod/github.com/!hdr!histogram/hdrhistogram-go@v1.1.2/hdr.go:319

Histogram is created with:

hdrhistogram.NewWindowed(windowCount, 1, maxLatency.Nanoseconds(), 3)

The WindowedHistogram is rotated every 10 minutes, and Histogram.ValueAtQuantile() is called on the Histogram produced by WindowedHistogram.Merge().

This panic has happened twice in the last ten days, but I have not been able to reproduce it in a local environment. The application has been running in production for more than a year, and we did not see this issue with earlier versions.

filipecosta90 commented 2 years ago

Hi @mdogan, over the weekend I'll work on extending the tests to try to reach this state (an edge case). I will keep you posted.

Just a sanity check. You wrote:

"After upgrading to v1.1.2"

Was the version you had before v1.1.1?

mdogan commented 2 years ago

Thanks @filipecosta90.

Actually, no. We were on v1.1.0 before upgrading to v1.1.2.

stefanv5 commented 10 months ago

We also hit this panic. We use v1.1.2.

panic: runtime error: index out of range [25600] with length 25600
goroutine 212 [running]:
github.com/HdrHistogram/hdrhistogram-go.(*Histogram).getCountAtIndexGivenBucketBaseIdx(...)
        /root/gopath/pkg/mod/github.com/!hdr!histogram/hdrhistogram-go@v1.1.2/hdr.go:599
github.com/HdrHistogram/hdrhistogram-go.(*Histogram).getValueFromIdxUpToCount(0x1d?, 0xc08fbaa200?)
        /root/gopath/pkg/mod/github.com/!hdr!histogram/hdrhistogram-go@v1.1.2/hdr.go:361 +0xb7
github.com/HdrHistogram/hdrhistogram-go.(*Histogram).ValueAtPercentile(0xc05b2fe200, 0x9?)
        /root/gopath/pkg/mod/github.com/!hdr!histogram/hdrhistogram-go@v1.1.2/hdr.go:335 +0x65

filipecosta90 commented 10 months ago

@stefanv5 do you have an easy reproduction of the bug?

stefanv5 commented 9 months ago

"@stefanv5 do you have an easy reproduction of the bug?"

Thanks for your reply. In our case, we run 4 goroutines that record monitoring metrics. Every 10 seconds we calculate the metrics' P99 values and then reset the histogram; we ensure that all of these steps are atomic. The panic occurred after roughly one hour. I hope this information is helpful.
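
For readers trying to reproduce this, here is a minimal sketch of what "atomic" could mean for the setup described above: recording, the P99 query, and the reset all take the same mutex. It is not our actual code and does not use hdrhistogram-go; `guardedSamples`, `Record`, and `P99AndReset` are hypothetical names, and a plain slice stands in for the histogram.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// guardedSamples is a hypothetical stand-in for a histogram shared by
// several goroutines. Every access goes through mu.
type guardedSamples struct {
	mu     sync.Mutex
	values []int64
}

// Record stores one sample while holding the lock.
func (g *guardedSamples) Record(v int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.values = append(g.values, v)
}

// P99AndReset computes the nearest-rank 99th percentile and clears the
// samples inside a single critical section, so no Record call can run
// between the query and the reset. It reports false when there is no data.
func (g *guardedSamples) P99AndReset() (int64, bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	n := len(g.values)
	if n == 0 {
		return 0, false
	}
	sort.Slice(g.values, func(i, j int) bool { return g.values[i] < g.values[j] })
	rank := (99*n + 99) / 100 // ceil(0.99 * n), 1-based rank
	p99 := g.values[rank-1]
	g.values = g.values[:0]
	return p99, true
}

func main() {
	var g guardedSamples
	var wg sync.WaitGroup
	// Four writers, matching the four goroutines described above.
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for v := int64(1); v <= 100; v++ {
				g.Record(v)
			}
		}()
	}
	wg.Wait()
	if p99, ok := g.P99AndReset(); ok {
		fmt.Println("P99:", p99)
	}
}
```

If the query and the reset took the lock separately, a writer could record in between them, which is one kind of interleaving worth ruling out when hunting for this panic.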

ItsLifeJim commented 9 months ago

We have the same problem, likely related to requesting the 99.9999th percentile. I think a workaround is to use ValueAtPercentiles, since it iterates over all the buckets and explicitly checks the range (if i.bucketIdx >= i.h.bucketCount), whereas getValueFromIdxUpToCount relies on arithmetic to stay in range, and that appears to be what is broken.
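
To illustrate the distinction, here is a simplified sketch, not the library's code. It assumes the lookup is a loop that advances until a running count reaches a target; `indexUpToCountUnchecked` and `indexUpToCountChecked` are hypothetical names, and a flat slice stands in for the real bucket layout. If the target is ever larger than the sum of the stored counts, the unchecked loop steps one past the end, which is the same shape as "index out of range [25600] with length 25600". Whether that is what actually happens in v1.1.2 has not been established in this thread.

```go
package main

import "fmt"

// indexUpToCountUnchecked advances until the running total reaches target.
// It relies on target never exceeding the sum of counts; if it does, the
// index reaches len(counts) and the slice access panics.
func indexUpToCountUnchecked(counts []int64, target int64) int {
	var total int64
	idx := -1
	for total < target {
		idx++
		total += counts[idx]
	}
	return idx
}

// indexUpToCountChecked does the same walk but is bounded by the slice
// length. It reports false when the target was never reached and returns
// the last valid index in that case.
func indexUpToCountChecked(counts []int64, target int64) (int, bool) {
	var total int64
	for idx, c := range counts {
		total += c
		if total >= target {
			return idx, true
		}
	}
	return len(counts) - 1, false
}

func main() {
	counts := []int64{1, 2, 3} // sum is 6

	idx, ok := indexUpToCountChecked(counts, 7)
	fmt.Println("checked:", idx, ok)

	// Recover only to show the failure without crashing this demo.
	func() {
		defer func() {
			if r := recover(); r != nil {
				fmt.Println("unchecked lookup panicked:", r)
			}
		}()
		indexUpToCountUnchecked(counts, 7)
	}()
}
```

The checked variant trades a panic for a clamped result, so a caller would still want to treat the false return as a sign that the total and the stored counts disagree.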