Cambricon / mlu-exporter

Apache License 2.0

Requesting http://localhost:30108/metrics repeatedly with curl may cause a panic #1

Closed. berkaroad closed this issue 1 year ago.

berkaroad commented 2 years ago

1. Issue or feature description

panic: sync: negative WaitGroup counter

goroutine 86 [running]:
sync.(*WaitGroup).Add(0xc000514220, 0xffffffffffffffff)
	sync/waitgroup.go:74 +0x139
sync.(*WaitGroup).Done(...)
	sync/waitgroup.go:99
github.com/Cambricon/mlu-exporter/pkg/collector.(*Collectors).Collect.func1(0xc0004fa0c0, 0xc000270fc0, 0xc000514220, 0xc7f280, 0xc000270f90)
	github.com/Cambricon/mlu-exporter@/pkg/collector/collector.go:100 +0x61
created by github.com/Cambricon/mlu-exporter/pkg/collector.(*Collectors).Collect
	github.com/Cambricon/mlu-exporter@/pkg/collector/collector.go:98 +0x177
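For context, the Go standard library raises this panic whenever `Done` (which is `Add(-1)`) drives a `WaitGroup` counter below zero. In other words, `Done` was called more times than the total passed to `Add`. The minimal reproduction below is unrelated to the exporter's code, and the function name `doneTooMany` is hypothetical. It recovers the panic only so that the message can be printed. In the exporter, the panic happened inside a goroutine, so nothing recovered it and the process crashed.

```go
package main

import (
	"fmt"
	"sync"
)

// doneTooMany calls Done once more than it called Add, which triggers the
// same "sync: negative WaitGroup counter" panic shown in the trace above.
// The deferred recover turns the panic into a return value for display.
func doneTooMany() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	var wg sync.WaitGroup
	wg.Add(1)
	wg.Done()
	wg.Done() // one Done too many: the counter would become -1
	return "no panic"
}

func main() {
	fmt.Println(doneTooMany()) // sync: negative WaitGroup counter
}
```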

2. Steps to reproduce the issue

1) Build the image

./build_image.sh

2) Run the container

docker run -d \
  -p 30108:30108 \
  --privileged=true \
  cambricon-mlu-exporter:v1.5.3

3) Open a browser and request http://localhost:30108/metrics three times

My environment:

1) Linux x86_64, MLU270

2) card: MLU270-X5K

3) cnmon: CNMON 1.20.1

berkaroad commented 2 years ago

When I added logging to pkg/collector/collector.go, the problem disappeared.

func (c *Collectors) Collect(ch chan<- prometheus.Metric) {
        c.mutex.Lock()
        defer c.mutex.Unlock()
        c.collectSharedInfo()

        wg := sync.WaitGroup{}
        wg.Add(len(c.collectors))
        log.Printf("c.collectors's len = %d, when add to wg", len(c.collectors)) // I add
        i := 0
        for _, col := range c.collectors {
                go func(col Collector, index int) {
                        col.collect(ch, c.sharedInfo)
                        log.Printf("col index is %d", index)  // I add
                        wg.Done()
                }(col, i)
                i++
        }
        wg.Wait()
}

When I restored the original code, the problem occurred again.
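For comparison, a common way to write this kind of fan-out is to call `wg.Add(1)` immediately before each `go` statement and `defer wg.Done()` inside the goroutine. That way each goroutine that is started matches exactly one `Done`, even if the collector panics. The sketch below is self-contained. `collector`, `named`, and `fanOut` are hypothetical stand-ins that send strings instead of `prometheus.Metric`. It is not the exporter's actual code and not a confirmed fix for this issue.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collector is a hypothetical stand-in for the exporter's Collector
// interface; it sends strings instead of prometheus.Metric values.
type collector interface {
	collect(ch chan<- string)
}

// named is a trivial collector that emits its own name once.
type named string

func (n named) collect(ch chan<- string) { ch <- string(n) }

// fanOut runs every collector concurrently and waits for all of them.
// Add(1) happens before each goroutine starts, and the deferred Done runs
// exactly once per goroutine, so the counter cannot go negative.
func fanOut(cols []collector, ch chan<- string) {
	var wg sync.WaitGroup
	for _, col := range cols {
		wg.Add(1)
		go func(col collector) {
			defer wg.Done()
			col.collect(ch)
		}(col)
	}
	wg.Wait()
}

func main() {
	cols := []collector{named("a"), named("b"), named("c")}
	// Buffered so senders never block; a Prometheus registry would instead
	// drain the channel concurrently while Collect runs.
	ch := make(chan string, len(cols))
	fanOut(cols, ch)
	close(ch)

	var got []string
	for m := range ch {
		got = append(got, m)
	}
	sort.Strings(got) // goroutine completion order is not deterministic
	fmt.Println(got)  // [a b c]
}
```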

berkaroad commented 2 years ago


I printed the WaitGroup variable but couldn't find the cause.

YuxiJin-tobeyjin commented 1 year ago

@berkaroad We have done some refactoring. Please try our latest version: https://github.com/Cambricon/mlu-exporter/releases/tag/v1.6.7

YuxiJin-tobeyjin commented 1 year ago

We can't reproduce this on the new version. You're welcome to try it and submit a new issue if the problem still occurs, @berkaroad.