When I add logging to pkg/collector/collector.go, the trouble disappears:
func (c *Collectors) Collect(ch chan<- prometheus.Metric) {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	c.collectSharedInfo()
	wg := sync.WaitGroup{}
	wg.Add(len(c.collectors))
	log.Printf("len(c.collectors) = %d when adding to wg", len(c.collectors)) // added by me
	i := 0
	for _, col := range c.collectors {
		go func(col Collector, index int) {
			col.collect(ch, c.sharedInfo)
			log.Printf("col index is %d", index) // added by me
			wg.Done()
		}(col, i)
		i++
	}
	wg.Wait()
}
Then I restored the code, and the trouble still occurs.
I printed the WaitGroup variable but couldn't find the cause.
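For what it's worth, sync.WaitGroup panics with exactly this message whenever Done drives its counter below zero, i.e. Done ends up being called more times than Add. A minimal standalone sketch (unrelated to the exporter's code) that triggers it:

package main

import "sync"

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	wg.Done()
	wg.Done() // counter goes to -1 here:
	// panic: sync: negative WaitGroup counter
}

So the question is where the extra Done (or the too-small Add) comes from.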
@berkaroad, we did some refactoring. Please try our latest version: https://github.com/Cambricon/mlu-exporter/releases/tag/v1.6.7
We can't reproduce this on the new version. You're welcome to try it and submit a new issue if the problem persists. @berkaroad
1. Issue or feature description
panic: sync: negative WaitGroup counter
goroutine 86 [running]:
sync.(*WaitGroup).Add(0xc000514220, 0xffffffffffffffff)
	sync/waitgroup.go:74 +0x139
sync.(*WaitGroup).Done(...)
	sync/waitgroup.go:99
github.com/Cambricon/mlu-exporter/pkg/collector.(*Collectors).Collect.func1(0xc0004fa0c0, 0xc000270fc0, 0xc000514220, 0xc7f280, 0xc000270f90)
	github.com/Cambricon/mlu-exporter@/pkg/collector/collector.go:100 +0x61
created by github.com/Cambricon/mlu-exporter/pkg/collector.(*Collectors).Collect
	github.com/Cambricon/mlu-exporter@/pkg/collector/collector.go:98 +0x177
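Reading the trace: Done is just Add(-1) (note the 0xffffffffffffffff, i.e. -1, argument), so the goroutine created at collector.go:98 decremented the counter below zero at line 100. A commonly recommended way to keep Add and Done paired by construction, sketched here on toy data rather than the exporter's types and not claimed to be the project's actual fix, is a per-goroutine Add(1) with a deferred Done:

package main

import (
	"fmt"
	"sync"
)

func main() {
	work := []int{1, 2, 3}
	var wg sync.WaitGroup
	for _, n := range work {
		wg.Add(1) // pair each Add with exactly one Done
		go func(n int) {
			defer wg.Done() // deferred, so it runs even if the body panics
			fmt.Println("collected", n)
		}(n)
	}
	wg.Wait()
}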
2. Steps to reproduce the issue
1) build image
./build_image.sh
2) run container
docker run -d \
  -p 30108:30108 \
  --privileged=true \
  cambricon-mlu-exporter:v1.5.3
3) open a browser and request http://localhost:30108/metrics 3 times (a Go snippet that automates this step is sketched below)
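For convenience, a small Go client (hypothetical, not part of the repo) that performs the three scrapes; switching the loop body to goroutines also makes concurrent scrapes easy to test:

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Scrape the exporter three times, as in step 3.
	for i := 0; i < 3; i++ {
		resp, err := http.Get("http://localhost:30108/metrics")
		if err != nil {
			fmt.Println("request", i, "failed:", err)
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Println("request", i, "returned", resp.StatusCode, "with", len(body), "bytes")
	}
}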
My environment:
1) linux x86_64, MLU270
2) card: MLU270-X5K
3) cnmon: CNMON 1.20.1