XiaoMi / open-falcon

A Distributed and High-Performance Monitoring System
http://open-falcon.com/

Agent panic #44

Closed by IveCode 7 years ago

IveCode commented 7 years ago

The agent panics when the network is unstable:

    panic(0x780420, 0xc4200120c0)
        /usr/local/go/src/runtime/panic.go:500 +0x1a1
    github.com/open-falcon/agent/g.(*SingleConnRpcClient).Call(0x0, 0x7ead8c, 0xf, 0x758620, 0xc42012c000, 0x7771c0, 0xc420334570, 0x0, 0x0)
        /home/sofeng/gowork/src/github.com/open-falcon/agent/g/rpc.go:58 +0x65
    github.com/open-falcon/agent/g.updateMetrics(0xc420138540, 0x13, 0xc420398000, 0x21, 0x40, 0xc420334570, 0xc420016000)
        /home/sofeng/gowork/src/github.com/open-falcon/agent/g/transfer.go:50 +0x15c
    github.com/open-falcon/agent/g.SendMetrics(0xc420398000, 0x21, 0x40, 0xc420334570)
        /home/sofeng/gowork/src/github.com/open-falcon/agent/g/transfer.go:24 +0x14e
    github.com/open-falcon/agent/g.SendToTransfer(0xc420398000, 0x21, 0x40)
        /home/sofeng/gowork/src/github.com/open-falcon/agent/g/var.go:60 +0xd9
    github.com/open-falcon/agent/cron.collect(0x3c, 0xc42013a038, 0x1, 0x1)
        /home/sofeng/gowork/src/github.com/open-falcon/agent/cron/collector.go:73 +0x3df
    created by github.com/open-falcon/agent/cron.Collect
        /home/sofeng/gowork/src/github.com/open-falcon/agent/cron/collector.go:30 +0xb2

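The cause is a locking problem around TransferLock, and it originates in the SendMetrics function.

Analysis: suppose two goroutines obtain the same addr. One acquires TransferLock.RLock() in updateMetrics while the other acquires TransferLock.Lock() in closeTransferClient. If closeTransferClient finishes first, the lookup TransferClients[addr] in updateMetrics yields a nil client (the 0x0 receiver passed to Call in the trace above), and the call panics. updateMetrics therefore needs to check whether addr is still present in the map:

    func updateMetrics(addr string, metrics []*model.MetricValue, resp *model.TransferResponse) bool {
        TransferLock.RLock()
        defer TransferLock.RUnlock()
        // Only call through the client if addr is still registered;
        // closeTransferClient may have removed it in the meantime.
        if client, ok := TransferClients[addr]; ok {
            err := client.Call("Transfer.Update", metrics, resp)
            if err != nil {
                log.Println("call Transfer.Update fail", addr, err)
                return false
            }
        }

        return true
    }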
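The race described above can be reproduced in isolation. The following is a minimal sketch, not the agent's actual code: `rpcClient`, `closeClient`, `unguardedUpdate`, `guardedUpdate`, `errNoClient` and the address string are all hypothetical stand-ins, and it assumes that closing a client removes its entry from the map. It differs from the proposed patch in one respect: it returns an error when the client is missing rather than reporting success.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// rpcClient is a hypothetical stand-in for the agent's SingleConnRpcClient.
type rpcClient struct {
	mu    sync.Mutex
	calls int
}

// Call uses fields of the receiver, so invoking it on a nil *rpcClient panics.
func (c *rpcClient) Call(method string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.calls++
	return nil
}

var (
	transferLock    sync.RWMutex
	transferClients = map[string]*rpcClient{}

	// errNoClient is an illustrative sentinel, not part of open-falcon.
	errNoClient = errors.New("no transfer client for addr")
)

// closeClient assumes the close path deletes the entry under the write lock.
func closeClient(addr string) {
	transferLock.Lock()
	defer transferLock.Unlock()
	delete(transferClients, addr)
}

// unguardedUpdate indexes the map without checking presence. The panic is
// recovered here only so the demonstration can keep running.
func unguardedUpdate(addr string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	transferLock.RLock()
	defer transferLock.RUnlock()
	return transferClients[addr].Call("Transfer.Update")
}

// guardedUpdate checks that the client exists before calling through it.
func guardedUpdate(addr string) error {
	transferLock.RLock()
	defer transferLock.RUnlock()
	client, ok := transferClients[addr]
	if !ok || client == nil {
		return errNoClient
	}
	return client.Call("Transfer.Update")
}

func main() {
	addr := "127.0.0.1:8433" // arbitrary example address
	transferClients[addr] = &rpcClient{}

	fmt.Println("guarded, before close:", guardedUpdate(addr))
	closeClient(addr) // simulates the other goroutine winning the race
	fmt.Println("unguarded, after close:", unguardedUpdate(addr))
	fmt.Println("guarded, after close:", guardedUpdate(addr))
}
```

After `closeClient` runs, the unguarded version hits a nil pointer dereference inside `Call`, while the guarded version returns `errNoClient`. Whether a missing client should count as a failed send (as here) or as a silent success (as in the patch above) is a design choice for the caller.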

yubo commented 7 years ago

Try the latest version. I can't find updateMetrics() there.