FDio / govpp

Go toolset for the VPP.
Apache License 2.0
192 stars 81 forks source link

panic in the stats element #163

Closed glazychev-art closed 8 months ago

glazychev-art commented 10 months ago

Hello, We are facing a panic issue in the stats element:

panic: runtime error: index out of range [9] with length 9

goroutine 11380 [running]:
go.fd.io/govpp/adapter/statsclient.(*statSegmentV2).UpdateEntryData(0x430?, 0x7f62c4c3aa60, 0xc000cfdff8)
    /root/go/pkg/mod/go.fd.io/govpp@v0.8.0/adapter/statsclient/statseg_v2.go:354 +0x77c
go.fd.io/govpp/adapter/statsclient.(*StatsClient).updateStatOnIndex(0xc000200690, 0xc000cfdfc8, 0xc000fde3e8?)
    /root/go/pkg/mod/go.fd.io/govpp@v0.8.0/adapter/statsclient/statsclient.go:633 +0x171
go.fd.io/govpp/adapter/statsclient.(*StatsClient).UpdateDir(0xc000200690, 0xc0008ecba0)
    /root/go/pkg/mod/go.fd.io/govpp@v0.8.0/adapter/statsclient/statsclient.go:294 +0x1e7
go.fd.io/govpp/core.(*StatsConnection).updateStats.func1()
    /root/go/pkg/mod/go.fd.io/govpp@v0.8.0/core/stats.go:220 +0x5b
go.fd.io/govpp/core.(*StatsConnection).updateStats(0xc000fde638?, 0x6dce3e?, {0xc00096b1a0?, 0xc00036c3f0?, 0x1488630?})
    /root/go/pkg/mod/go.fd.io/govpp@v0.8.0/core/stats.go:231 +0xc4
go.fd.io/govpp/core.(*StatsConnection).GetInterfaceStats(0xc000912c60, 0xc000fde818)
    /root/go/pkg/mod/go.fd.io/govpp@v0.8.0/core/stats.go:385 +0x7e

Most likely, interfaces are created in parallel. Or probably GetInterfaceStats is called from different goroutines.

Do you have any ideas how to fix this?

ondrej-fabry commented 10 months ago

Hello, thanks for the report. Please include information about the VPP version. Also, if possible perhaps including dump using vpp_get_stats for the retrieved data.

ondrej-fabry commented 10 months ago

I have not done any code investigation, but from quick glance, my assumption of what is happening here is that some interface got deleted during the update of stats entry data.

This commit is possibly related to this: https://github.com/FDio/vpp/commit/ff27c9f8ecb08ff0f5b98332975733e8bab125f1

I see two solutions to be explored:

  1. Catch the panic at the updateStats in the core package and re-allocate the memory with interface stats data. This should avoid any future panics when access invalid data from stats segment.
  2. Check the length of data on each read out of stats data, but this might affect performance when retrieving a large amount of stats data.
glazychev-art commented 10 months ago

Thanks! We use this VPP version - https://github.com/FDio/vpp/tree/1765f014bc7fcc3b924019ec96350eb50bef629f

I'm not sure, but I think that the creation of the interface occurred at the time the statistics were received. It seems we didn't remove the interfaces at that point

glazychev-art commented 10 months ago

Hello @ondrej-fabry,

Can't we just use vecLen = uint32(len(data)) instead of vecLen := *(*uint32)(vectorLen(dirVector)) ?

https://github.com/FDio/govpp/blob/v0.8.0/adapter/statsclient/statseg_v2.go#L340-L341

edwarnicke commented 9 months ago

@ondrej-fabry Any insight here?