DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

[BUG] nil pointer dereference in "pkg/clusteragent/clusterchecks".removeConfig #17403

Status: Closed (nairb774 closed 9 months ago)

nairb774 commented 1 year ago

Agent Environment

AWS EKS 1.23

# agent version
Cluster Agent 7.44.1 - Commit: 299bdcd - Serialization version: v5.0.76 - Go version: go1.19.8

Describe what happened: Agent crashes with the following stack:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x208c53a]

goroutine 517 [running]:
github.com/DataDog/datadog-agent/pkg/clusteragent/clusterchecks.(*dispatcher).removeConfig(0xc001e84080, {0xc00306a3a0, 0x10})
    /go/src/github.com/DataDog/datadog-agent/pkg/clusteragent/clusterchecks/dispatcher_configs.go:99 +0x23a
github.com/DataDog/datadog-agent/pkg/clusteragent/clusterchecks.(*dispatcher).remove(_, {{0xc0018f0180, 0x16}, {0xc000982978, 0x1, 0x1}, {0xc001885968, 0x3, 0x8}, {0x0, ...}, ...})
    /go/src/github.com/DataDog/datadog-agent/pkg/clusteragent/clusterchecks/dispatcher_main.go:149 +0xee
github.com/DataDog/datadog-agent/pkg/clusteragent/clusterchecks.(*dispatcher).Unschedule(0xc0023fb7f8?, {0xc002759600?, 0x1, 0x1?})
    /go/src/github.com/DataDog/datadog-agent/pkg/clusteragent/clusterchecks/dispatcher_main.go:120 +0x4e5
github.com/DataDog/datadog-agent/pkg/autodiscovery/scheduler.(*MetaScheduler).Unschedule(0xc000e14eb8, {0xc002759600?, 0x1, 0x1})
    /go/src/github.com/DataDog/datadog-agent/pkg/autodiscovery/scheduler/meta.go:94 +0x131
github.com/DataDog/datadog-agent/pkg/autodiscovery.(*AutoConfig).applyChanges(0xc0014977c0, {{0x0, 0x0, 0x0}, {0xc002759600, 0x1, 0x1}})
    /go/src/github.com/DataDog/datadog-agent/pkg/autodiscovery/autoconfig.go:452 +0x3e5
github.com/DataDog/datadog-agent/pkg/autodiscovery.(*AutoConfig).processDelService(0xc0014977c0, {0x3bd4230, 0xc001e1b080}, {0x3be8ac0?, 0xc0000cc7d0?})
    /go/src/github.com/DataDog/datadog-agent/pkg/autodiscovery/autoconfig.go:427 +0x165
github.com/DataDog/datadog-agent/pkg/autodiscovery.(*AutoConfig).serviceListening(0xc0014977c0)
    /go/src/github.com/DataDog/datadog-agent/pkg/autodiscovery/autoconfig.go:114 +0x29a
created by github.com/DataDog/datadog-agent/pkg/autodiscovery.NewAutoConfig
    /go/src/github.com/DataDog/datadog-agent/pkg/autodiscovery/autoconfig.go:69 +0x5b
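
For readers unfamiliar with this class of crash: the small fault address (addr=0x18) is consistent with reading a struct field through a nil pointer, for example when a map lookup for an entry that has already been removed returns nil and the result is used without a check. The actual code at dispatcher_configs.go:99 is not shown in this report, so the following Go program is only a minimal sketch under that assumption. Every type, field, and function name in it (`nodeStore`, `dispatcher`, `digestToNode`, `removeConfigUnsafe`, `removeConfigSafe`) is hypothetical and is not taken from the Agent source.

```go
package main

import "fmt"

// nodeStore and dispatcher are hypothetical stand-ins for illustration only;
// they are NOT the Agent's real types.
type nodeStore struct {
	digests map[string]struct{}
}

type dispatcher struct {
	digestToNode map[string]string     // config digest -> node name
	nodes        map[string]*nodeStore // node name -> per-node state
}

// removeConfigUnsafe uses the map lookup result without checking it.
// If the node entry is already gone, node is nil and the field read panics.
func (d *dispatcher) removeConfigUnsafe(digest string) {
	node := d.nodes[d.digestToNode[digest]]
	delete(node.digests, digest) // nil pointer dereference when node == nil
	delete(d.digestToNode, digest)
}

// removeConfigSafe always drops the digest mapping, and only touches the
// node state if the node still exists. It reports whether node state was updated.
func (d *dispatcher) removeConfigSafe(digest string) bool {
	nodeName, found := d.digestToNode[digest]
	delete(d.digestToNode, digest)
	if !found {
		return false
	}
	node, ok := d.nodes[nodeName]
	if !ok || node == nil {
		return false // node already removed; nothing left to clean up
	}
	delete(node.digests, digest)
	return true
}

// didPanic runs f and reports whether it panicked.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

// newStaleDispatcher builds a dispatcher whose digest still points at a node
// that no longer exists, which is the assumed trigger for the crash.
func newStaleDispatcher() *dispatcher {
	return &dispatcher{
		digestToNode: map[string]string{"abc": "node-1"},
		nodes:        map[string]*nodeStore{},
	}
}

func main() {
	unsafe := newStaleDispatcher()
	fmt.Println("unsafe panicked:", didPanic(func() { unsafe.removeConfigUnsafe("abc") }))

	safe := newStaleDispatcher()
	fmt.Println("safe removed node state:", safe.removeConfigSafe("abc"))
}
```

Whether a guard like this is the right fix depends on why the lookup returns nil in the first place. If the stale mapping comes from a race between node expiry and unscheduling, the guard would only hide the symptom.
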

Describe what you expected: Agent to not crash.

Steps to reproduce the issue: Unsure. This has been happening sporadically for about a week now, across a number of clusters/VPCs/AWS accounts. Not sure what change might have started this. We've been running the same agent version for about 2 weeks, but the crashes only started about a week ago.

Help in collecting additional information would be useful.

Additional environment details (Operating System, Cloud provider, etc): AWS EKS 1.23 & Bottlerocket OS

nairb774 commented 1 year ago

Hello, just wanted to follow up and see if there is anything I can do to help move this along. The agent crashes often enough that it is triggering pages, which, as you can imagine, is not a great experience, especially when there isn't much that can be done about it. Thanks.