===============
Agent (v7.27.0)
===============
Status date: 2021-10-14 14:07:15.242 UTC (1634220435242)
Agent start: 2021-10-14 10:52:17.395 UTC (1634208737395)
Pid: 1
Go Version: go1.14.12
Python Version: 3.8.5
Build arch: amd64
Agent flavor: agent
Check Runners: 4
Log Level: INFO
Paths
=====
Config File: /etc/datadog-agent/datadog.yaml
conf.d: /etc/datadog-agent/conf.d
checks.d: /etc/datadog-agent/checks.d
Clocks
======
NTP offset: -7.125ms
System time: 2021-10-14 14:07:15.242 UTC (1634220435242)
Host Info
=========
bootTime: 2021-10-14 10:50:50 UTC (1634208650000)
kernelArch: x86_64
kernelVersion: 5.4.120+
os: linux
platform: debian
platformFamily: debian
platformVersion: bullseye/sid
procs: 966
uptime: 1m32s
virtualizationRole: guest
Describe what happened:
Some of our containers got stuck in the "Removal In Progress" state and are left there without a name. It looks like check:docker never verifies that a container actually has a name before indexing it (see containers.go:231 in the trace below). As a result the Agent container ends up in CrashLoopBackOff and restarts over and over. In the Agent's logs I can see this:
2021-10-14 14:18:39 UTC | CORE | INFO | (pkg/collector/runner/runner.go:261 in work) | check:docker | Running check
panic: runtime error: index out of range [0] with length 0
goroutine 466 [running]:
github.com/DataDog/datadog-agent/pkg/util/docker.(*DockerUtil).dockerContainers(0xc000c102d0, 0xc0021a591e, 0x0, 0x0, 0x0, 0x0, 0x0)
/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/util/docker/containers.go:231 +0x120f
github.com/DataDog/datadog-agent/pkg/util/docker.(*DockerUtil).ListContainers(0xc000c102d0, 0xc000fc391e, 0x0, 0x0, 0x0, 0x1672e25, 0x5a25c4c)
/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/util/docker/containers.go:46 +0x11e
github.com/DataDog/datadog-agent/pkg/collector/corechecks/containers/docker.(*DockerCheck).Run(0xc000892460, 0x2e20ed590, 0x63cfa00)
/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/collector/corechecks/containers/docker/docker.go:130 +0x2f1
github.com/DataDog/datadog-agent/pkg/collector/runner.(*Runner).work(0xc0009397d0)
/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/collector/runner/runner.go:270 +0x4bc
created by github.com/DataDog/datadog-agent/pkg/collector/runner.(*Runner).AddWorker
/.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/collector/runner/runner.go:100 +0x8a
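From the trace, dockerContainers at containers.go:231 appears to index the first element of the container's Names slice, which is empty for containers stuck in removal. I don't know the agent internals, but a guard along these lines would avoid the panic. This is a minimal sketch against the Docker SDK's types.Container, with a hypothetical containerName helper, not the actual agent code:

```go
package main

import (
	"fmt"
	"strings"

	"github.com/docker/docker/api/types"
)

// containerName is a hypothetical helper showing the kind of guard the
// docker check could apply: fall back to a short container ID when the
// daemon reports an empty Names slice (as happens for containers stuck
// in "Removal In Progress"). Indexing Names[0] unconditionally is what
// panics with "index out of range [0] with length 0".
func containerName(c types.Container) string {
	if len(c.Names) == 0 {
		if len(c.ID) >= 12 {
			return c.ID[:12]
		}
		return c.ID
	}
	// Docker prefixes container names with a leading slash.
	return strings.TrimPrefix(c.Names[0], "/")
}

func main() {
	noName := types.Container{ID: "0123456789abcdef"}
	fmt.Println(containerName(noName)) // prints "0123456789ab" instead of panicking
}
```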
Describe what you expected:
Whether or not a container has a name, I expect the datadog-agent to start and run normally.
Steps to reproduce the issue:
We are still investigating why some of the containers end up in the "Removal In Progress" state, so there are no exact steps to reproduce yet.
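In the meantime, to check which containers on a node are stuck in this state, something like the following should work. It is a sketch using the Docker Go SDK and its status=removing list filter, and assumes access to the node's Docker socket:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/filters"
	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// List containers the daemon reports as being removed; these are the
	// ones that can end up with an empty Names slice.
	f := filters.NewArgs()
	f.Add("status", "removing")
	containers, err := cli.ContainerList(context.Background(), types.ContainerListOptions{
		All:     true,
		Filters: f,
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, c := range containers {
		fmt.Printf("%s names=%v state=%s\n", c.ID[:12], c.Names, c.State)
	}
}
```

The docker CLI equivalent is `docker ps -a --filter status=removing`.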
Output of the info page (if this is a bug):
See the Agent status output at the top of this issue.

Additional environment details (Operating System, Cloud provider, etc):
GKE on GCP