DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

panic: runtime error: index out of range [0] with length 0 (#9516)

Closed: ajkost closed this issue 2 years ago

ajkost commented 3 years ago

Output of the info page (if this is a bug)

===============
Agent (v7.27.0)
===============

  Status date: 2021-10-14 14:07:15.242 UTC (1634220435242)
  Agent start: 2021-10-14 10:52:17.395 UTC (1634208737395)
  Pid: 1
  Go Version: go1.14.12
  Python Version: 3.8.5
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 4
  Log Level: INFO

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    NTP offset: -7.125ms
    System time: 2021-10-14 14:07:15.242 UTC (1634220435242)

  Host Info
  =========
    bootTime: 2021-10-14 10:50:50 UTC (1634208650000)
    kernelArch: x86_64
    kernelVersion: 5.4.120+
    os: linux
    platform: debian
    platformFamily: debian
    platformVersion: bullseye/sid
    procs: 966
    uptime: 1m32s
    virtualizationRole: guest

Describe what happened: Some of our containers got stuck in the "Removal In Progress" state and are left there without a name. Probably check:docker does not verify that a container has a name before reading it (the panic is raised in dockerContainers at pkg/util/docker/containers.go:231). As a result, the agent container ends up in CrashLoopBackOff and restarts over and over. In the agent's logs I can see this:

2021-10-14 14:18:39 UTC | CORE | INFO | (pkg/collector/runner/runner.go:261 in work) | check:docker | Running check
panic: runtime error: index out of range [0] with length 0
goroutine 466 [running]:
github.com/DataDog/datadog-agent/pkg/util/docker.(*DockerUtil).dockerContainers(0xc000c102d0, 0xc0021a591e, 0x0, 0x0, 0x0, 0x0, 0x0)
    /.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/util/docker/containers.go:231 +0x120f
github.com/DataDog/datadog-agent/pkg/util/docker.(*DockerUtil).ListContainers(0xc000c102d0, 0xc000fc391e, 0x0, 0x0, 0x0, 0x1672e25, 0x5a25c4c)
    /.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/util/docker/containers.go:46 +0x11e
github.com/DataDog/datadog-agent/pkg/collector/corechecks/containers/docker.(*DockerCheck).Run(0xc000892460, 0x2e20ed590, 0x63cfa00)
    /.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/collector/corechecks/containers/docker/docker.go:130 +0x2f1
github.com/DataDog/datadog-agent/pkg/collector/runner.(*Runner).work(0xc0009397d0)
    /.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/collector/runner/runner.go:270 +0x4bc
created by github.com/DataDog/datadog-agent/pkg/collector/runner.(*Runner).AddWorker
    /.omnibus/src/datadog-agent/src/github.com/DataDog/datadog-agent/pkg/collector/runner/runner.go:100 +0x8a
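The panic message "index out of range [0] with length 0" is what Go raises when code reads the first element of an empty (or nil) slice. The Docker Engine API's container list reports each container's names as a slice of strings such as "/web", so a container that has lost its name would produce exactly this failure if the first name is read without a length check. The sketch below is a hypothetical illustration of the kind of guard that would avoid the crash; it is not the agent's actual code, and `containerSummary`, `containerName`, and `listNamed` are made-up names that only model the relevant field.

```go
package main

import (
	"fmt"
	"strings"
)

// containerSummary is a hypothetical stand-in for one entry of the Docker
// container list; only the fields used here are modeled.
type containerSummary struct {
	ID    string
	Names []string // Docker reports names like "/web"; may be empty for a nameless container
}

// containerName returns the first name without its leading "/", and false
// when the container has no names, instead of indexing Names[0] unconditionally.
func containerName(c containerSummary) (string, bool) {
	if len(c.Names) == 0 {
		return "", false
	}
	return strings.TrimPrefix(c.Names[0], "/"), true
}

// listNamed skips nameless containers rather than panicking with
// "index out of range [0] with length 0".
func listNamed(containers []containerSummary) []string {
	var names []string
	for _, c := range containers {
		name, ok := containerName(c)
		if !ok {
			fmt.Printf("skipping container %s: no name reported\n", c.ID)
			continue
		}
		names = append(names, name)
	}
	return names
}

func main() {
	containers := []containerSummary{
		{ID: "abc123", Names: []string{"/web"}},
		{ID: "def456"}, // e.g. a container stuck in "Removal In Progress" with no name
	}
	fmt.Println(listNamed(containers))
}
```

Running it prints the skip message for def456 and then [web]. Skipping the container (and logging it) keeps the check running, which matches the expectation that the agent should start whether or not every container has a name.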

Describe what you expected: Regardless of whether a container has a name, I expect datadog-agent to start normally.

Steps to reproduce the issue: We are still investigating why some containers end up in the "Removal In Progress" state, so there are no exact steps to reproduce yet.

Additional environment details (Operating System, Cloud provider, etc): GKE on GCP

errriclee commented 2 years ago

We are experiencing this problem as well. We are running ECS on AWS, and the only affected instances are those running Docker-in-Docker for builds.