DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

docker agent [auto-] configuration for agent v6 #1381

Open n0mer opened 6 years ago

n0mer commented 6 years ago

Hello,

After migrating from dd-agent v5, I've got two folders in /etc/datadog-agent/conf.d: docker.d and docker_daemon.d.

The collector status shows the following Loading Errors in docker_daemon:

In v5, docker-daemon.yaml had a very simple configuration:

init_config:

instances:
  - ## Daemon and system configuration
    url: "unix://var/run/docker.sock"
    new_tag_names: true

So, there are several post-migration questions:

init_config:

instances:
  - ## The agent honors the DOCKER_HOST, DOCKER_CERT_PATH and DOCKER_TLS_VERIFY
    url: "unix://var/run/docker.sock"
    new_tag_names: true

    collect_container_size: true
    collect_images_stats: true
    collect_image_size: true
    collect_disk_stats: true
    collect_exit_codes: true


Host dashboard contents:


n0mer commented 6 years ago

Got this after turning on DEBUG logging:

2018-03-02 02:21:40 CET | DEBUG | (loader.go:88 in Load) | Unable to load python module - datadog_checks.docker: No module named docker
2018-03-02 02:21:40 CET | DEBUG | (loader.go:88 in Load) | Unable to load python module - docker: No module named docker
2018-03-02 02:21:40 CET | DEBUG | (autoconfig.go:487 in getChecks) | Python Check Loader: unable to load the check 'docker': No module named docker
2018-03-02 02:21:40 CET | DEBUG | (autoconfig.go:476 in getChecks) | Core Check Loader: successfully loaded check 'docker'
2018-03-02 02:21:40 CET | WARN | (check.go:243 in Configure) | could not get a check instance with the new api: __init__() takes exactly 5 arguments (4 given)
2018-03-02 02:21:40 CET | WARN | (check.go:244 in Configure) | trying to instantiate the check with the old api, passing agentConfig to the constructor
2018-03-02 02:21:40 CET | WARN | (check.go:269 in Configure) | passing `agentConfig` to the constructor is deprecated, please use the `get_config` function from the 'datadog_agent' package (http_check).
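
These DEBUG lines show the Agent trying its check loaders in order: the Python Check Loader cannot import datadog_checks.docker or docker, and the Core Check Loader then loads 'docker' successfully. That suggests the "No module named docker" messages are expected when the Go core docker check is in use. Below is a minimal sketch of this first-success-wins fallback, simplified to one attempt per loader and assuming each loader is a callable that raises on failure; load_check, python_loader and core_loader are hypothetical names, not the Agent's API.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical loader shape: (loader name, function that loads a check by name or raises).
Loader = Tuple[str, Callable[[str], object]]


def load_check(name: str, loaders: List[Loader]) -> Tuple[Optional[str], Optional[object], Dict[str, str]]:
    """Try each loader in order and return the first that succeeds.

    Returns (loader_name, check, errors); errors maps every loader that
    failed before the winner (or all loaders, if none succeeded) to its message.
    """
    errors: Dict[str, str] = {}
    for loader_name, loader in loaders:
        try:
            return loader_name, loader(name), errors
        except Exception as exc:  # a failing loader is not fatal: try the next one
            errors[loader_name] = str(exc)
    return None, None, errors


def python_loader(name: str) -> object:
    # Stand-in for the Python loader: no Python module for this check is installed.
    raise ImportError(f"No module named {name}")


def core_loader(name: str) -> object:
    # Stand-in for the Core (Go) loader: it only knows a 'docker' check.
    if name == "docker":
        return {"check": "docker", "kind": "core"}
    raise KeyError(name)


if __name__ == "__main__":
    winner, check, errors = load_check("docker", [("python", python_loader), ("core", core_loader)])
    print(winner, errors)  # core {'python': 'No module named docker'}
```

In this sketch a Python import failure is recorded but does not stop the search, which mirrors the log: the failure is only reported at DEBUG level because a later loader succeeded.
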
n0mer commented 6 years ago

From agent.log:

2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: /dev/vda1, ext4, /var/lib/docker/plugins
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: /dev/vda1, ext4, /var/lib/docker/aufs
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: , aufs, /var/lib/docker/aufs/mnt/fa07eca91c50fa767b5317e2b61f12400cbefffd2a74bf6c9901b8fd0f741a24
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: nsfs, nsfs, /run/docker/netns/default
2018-03-02 02:31:26 CET | WARN | (datadog_agent.go:135 in LogMessage) | (disk.py:104) | Unable to get disk metrics for /run/docker/netns/default: [Errno 13] Permission denied: '/run/docker/netns/default'
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: shm, tmpfs, /var/lib/docker/containers/1447584bef070dd23155717c9b2d0cf10c1f31c1eef97d8b78a21e277e872c14/shm
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: , aufs, /var/lib/docker/aufs/mnt/a4068e9ef162687262715c0ca56508082e8cb36477bf2a2fe64e3043e14e2153
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: shm, tmpfs, /var/lib/docker/containers/046ded549ef697e72181451cebf0fb8a08ea9c3ce89c8106400e789b4cacfeac/shm
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: , aufs, /var/lib/docker/aufs/mnt/bb5376de41cc3608551412cff7a46e0937f6ea39b714dff253aaa4f03fa6ab38
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: shm, tmpfs, /var/lib/docker/containers/2e5d2a551c553ef46d646d84a244a95ffdfedbbc624d529e2b62323160cd806c/shm
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: , aufs, /var/lib/docker/aufs/mnt/7912f4ec8c0235eb64c9c5c3bb19f563438b8572aa68a337dc83f662ebdf4663
2018-03-02 02:31:26 CET | DEBUG | (datadog_agent.go:139 in LogMessage) | (disk.py:142) | _exclude_disk: shm, tmpfs, /var/lib/docker/containers/1fc09ad3b9f71ead0c60273c7ea7f5a3fc05cb2e24155a2e623f8140d2d4d92e/shm
2018-03-02 02:31:26 CET | INFO | (runner.go:246 in work) | Running check docker
2018-03-02 02:31:26 CET | DEBUG | (job.go:99 in waitForTick) | Enqueuing check docker for queue 15000000000
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '12:pids:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '8:memory:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '7:blkio:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '4:cpu,cpuacct:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '2:devices:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '1:name=systemd:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '12:pids:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '8:memory:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '7:blkio:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '4:cpu,cpuacct:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '2:devices:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '1:name=systemd:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '12:pids:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '8:memory:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '7:blkio:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '4:cpu,cpuacct:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '2:devices:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '1:name=systemd:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '12:pids:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '8:memory:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '7:blkio:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '4:cpu,cpuacct:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '2:devices:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '1:name=systemd:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '12:pids:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '8:memory:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '7:blkio:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '4:cpu,cpuacct:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '2:devices:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '1:name=systemd:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '12:pids:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '8:memory:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '7:blkio:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '4:cpu,cpuacct:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '2:devices:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '1:name=systemd:/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (cgroup.go:583 in parseCgroupPaths) | could not parse container id from path '0::/system.slice/docker.service'
2018-03-02 02:31:26 CET | DEBUG | (docker_util.go:292 in Containers) | Container id 1fc09ad3b9f7 has an empty cgroup, skipping
2018-03-02 02:31:26 CET | DEBUG | (docker_util.go:292 in Containers) | Container id 2e5d2a551c55 has an empty cgroup, skipping
2018-03-02 02:31:26 CET | DEBUG | (docker_util.go:292 in Containers) | Container id 046ded549ef6 has an empty cgroup, skipping
2018-03-02 02:31:26 CET | DEBUG | (docker_util.go:292 in Containers) | Container id 1447584bef07 has an empty cgroup, skipping
2018-03-02 02:31:26 CET | INFO | (runner.go:302 in work) | Done running check docker
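
The repeated "could not parse container id from path" lines all point at /system.slice/docker.service, which is the cgroup of the Docker daemon itself rather than of a container, so they look like expected DEBUG noise. The sketch below illustrates why such a path yields no ID. It assumes that a container's cgroup path ends in a 64-character hex ID (the same length as the IDs in the /var/lib/docker/containers/... paths above); parse_container_id and container_ids are hypothetical helpers, not the Agent's cgroup.go logic.

```python
import re
from typing import Iterable, Optional, Set

# Assumption: a container's cgroup path ends with its 64-character hex ID,
# e.g. '/docker/<id>' (cgroupfs driver) or '/system.slice/docker-<id>.scope'
# (systemd driver). This is a sketch, not the Agent's implementation.
CONTAINER_ID_RE = re.compile(r"([0-9a-f]{64})(?:\.scope)?$")


def parse_container_id(cgroup_line: str) -> Optional[str]:
    """Parse one /proc/<pid>/cgroup line of the form 'hierarchy:controllers:path'.

    Returns the container ID, or None when the path names no container,
    such as the Docker daemon's own '/system.slice/docker.service'.
    """
    parts = cgroup_line.strip().split(":", 2)  # the path itself may contain ':'
    if len(parts) != 3:
        return None
    match = CONTAINER_ID_RE.search(parts[2])
    return match.group(1) if match else None


def container_ids(cgroup_lines: Iterable[str]) -> Set[str]:
    """Collect every container ID found across a process's cgroup lines."""
    ids: Set[str] = set()
    for line in cgroup_lines:
        cid = parse_container_id(line)
        if cid is not None:
            ids.add(cid)
    return ids
```

For example, container_ids(["12:pids:/system.slice/docker.service", "0::/system.slice/docker.service"]) returns an empty set, while a line ending in /docker/ followed by a full container ID yields that ID.
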
xvello commented 6 years ago

Hi @n0mer,

The docker_daemon check is deprecated and replaced by the docker check. Could you please describe your migration path? docker_daemon.d should not be automatically copied by the migration command.

Your docker.d config looks OK. What could be happening:

n0mer commented 6 years ago

@xvello Xavier, I executed the command

 DD_UPGRADE=true bash -c "$(curl -L https://raw.githubusercontent.com/DataDog/datadog-agent/master/cmd/agent/install_script.sh)"

from https://github.com/DataDog/datadog-agent/blob/master/docs/agent/upgrade.md and got two folders, docker.d and docker_daemon.d.
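
Since the docker_daemon check is deprecated in favour of the docker check, one quick way to spot the situation described here is to look for a leftover docker_daemon.d next to docker.d. A minimal sketch, assuming the conf.d layout from this thread; find_deprecated_check_dirs and DEPRECATED_CHECK_DIRS are hypothetical names, and the mapping only covers the one pair discussed in this issue.

```python
from pathlib import Path
from typing import List

# Hypothetical mapping (deprecated folder -> replacement folder), based only on
# this thread: in Agent v6 the docker check replaces the docker_daemon check.
DEPRECATED_CHECK_DIRS = {"docker_daemon.d": "docker.d"}


def find_deprecated_check_dirs(conf_d: Path) -> List[str]:
    """Return deprecated check folders under conf.d whose replacement also exists."""
    found = []
    for old, new in sorted(DEPRECATED_CHECK_DIRS.items()):
        # Flag only when both folders are present, i.e. the old one is redundant.
        if (conf_d / old).is_dir() and (conf_d / new).is_dir():
            found.append(old)
    return found
```

On a host in the state described above, find_deprecated_check_dirs(Path("/etc/datadog-agent/conf.d")) would return ["docker_daemon.d"], the folder that is removed by hand later in this thread.
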

n0mer commented 6 years ago

@xvello, are the log messages "_exclude_disk", "could not parse container id from path" and "Container id ... has an empty cgroup, skipping" irrelevant, and can they be ignored?

So I removed docker_daemon.d (leaving only docker.d) and set collect_container_size: false; still no luck.

Anyway, the docker check works with collect_container_size: true on another server, so this setting might not be the problem.

I opened support case #132559 and submitted flares with configs and logs. Brian B. is looking into it.

n0mer commented 6 years ago

@xvello, I'm running the Datadog Agent directly on the host, not from the Docker image:

# ps auxww | grep dd-agent
dd-agent 29545  2.7  0.8 1077968 65860 ?       Ssl  17:17   0:05 /opt/datadog-agent/bin/agent/agent start -p /opt/datadog-agent/run/agent.pid
dd-agent 29546  0.3  0.2 780084 23100 ?        Ssl  17:17   0:00 /opt/datadog-agent/embedded/bin/trace-agent --config /etc/datadog-agent/datadog.yaml --pid /opt/datadog-agent/run/trace-agent.pid
dd-agent 29547  0.6  0.3  44944 28772 ?        Ssl  17:17   0:01 /opt/datadog-agent/embedded/bin/process-agent --config=/etc/datadog-agent/datadog.yaml --pid=/opt/datadog-agent/run/process-agent.pid
dd-agent 29644  6.2  2.0 3668924 164228 ?      Sl   17:17   0:12 java -Xmx200m -Xms50m -classpath /opt/datadog-agent/bin/agent/dist/jmx/jmxfetch-0.18.2-jar-with-dependencies.jar org.datadog.jmxfetch.App --ipc_host localhost --ipc_port 5001 --check_period 15000 --log_level DEBUG --reporter statsd:localhost:8125 collect
# mount | grep group
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
# mount | grep proc
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=12273)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
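
The mount output shows a hybrid cgroup layout: v1 controllers are mounted under /sys/fs/cgroup/<controller>, and a cgroup2 hierarchy is mounted at /sys/fs/cgroup/unified. The "0::/system.slice/docker.service" entries in the earlier DEBUG log are the cgroup2 side of that layout; whether this layout is related to the missing container metrics is not established in this thread. A minimal sketch that classifies a host from mount output like the above, assuming the "<source> on <mountpoint> type <fstype> (<options>)" line shape; classify_cgroup_mounts and cgroup_mode are hypothetical helpers.

```python
from typing import Dict, List


def classify_cgroup_mounts(mount_output: str) -> Dict[str, List[str]]:
    """Group cgroup mount points from `mount` output by filesystem type.

    Relevant lines look like:
      'cgroup on /sys/fs/cgroup/memory type cgroup (rw,...)'
    """
    result: Dict[str, List[str]] = {"cgroup": [], "cgroup2": []}
    for line in mount_output.splitlines():
        parts = line.split()
        # Expected shape: <source> on <mountpoint> type <fstype> (<options>)
        if len(parts) >= 5 and parts[1] == "on" and parts[3] == "type":
            fstype = parts[4]
            if fstype in result:
                result[fstype].append(parts[2])
    return result


def cgroup_mode(mount_output: str) -> str:
    """Return 'hybrid' (v1 controllers plus a cgroup2 mount), 'v1', 'v2' or 'none'."""
    mounts = classify_cgroup_mounts(mount_output)
    has_v1, has_v2 = bool(mounts["cgroup"]), bool(mounts["cgroup2"])
    if has_v1 and has_v2:
        return "hybrid"
    if has_v2:
        return "v2"
    if has_v1:
        return "v1"
    return "none"
```

Fed the `mount | grep group` output above, cgroup_mode returns "hybrid".
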

I had tmpfs excluded in conf.d/disk.d/conf.yaml; it is no longer excluded (please also note use_mount: yes; I don't know whether it can affect metadata collection for running Docker containers):

init_config:

instances:
  # The use_mount parameter will instruct the check to collect disk
  # and fs metrics using mount points instead of volumes
  - use_mount: yes
    # The (optional) excluded_filesystems parameter will instruct the check to
    # ignore disks using these filesystems. Note: On some linux distributions,
    # rootfs will be found and tagged as a device, add rootfs here to exclude.
    excluded_filesystems:
#      - tmpfs
#      - none
#      - shm
#      - nsfs
#      - tracefs
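
The _exclude_disk DEBUG lines earlier log each mount as device, filesystem type and mount point. Below is a minimal sketch of filtering such mounts by filesystem type, in the spirit of excluded_filesystems; Mount, filter_mounts and normalize_excluded are hypothetical names, not the disk check's code. One YAML detail worth noting: with every list entry commented out as above, excluded_filesystems: has no value and parses as null, which the sketch treats as "exclude nothing".

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class Mount(NamedTuple):
    device: str
    fstype: str
    mountpoint: str


def normalize_excluded(value: Optional[Iterable[str]]) -> Set[str]:
    # A YAML key whose list entries are all commented out parses as null (None).
    return set(value or [])


def filter_mounts(mounts: Iterable[Mount], excluded_filesystems: Set[str]) -> List[Mount]:
    """Keep only mounts whose filesystem type is not excluded.

    Hypothetical helper: it mimics the effect of excluded_filesystems,
    not the disk check's real implementation.
    """
    return [m for m in mounts if m.fstype not in excluded_filesystems]
```

In this sketch, excluding nsfs drops /run/docker/netns/default, the mount that produced the "Permission denied" warning in agent.log, while excluding tmpfs drops the per-container shm mounts.
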