Open parkerderek opened 2 years ago
I have the exact same problem. But I installed using Ansible with the following vars
---
datadog_additional_groups:
  - "systemd-journal"
datadog_config:
  dogstatsd_non_local_traffic: true
  logs_enabled: true
  logs_config:
    container_collect_all: false
    use_http: true
    auto_multi_line_detection: true
  process_config:
    enabled: 'true'
datadog_disable_untracked_checks: true
datadog_disable_default_checks: true
datadog_additional_checks:
  - cpu
  - disk
  - file_handle
  - io
  - load
  - memory
  - network
  - ntp
  - uptime
datadog_checks:
  disk:
    init_config:
    instances:
      - use_mount: false
  journald:
    logs:
      - type: journald
        container_mode: true
Agent Environment
Agent (v7.38.2)
Go Version: go1.17.11
Python Version: 3.8.13
Build arch: amd64
Agent flavor: agent
Check Runners: 4
Log Level: info
agent[1140]: 2022-09-05 09:56:39 GMT | CORE | WARN | (pkg/workloadmeta/store.go:359 in func1) | error pulling from collector "podman": error opening database /var/lib/containers/storage/libpod/bolt_state.db
process-agent[1141]: 2022-09-05 09:56:40 GMT | PROCESS | WARN | (pkg/workloadmeta/store.go:359 in func1) | error pulling from collector "podman": error opening database /var/lib/containers/storage/libpod/bolt_state.db
Additional environment details (Operating System, Cloud provider, etc):
kernelArch: x86_64
kernelVersion: 4.18.0-372.19.1.el8_6.x86_64
os: linux
platform: redhat
platformFamily: rhel
platformVersion: 8.6
virtualizationRole: host
virtualizationSystem: vmware
Same. I had a fine working install but wanted to update. One of my servers was fine, the other had this issue. The one without issues is the same for agent version, linux version, etc.
DD_AGENT_MAJOR_VERSION=7 DD_API_KEY=
==> /var/log/datadog/process-agent.log <==
2022-09-07 23:40:19 UTC | PROCESS | WARN | (pkg/workloadmeta/store.go:359 in func1) | error pulling from collector "podman": error opening database /var/lib/containers/storage/libpod/bolt_state.db
Agent 7.38.2 - Commit: ba442fd - Serialization version: v5.0.23 - Go version: go1.17.11
CentOS Linux release 8.3.2011 x86_64 running on GCP and not in any kind of docker or anything
Following the steps to exclude container detection in the config does not seem to work: container_exclude: name:. and container_exclude: "name:." both make no difference.
Neither upgrading nor removing podman changes anything. The file bolt_state.db does not even exist at this point.
https://docs.datadoghq.com/integrations/container/ -- adding a blank config per this and restarting changed nothing; it's still trying to open the file.
Removing the container config file found in the status command and restarting changes nothing (/etc/datadog-agent/conf.d/container.d/conf.yaml.default)
After adding
autoconfig_exclude_features:
  - podman
to datadog.yaml so the config looks like this...
---
datadog_additional_groups:
  - "systemd-journal"
datadog_config:
  autoconfig_exclude_features:
    - podman
  dogstatsd_non_local_traffic: true
  logs_enabled: true
  logs_config:
    container_collect_all: false
    use_http: true
    auto_multi_line_detection: true
  process_config:
    enabled: 'true'
datadog_disable_untracked_checks: true
datadog_disable_default_checks: true
datadog_additional_checks:
  - cpu
  - disk
  - file_handle
  - io
  - load
  - memory
  - network
  - ntp
  - uptime
datadog_checks:
  disk:
    init_config:
    instances:
      - use_mount: false
  journald:
    logs:
      - type: journald
        container_mode: true
Now the error has stopped...
autoconfig_exclude_features:
  - podman
This prevents the error for me as well. Thanks so much!
Disabling the podman feature isn't an option for those folks who need the "container" integration to function properly on a host that runs containers. The root cause of this bug is the BoltDB Go client. Looking at the dd-agent source code in file pkg/util/podman/db_client.go, you can see the client is trying to open the file in read/write mode:
func (client *DBClient) getDBCon() (*bolt.DB, error) {
	db, err := bolt.Open(client.DBPath, 0600, nil)
	if err != nil {
		return nil, fmt.Errorf("error opening database %s", client.DBPath)
	}

	return db, nil
}
I was forced to set the permissions on /var/lib/containers/storage/libpod/bolt_state.db to 666 to give the dd-agent user read/write access to the file. This not only stops the error from being reported in the logs, it also allows the agent to collect podman-managed containers running on the host.
I consider this a fairly substantial security risk and a bug that should be fixed. Alternatively, at least the "container" integration documentation should be updated to mention the read/write requirements to support podman.
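To check the read/write claim on a given host, here is a minimal sketch that tries to open the database file both read-only and read/write and prints the underlying OS error, which the agent's log message above leaves out. It uses only the Go standard library (os.OpenFile), not the BoltDB client, so it does not exercise BoltDB's own file locking. The names probeOpen, tryOpen and probeResult are made up for this illustration and are not part of the agent.

```go
package main

import (
	"fmt"
	"os"
)

// probeResult records the outcome of the two open attempts.
// (Hypothetical type, for illustration only.)
type probeResult struct {
	ReadOnlyErr  error // nil if the file could be opened with O_RDONLY
	ReadWriteErr error // nil if the file could be opened with O_RDWR
}

// tryOpen opens path with the given flag and closes it again right away.
// The OS error is wrapped with %w so the cause (for example
// "permission denied" or "no such file or directory") stays visible.
func tryOpen(path string, flag int) error {
	f, err := os.OpenFile(path, flag, 0)
	if err != nil {
		return fmt.Errorf("error opening database %s: %w", path, err)
	}
	return f.Close()
}

// probeOpen attempts a read-only and a read/write open of path.
// Nothing is written to the file in either case.
func probeOpen(path string) probeResult {
	return probeResult{
		ReadOnlyErr:  tryOpen(path, os.O_RDONLY),
		ReadWriteErr: tryOpen(path, os.O_RDWR),
	}
}

func main() {
	// Default path taken from the log messages in this thread.
	path := "/var/lib/containers/storage/libpod/bolt_state.db"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	res := probeOpen(path)
	fmt.Printf("read-only:  %v\n", res.ReadOnlyErr)
	fmt.Printf("read/write: %v\n", res.ReadWriteErr)
}
```

The result only means something when the binary runs as the same user as the agent, for example by building it first and then running it with sudo -u dd-agent. If the read-only open succeeds and the read/write open fails with "permission denied", that matches the behaviour described above.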
Setting 666 is not helping me:
Dec 20 10:32:10 xxx agent[3907220]: 2022-12-20 10:32:10 UTC | CORE | WARN | (pkg/workloadmeta/store.go:362 in func1) | error pulling from collector "podman": error opening database /var/lib/containers/storage/libpod/bolt_state.db
$ sudo ls -l /var/lib/containers/storage/libpod/bolt_state.db
-rw-rw-rw- 1 root root 131072 Dec 19 23:55 /var/lib/containers/storage/libpod/bolt_state.db
@VitaliyKulikov If you have SELinux enabled, that could be causing you problems. Check the syslog audit logs for hints that SELinux is blocking the read/write system calls.
@rsumner Thanks for the tip. It is an Ubuntu 22.04.1 LTS box, so AppArmor is there, and I can't see any rules for such a denial. Also, I am using Datadog Agent v7.41.0.
Thank you so much! I have been fighting this on Red Hat 8 servers; it seemed to start with 8.6 and up.
We recently got this resolved through Request #1108661. Please refer to the ticket for more info.
The following two commands, along with container.d/conf.yaml, helped us fix the error and let DD collect container metrics:
setfacl -R -m u:dd-agent:rx /var/lib/containers/
setfacl -R -m u:dd-agent:rwx /var/lib/containers/storage/libpod/bolt_state.db
Hope that helps.
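One possible reason a 666 file mode alone may not be enough is that opening a file also requires search (execute) permission on every directory in its path, which is what the recursive rx ACL above grants. The sketch below is an illustration with a made-up helper name, ancestors, and is not taken from the agent. It lists the mode of each directory leading to the database so a missing execute bit is easy to spot.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ancestors returns every directory on the way to path, ordered from the
// filesystem root down to the file's parent directory.
// (Hypothetical helper, for illustration only.)
func ancestors(path string) []string {
	var dirs []string
	dir := filepath.Dir(filepath.Clean(path))
	for {
		// Prepend so the result runs from the root downwards.
		dirs = append([]string{dir}, dirs...)
		parent := filepath.Dir(dir)
		if parent == dir {
			break // reached the root (or "." for a relative path)
		}
		dir = parent
	}
	return dirs
}

func main() {
	// Default path taken from the log messages in this thread.
	path := "/var/lib/containers/storage/libpod/bolt_state.db"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	// Print the mode of each directory, then of the file itself.
	for _, p := range append(ancestors(path), path) {
		info, err := os.Stat(p)
		if err != nil {
			fmt.Printf("%-12s %s\n", "error", err)
			continue
		}
		fmt.Printf("%-12s %s\n", info.Mode(), p)
	}
}
```

Run as root, this prints the mode of every component. Run as dd-agent, the first entry that fails with "permission denied" sits just below the directory that user cannot traverse. The mode string only reflects the classic permission bits; entries added with setfacl are shown by getfacl instead.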
@priyarajeshh Your support tickets are not publicly visible, so no one can refer to the ticket for details except you and Datadog. Can you provide details as to what changes you made to container.d/conf.yaml? The setfacl option is definitely better than doing a basic chmod. Thanks for relaying that info.
Agent Environment
Output from datadog-agent status:
Agent (v7.38.2)
Go Version: go1.17.11
Python Version: 3.8.13
Build arch: amd64
Agent flavor: agent
Check Runners: 4
Log Level: info
Describe what happened: When trying systemctl status datadog-agent, CORE | WARN messages appear, such as (pkg/workloadmeta/store.go:359 in func1) | error pulling from collector "podman": error opening database /var/lib/containers/storage/libpod/bolt_state.db
Describe what you expected: Error messages should not appear, as this is a default install of the Datadog Agent, with no podman configured or enabled, and not running in a container.
Steps to reproduce the issue: Install default datadog
DD_AGENT_MAJOR_VERSION=7 DD_API_KEY="KEY" DD_SITE="datadoghq.com" bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script.sh)"
Additional environment details (Operating System, Cloud provider, etc):
kernelArch: x86_64
kernelVersion: 4.18.0-372.19.1.el8_6.x86_64
os: linux
platform: redhat
platformFamily: rhel
platformVersion: 8.6
virtualizationRole: host
virtualizationSystem: kvm