Open JohnCgp opened 1 year ago
For anyone else that stumbles upon this, I spoke with support and it's not a bug - container log collection is not supported for rootless Podman containers as of Agent v7.48.0.
Seeing the same issue for general container metrics collection on RHEL8 with Podman - any update or solution here?
Agent Version:
Agent 7.52.0 - Commit: 4a6318a - Serialization version: v5.0.104 - Go version: go1.21.8
Error Log:
2024-03-27 22:31:16 UTC | PROCESS | DEBUG | (pkg/process/util/containers/containers.go:159 in GetContainers) | << redacted >> Runtime:podman RuntimeFlavor: State:{Running:true Status:running Health: CreatedAt:2024-03-27 17:46:57.046830482 -0400 -0400 StartedAt:2024-03-27 17:46:57.046830482 -0400 -0400 FinishedAt:0001-01-01 00:00:00 +0000 UTC ExitCode:<nil>} CollectorTags:[] Owner:<nil> SecurityContext:<nil> Resources:{CPURequest:<nil> MemoryRequest:<nil>}} not available, err: containerID not found
Agent Environment
Agent is run as a rootless podman container on RHEL 8 behind a corporate proxy:
Describe what happened:
Container logs are not being sent to Datadog; the `container_collect_all` integration's "Status" is stuck as "Pending" (see bottom of this section).
Cause may be that the agent is not able to collect the container stats because it is not correctly finding the container cgroups.
When running the agent with debug logging, an entry similar to this one is emitted for each container:
Trawling through the code, `collector.GetContainerStats` fails:
https://github.com/DataDog/datadog-agent/blob/8e04ee8a7de780174deeb963b61f4ca6fc129462/pkg/process/util/containers/containers.go#L153-L158
Because the call to `c.getCgroup` on line 105 fails:
https://github.com/DataDog/datadog-agent/blob/8e04ee8a7de780174deeb963b61f4ca6fc129462/pkg/util/containers/metrics/system/collector_linux.go#L104-L108
As it gets `nil`:
https://github.com/DataDog/datadog-agent/blob/8e04ee8a7de780174deeb963b61f4ca6fc129462/pkg/util/containers/metrics/system/collector_linux.go#L164-L167
Because the cgroup for the given container ID isn't in the collection:
https://github.com/DataDog/datadog-agent/blob/8e04ee8a7de780174deeb963b61f4ca6fc129462/pkg/util/cgroups/reader.go#L180
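To make the failure chain above easier to follow, here is a minimal Python sketch of the same shape of logic: an index of cgroups keyed by container ID, a lookup that returns nothing when the ID was never indexed, and a stats call that then fails with "containerID not found". This is not the agent's code. The function names, the 64-hex-character ID pattern, and the idea that IDs are taken from cgroup directory names are all assumptions made for illustration.

```python
import os
import re
from typing import Dict, Optional

# Assumption: a container ID shows up as 64 hex characters inside a
# cgroup directory name (for example "libpod-<id>.scope").
CONTAINER_ID_RE = re.compile(r"[0-9a-f]{64}")


def index_cgroups(root: str) -> Dict[str, str]:
    """Walk `root` and map every container ID found in a directory name
    to the path of that directory (first match wins)."""
    found: Dict[str, str] = {}
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            match = CONTAINER_ID_RE.search(name)
            if match:
                found.setdefault(match.group(0), os.path.join(dirpath, name))
    return found


def get_cgroup(index: Dict[str, str], container_id: str) -> Optional[str]:
    """Return the indexed cgroup path, or None if the ID was never indexed."""
    return index.get(container_id)


def get_container_stats(index: Dict[str, str], container_id: str) -> Dict[str, str]:
    """Fail the same way the debug log does when the lookup comes back empty."""
    path = get_cgroup(index, container_id)
    if path is None:
        raise LookupError("containerID not found")
    return {"cgroup_path": path}
```

If the index is built from a root that does not contain the container's cgroup directories, every lookup returns `None` and every stats call fails, which matches the one-error-per-container pattern in the debug log.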
It seems that the `parseCgroups` function isn't able to extract the cgroup for the container for some reason.
Doing a `find` for the container's ID does indeed return its cgroups. As it's running as a container, `/sys/fs/cgroup` doesn't return any results, as expected. They are under `/host/sys/fs/cgroup`.
The system is using cgroups v1. Note `/sys/fs/cgroup` mounted as `tmpfs` and `/sys/fs/cgroup/systemd` mounted as `cgroup`:
`container_collect_all` integration status is pending:
Describe what you expected:
Logs from all the containers to be sent to Datadog (`container_collect_all` status to be "OK" instead of pending).
Steps to reproduce the issue:
Run agent as container as per first section. Tried both with and without `--cgroupns host --pid host`.
Additional environment details (Operating System, Cloud provider, etc): RHEL 8
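
For anyone trying to reproduce the checks described under "Describe what happened", the `find` for the container ID and the look at how the cgroup paths are mounted could be scripted roughly as below. This is only a sketch: the helper names are hypothetical, and the mount parser assumes the usual `/proc/mounts` layout of device, mount point, filesystem type, then options.

```python
import os
from typing import List, Tuple


def find_cgroup_dirs(root: str, container_id: str) -> List[str]:
    """Rough equivalent of `find <root> -type d -name '*<container_id>*'`."""
    matches = []
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            if container_id in name:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def cgroup_mounts(mounts_text: str, prefix: str = "/sys/fs/cgroup") -> List[Tuple[str, str]]:
    """Return (mount_point, fs_type) pairs for mounts at or below `prefix`,
    parsed from /proc/mounts-style text."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        if mount_point == prefix or mount_point.startswith(prefix + "/"):
            result.append((mount_point, fs_type))
    return result
```

Inside the agent container, one might call `find_cgroup_dirs("/host/sys/fs/cgroup", container_id)` and compare it with the same call on `/sys/fs/cgroup`, then pass the contents of `/proc/mounts` to `cgroup_mounts` to see which paths are `tmpfs` and which are `cgroup`.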