Open Nuru opened 4 months ago
Thanks for cutting this @Nuru. Do you know if this worked in a previous version of the helm chart? I noticed that they made a recent change (https://github.com/DataDog/helm-charts/issues/1352), but it probably didn't affect this. Nonetheless, I think making this link relative should work. I'll give this a shot to see if it helps and report back!
> Do you know if this worked in a previous version of the helm chart?
This setting is not in the Datadog Helm chart, it is in their documentation. The relevant part of their Helm chart has not changed in 3 years.
I was able to try out a change that does fix the symlink issue. I don't have a working Datadog setup to confirm that this fully fixes it but I can confirm the link works now:
# file /host/run/dockershim.sock
/host/run/dockershim.sock: symbolic link to ./containerd/containerd.sock
And the nodes with this relative link don't have the error message:
CORE | ERROR | (pkg/util/containerd/containerd_util.go:109 in NewContainerdUtil) | Containerd init error: temporary failure in containerdutil, will retry later: failed to dial "/host/run/dockershim.sock": context deadline exceeded
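For anyone who wants to repeat this check on their own nodes, here is a minimal sketch of a helper that reports whether a path is a relative or an absolute symlink. The function name `classify_link` and the scratch-directory demo are hypothetical and not part of Bottlerocket or the Datadog Agent; it assumes a `readlink` command is available (GNU coreutils and BusyBox both provide one).

```shell
#!/bin/sh
set -eu

# classify_link PATH (hypothetical helper): print "relative", "absolute",
# or "not-a-link" depending on how the symlink target is spelled.
classify_link() {
    if [ ! -L "$1" ]; then
        echo "not-a-link"
        return 0
    fi
    case "$(readlink "$1")" in
        /*) echo "absolute" ;;   # target starts at the filesystem root
        *)  echo "relative" ;;   # target is resolved from the link's directory
    esac
}

# Demo in a scratch directory so the real /run is never touched.
# The link targets do not need to exist for classification to work.
demo=$(mktemp -d)
ln -s ./containerd/containerd.sock "$demo/rel.sock"
ln -s /run/containerd/containerd.sock "$demo/abs.sock"

rel_kind=$(classify_link "$demo/rel.sock")
abs_kind=$(classify_link "$demo/abs.sock")
none_kind=$(classify_link "$demo")
echo "$rel_kind $abs_kind $none_kind"

rm -rf "$demo"
```

Inside the agent container, the same function could be pointed at the mounted path, for example `classify_link /host/run/dockershim.sock`.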
I'll get a PR cut shortly with this proposed fix.
https://github.com/bottlerocket-os/bottlerocket-core-kit/pull/18 should hopefully fix this issue when released!
Image I'm using:
Bottlerocket OS 1.20.2 (aws-k8s-1.29)
What I expected to happen:
I expected /run/dockershim.sock to be a valid socket.
What actually happened:
In the Datadog Agent Pod, they mount the host filesystem under /host. They then expect to be able to connect to the Docker daemon via /host/run/dockershim.sock. Unfortunately, /run/dockershim.sock is an absolute link to /run/containerd/containerd.sock (See #2173), which is broken in the mounted file system.
Proposed Solution:
Make /run/dockershim.sock a relative link to ./containerd/containerd.sock instead of an absolute link.
Note that /var/run/dockershim.sock is already a relative link: ./containerd/containerd.sock
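The difference between the two link styles can be reproduced without a Bottlerocket node. The sketch below is only an illustration under stated assumptions: it builds a scratch tree, uses a regular file as a stand-in for the socket, and simulates the /host mount by moving the tree under a new prefix. The names `abs.sock`, `rel.sock`, and `check` are hypothetical.

```shell
#!/bin/sh
set -eu

# Scratch layout only; nothing here touches the real /run.
work=$(mktemp -d)
root="$work/root"                            # stands in for the node's root filesystem
mkdir -p "$root/run/containerd"
: > "$root/run/containerd/containerd.sock"   # regular file standing in for the socket

# Absolute link: the target is spelled from the top of the original tree.
ln -s "$root/run/containerd/containerd.sock" "$root/run/abs.sock"
# Relative link: the target is resolved from the directory that holds the link.
ln -s ./containerd/containerd.sock "$root/run/rel.sock"

# Simulate seeing the same tree under a different prefix, as a /host mount does.
mkdir "$work/pod"
mv "$root" "$work/pod/host"

# [ -e ] follows symlinks, so a dangling link reports "broken".
check() { if [ -e "$1" ]; then echo "ok"; else echo "broken"; fi; }
abs_result=$(check "$work/pod/host/run/abs.sock")
rel_result=$(check "$work/pod/host/run/rel.sock")
echo "absolute: $abs_result"
echo "relative: $rel_result"

rm -rf "$work"
```

The absolute link still names the old location, so it dangles once the tree is viewed under the new prefix, while the relative link keeps resolving because its target moves together with it.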
How to reproduce the problem:
Deploy Datadog Helm chart 3.66.0 to EKS running Bottlerocket and configure according to Datadog docs with
View logs from DaemonSet datadog Pod, container agent, and see
Alternately, use kubectl exec into the agent container to run file /host/run/dockershim.sock and see the error: