bottlerocket-os / bottlerocket

An operating system designed for hosting containers
https://bottlerocket.dev

`dockershim.sock` symlink should be relative #4074

Open Nuru opened 4 months ago

Nuru commented 4 months ago

Image I'm using:

Bottlerocket OS 1.20.2 (aws-k8s-1.29)

What I expected to happen:

I expected /run/dockershim.sock to be a valid socket.

What actually happened:

The Datadog Agent Pod mounts the host filesystem under /host and then expects to connect to the container runtime socket via /host/run/dockershim.sock. Unfortunately, /run/dockershim.sock is an absolute symlink to /run/containerd/containerd.sock (see #2173). Inside the container, an absolute target is resolved against the container's own root rather than /host, so the link is broken in the mounted filesystem.

Proposed Solution:

Make /run/dockershim.sock a relative link to ./containerd/containerd.sock instead of an absolute link.

Note that /var/run/dockershim.sock is already a relative link: ./containerd/containerd.sock
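
To illustrate why the relative link survives the /host mount while the absolute one does not, here is a minimal sketch in Python. The helper `resolve_via_mount` is hypothetical (it is not part of Bottlerocket or the Datadog Agent) and only does path arithmetic: it ignores any other symlinks along the path and does not touch the filesystem.

```python
import posixpath


def resolve_via_mount(mount_prefix, link_path, target):
    """Return the path that gets looked up when a host symlink is followed
    from inside a container that mounts the host root at mount_prefix.

    Hypothetical helper for illustration only: pure path arithmetic,
    no filesystem access, intermediate symlinks are ignored.
    """
    # Where the container sees the link itself, e.g. /host/run/dockershim.sock
    seen = posixpath.join(mount_prefix, link_path.lstrip("/"))
    if posixpath.isabs(target):
        # Absolute targets restart from the container's own root,
        # so the mount prefix is lost.
        return posixpath.normpath(target)
    # Relative targets are resolved from the directory holding the link,
    # so they stay under the mount prefix.
    return posixpath.normpath(posixpath.join(posixpath.dirname(seen), target))


if __name__ == "__main__":
    for target in ("/run/containerd/containerd.sock",
                   "./containerd/containerd.sock"):
        print(target, "->",
              resolve_via_mount("/host", "/run/dockershim.sock", target))
```

With the absolute target, the lookup lands on /run/containerd/containerd.sock in the container's own filesystem, which is not where the host socket is mounted. With the relative target, it lands on /host/run/containerd/containerd.sock.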

How to reproduce the problem:

Deploy Datadog Helm chart 3.66.0 to EKS running Bottlerocket and configure according to Datadog docs with

criSocketPath: /run/dockershim.sock

View logs from DaemonSet datadog Pod, container agent, and see

CORE | ERROR | (pkg/util/containerd/containerd_util.go:109 in NewContainerdUtil) | Containerd init error: temporary failure in containerdutil, will retry later: failed to dial "/host/run/dockershim.sock": context deadline exceeded

Alternatively, use kubectl exec to run file /host/run/dockershim.sock inside the agent container and see the error:

/host/run/dockershim.sock: broken symbolic link to /run/containerd/containerd.sock

yeazelm commented 4 months ago

Thanks for cutting this, @Nuru. Do you know if this worked in a previous version of the Helm chart? I noticed that they made a recent change (https://github.com/DataDog/helm-charts/issues/1352), but it probably didn't affect this. Nonetheless, I think making this link relative should work. I'll give this a shot to see if it helps and report back!

Nuru commented 4 months ago

> Do you know if this worked in a previous version of the helm chart?

This setting is not in the Datadog Helm chart, it is in their documentation. The relevant part of their Helm chart has not changed in 3 years.

yeazelm commented 4 months ago

I was able to try out a change that does fix the symlink issue. I don't have a working Datadog setup to confirm that this fully fixes it but I can confirm the link works now:

# file /host/run/dockershim.sock
/host/run/dockershim.sock: symbolic link to ./containerd/containerd.sock
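
For environments where the `file` utility is not available in the container, a rough equivalent of the check above can be sketched in Python. The function name `describe_link` is hypothetical, and the output format only loosely imitates `file`; it assumes a POSIX filesystem.

```python
import os


def describe_link(path):
    """Rough, hypothetical analogue of `file` for symlinks."""
    if not os.path.islink(path):
        return f"{path}: not a symbolic link"
    # Raw link target exactly as stored, e.g. ./containerd/containerd.sock
    target = os.readlink(path)
    # os.path.exists follows the link, so False means the target is missing
    state = "symbolic link" if os.path.exists(path) else "broken symbolic link"
    return f"{path}: {state} to {target}"


if __name__ == "__main__":
    print(describe_link("/host/run/dockershim.sock"))
```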

And the nodes with this relative link don't have the error message:

CORE | ERROR | (pkg/util/containerd/containerd_util.go:109 in NewContainerdUtil) | Containerd init error: temporary failure in containerdutil, will retry later: failed to dial "/host/run/dockershim.sock": context deadline exceeded

I'll get a PR cut shortly with this proposed fix.

yeazelm commented 4 months ago

https://github.com/bottlerocket-os/bottlerocket-core-kit/pull/18 should hopefully fix this issue when released!