docker-library / docker

Docker Official Image packaging for Docker
Apache License 2.0

Occasionally dind fails to start: "sed: write error" #495

Open rousku opened 1 month ago

rousku commented 1 month ago

The issue can be reproduced with the following script. It takes some time for the issue to appear.

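#!/bin/bash

# Start a fresh dind container, poll it with docker exec until the daemon
# answers, then tear it down and repeat.
while true; do
    DIND_CONTAINER_ID=$(docker run -t --privileged -d docker:26.1.2-dind)
    echo "$DIND_CONTAINER_ID"
    # Loops forever if the container has exited (the failure case shown below).
    while ! docker exec "$DIND_CONTAINER_ID" docker info | grep "Server Version: 26.1.2"; do
        sleep 1
    done
    docker stop "$DIND_CONTAINER_ID"
    docker rm "$DIND_CONTAINER_ID"
done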
.
.
.
fc7502cfe87e48cfd464d4a5713f2efaf5e2b4341d5a13d5381c324ff80ec8df
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
 Server Version: 26.1.2
fc7502cfe87e48cfd464d4a5713f2efaf5e2b4341d5a13d5381c324ff80ec8df
fc7502cfe87e48cfd464d4a5713f2efaf5e2b4341d5a13d5381c324ff80ec8df
4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
Error response from daemon: container 4940ab34359e57a661bc80bfa6aa9afa2de2cf9fc2b2609a0ec595044e3c314e is not running
.
.
.
$ docker logs 4940ab34359e57a661
Certificate request self-signature ok
subject=CN = docker:dind server
/certs/server/cert.pem: OK
Certificate request self-signature ok
subject=CN = docker:dind client
/certs/client/cert.pem: OK
cat: can't open '/proc/net/ip6_tables_names': No such file or directory
cat: can't open '/proc/net/arp_tables_names': No such file or directory
iptables v1.8.10 (nf_tables)
sed: write error

tianon commented 1 month ago

Oh, your case is slightly different from the one I commented on in https://github.com/docker-library/docker/issues/308#issuecomment-2115722582. In your case, I think it might actually be your docker exec that's causing the problem: you're creating processes in the container while the dind script is still initializing, which exacerbates the inherent race between the lines of that script that set up the cgroup.

tianon commented 1 month ago

What I might suggest instead is putting /run in a shared volume and using docker run for your docker info checks instead of docker exec (connecting to the socket from a second container instead of going into the first).
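
A rough sketch of what that suggestion could look like, not a tested fix. The function names, the POLL_INTERVAL variable, the retry limit, the docker:26.1.2-cli tag, and setting DOCKER_HOST to unix:///run/docker.sock are all assumptions made for illustration; only the shared /run volume and the switch from docker exec to docker run come from the comment above.

```shell
#!/bin/bash
# Sketch only: names, tags and defaults below are assumptions, not part of the image.

DIND_IMAGE=docker:26.1.2-dind
CLI_IMAGE=docker:26.1.2-cli   # assumed tag for a client-only image

# Start dind with a named volume mounted at /run so the daemon socket is
# reachable from other containers. Prints the new container ID.
start_dind() {
    local volume=$1
    docker run -t --privileged -d -v "$volume:/run" "$DIND_IMAGE"
}

# Poll the daemon from a second, short-lived container that mounts the same
# volume, so no extra process is created inside the dind container itself.
# Returns 0 once the expected version is reported, 1 after max_tries attempts.
wait_for_dind() {
    local volume=$1 max_tries=${2:-60} i
    for ((i = 0; i < max_tries; i++)); do
        if docker run --rm -v "$volume:/run" \
            -e DOCKER_HOST=unix:///run/docker.sock \
            "$CLI_IMAGE" docker info 2>/dev/null \
            | grep -q "Server Version: 26.1.2"; then
            return 0
        fi
        sleep "${POLL_INTERVAL:-1}"
    done
    return 1
}
```

In the reproduction loop this would replace the docker exec check: call start_dind with a fresh volume name on each iteration, call wait_for_dind with the same name, then docker stop and docker rm the container and docker volume rm the volume. Using a fresh volume each time avoids picking up a stale socket file from the previous run, and the bounded retry means a container that died during startup no longer hangs the loop.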