canonical / docker-snap

https://snapcraft.io/docker
MIT License

docker build dns resolution fails after a restart #66

Open lestephane opened 2 years ago

lestephane commented 2 years ago

Reproduction

$ docker build ...
...
Get "https://ghcr.io/v2/": dial tcp: lookup ghcr.io: Temporary failure in name resolution

snap stop --disable followed by snap start --enable does not help.

$ tree /var/snap/docker/current/
/var/snap/docker/current/
├── config
│   └── daemon.json
└── etc
    ├── docker
    │   └── key.json
    └── gitconfig
$ cat /var/snap/docker/current/config/daemon.json
{
    "log-level":        "error",
    "storage-driver":   "overlay2"
}
$ sudo systemctl status snap.docker.dockerd.service
● snap.docker.dockerd.service - Service for snap application docker.dockerd
     Loaded: loaded (/etc/systemd/system/snap.docker.dockerd.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-03-16 19:52:28 EET; 10min ago
   Main PID: 117245 (dockerd)
      Tasks: 37 (limit: 18775)
     Memory: 76.0M
     CGroup: /system.slice/snap.docker.dockerd.service
             ├─117245 dockerd --group docker --exec-root=/run/snap.docker --data-root=/var/snap/docker/common/var-lib-docker ...
                ...(continued) --pidfile=/run/snap.docker/docker.pid --config-file=/var/snap/docker/1458/config/daemon.json
             └─117318 containerd --config /run/snap.docker/containerd/containerd.toml --log-level error
$ cat /var/snap/docker/1458/config/daemon.json
{
    "log-level":        "error",
    "storage-driver":   "overlay2"
}

Is there a way to tell dockerd to refresh its DNS configuration, or to inspect what its current configuration is? There's no point in me hardcoding a DNS server in daemon.json, since it changes every time I connect to a different VPN. But I'm fine with restarting the docker daemon, if only I knew how to tell it to "use the current DNS from resolv.conf".

I suspect this error happens because there is a race condition between the VPN daemon and dockerd when my laptop starts: the VPN comes up after dockerd has already decided which DNS server to use. The moment the VPN comes up, the DNS server in resolv.conf changes, and dockerd is left using a stale value.
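One way to check the race-condition hypothesis above is to compare the host's nameservers before and after the VPN connects. The following is a minimal sketch, not a fix: the function names `list_nameservers` and `same_nameservers` are hypothetical, and it only shows the host's view of resolv.conf, which is not necessarily what dockerd sees inside the snap.

```shell
# Hypothetical helper: print the nameserver addresses listed in a
# resolv.conf-style file. Defaults to /etc/resolv.conf when no path is given.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Hypothetical helper: succeed (exit status 0) when two resolv.conf-style
# files list the same nameservers in the same order, fail otherwise.
same_nameservers() {
    [ "$(list_nameservers "$1")" = "$(list_nameservers "$2")" ]
}
```

For example, copy /etc/resolv.conf to a scratch file right after boot, connect the VPN, and then run `same_nameservers <scratch file> /etc/resolv.conf`. A non-zero exit status means the nameservers changed after the copy was taken, which would be consistent with dockerd having started against the older values.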

tianon commented 2 years ago

Huh, very interesting -- I think this error is actually coming from dockerd itself, which IIRC would respond to resolv.conf changes (the daemon "DNS" configuration isn't used for looking up registry domains, IIRC -- I believe that configuration is just a default for containers), so that makes me think that perhaps dockerd itself isn't getting the updated resolv.conf? Maybe you need to restart all of snapd as well? :grimacing:

neurer commented 2 years ago

I think my issue might be related. Recently moved to Ubuntu 22.04 dev release. Followed the familiar steps and have Docker Snap (docker 20.10.12 1690 latest/stable) up and running. ufw is enabled; no additional config.

docker-compose up -d --build comes up with this:

W: Failed to fetch http://deb.debian.org/debian/dists/bullseye/InRelease Temporary failure resolving 'deb.debian.org'
W: Failed to fetch http://security.debian.org/debian-security/dists/bullseye-security/InRelease Temporary failure resolving 'security.debian.org'
W: Failed to fetch http://deb.debian.org/debian/dists/bullseye-updates/InRelease Temporary failure resolving 'deb.debian.org'
W: Some index files failed to download. They have been ignored, or old ones used instead.

Running ufw disable is required for the build to finish. After ufw enable, running docker exec -it whatever bash and then apt update gives me this:

Err:1 http://deb.debian.org/debian bullseye InRelease
Could not connect to deb.debian.org:80 (151.101.14.132), connection timed out
Err:2 http://security.debian.org/debian-security bullseye-security InRelease
Could not connect to security.debian.org:80 (151.101.66.132), connection timed out Could not connect to security.debian.org:80 (151.101.2.132), connection timed out Could not connect to security.debian.org:80 (151.101.130.132), connection timed out Could not connect to security.debian.org:80 (151.101.194.132), connection timed out
Err:3 http://deb.debian.org/debian bullseye-updates InRelease
Could not connect to deb.debian.org:80 (151.101.242.132), connection timed out
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
18 packages can be upgraded. Run 'apt list --upgradable' to see them.
W: Failed to fetch http://deb.debian.org/debian/dists/bullseye/InRelease Could not connect to deb.debian.org:80 (151.101.14.132), connection timed out
W: Failed to fetch http://security.debian.org/debian-security/dists/bullseye-security/InRelease Could not connect to security.debian.org:80 (151.101.66.132), connection timed out Could not connect to security.debian.org:80 (151.101.2.132), connection timed out Could not connect to security.debian.org:80 (151.101.130.132), connection timed out Could not connect to security.debian.org:80 (151.101.194.132), connection timed out
W: Failed to fetch http://deb.debian.org/debian/dists/bullseye-updates/InRelease Could not connect to deb.debian.org:80 (151.101.242.132), connection timed out
W: Some index files failed to download. They have been ignored, or old ones used instead.

After ufw disable, apt works fine again. This is new behavior. Any suggestions would be much appreciated.

shakeelansari63 commented 1 year ago

I guess there is some sort of race condition between NetworkManager and the docker daemon, which eventually breaks the docker daemon.

As a workaround, I have permanently disabled docker daemon auto start using sudo snap stop --disable docker.

Then restart the PC.

Whenever I need to use docker, I simply start it using sudo snap start docker.

Now there is no conflict as docker does not auto start and I will start docker only when I need it.

For easy access, I have aliased the docker start and stop commands.

~/.bashrc

alias start-docker='sudo snap start docker'
alias stop-docker='sudo snap stop docker'
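
A possible variation on the aliases above, as a rough sketch: a shell function that starts the snap and then waits until the daemon answers, so that a docker command typed right afterwards does not hit a daemon that is still coming up. The names `wait_until` and `start_docker` and the limit of 30 attempts are assumptions made for illustration; the only commands relied on are `sudo snap start docker` from the workaround and `docker info`, which fails while the daemon is unreachable.

```shell
# Hypothetical helper: run a command up to N times, one second apart,
# and succeed as soon as the command succeeds.
wait_until() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        # Only pause when another attempt is still to come.
        if [ "$tries" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}

# Hypothetical replacement for the start-docker alias: start the snap,
# then make up to 30 attempts at `docker info` before giving up.
start_docker() {
    sudo snap start docker && wait_until 30 docker info
}
```

A function is used instead of an alias because an alias cannot hold the retry loop. The function name uses an underscore so that it also works in shells that reject hyphens in function names.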