docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/

Container Attachment To Custom Bridge Network Causes Host Network Interruption (IPv6?) #914

Open Alex-Richman opened 4 years ago

Alex-Richman commented 4 years ago

We've been seeing an issue with host network interruptions when starting/stopping our development environment, which uses Docker heavily. This manifests as ERR_NETWORK_CHANGED in Chrome and WiFi connections flapping down/back up.

After some debugging I think I've narrowed it down to:

Potentially relevant journal entries related to IPv6 ADDRCONF on the new veth devices. These log entries are the only ones missing when IPv6 is disabled (and the issue is not present):

Jan 28 01:01:45 MinyArch kernel: IPv6: ADDRCONF(NETDEV_UP): veth7698f4b: link is not ready
Jan 28 01:01:45 MinyArch kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth7698f4b: link becomes ready
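To check whether another host shows the same pattern, the kernel log can be filtered for these messages. This is a minimal sketch, assuming a systemd host where `journalctl -k` prints kernel messages; the function name `filter_veth_addrconf` is made up for illustration and is not part of the original report.

```shell
#!/bin/bash
# Hypothetical helper: keep only IPv6 ADDRCONF kernel messages for veth devices.
# Reads log lines on stdin and prints the matching lines on stdout.
filter_veth_addrconf() {
    # "|| true" keeps the exit status at zero when nothing matches.
    grep -E 'IPv6: ADDRCONF\(NETDEV_(UP|CHANGE)\): veth[0-9a-f]+: ' || true
}

# Example usage (assumes journalctl is available on the host):
#   journalctl -k --since "10 minutes ago" | filter_veth_addrconf
```

If the filter prints nothing while containers are being started and stopped, the host is probably not hitting the same ADDRCONF path.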

Prior art (some only tangentially related):

We've seen this on Ubuntu LTS (18.04) and Debian Stretch (9.x)

I can't find any conclusions in the prior art (or in any docker issues) as to what is actually causing this, just "disable IPv6 / stop docker lol". Thought it might be useful to raise here to see if there were any further thoughts.

Output of docker version:

Client:
 Version:           18.09.5
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        e8ff056dbc
 Built:             Thu Apr 11 04:44:28 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89
  Built:            Thu Jul 25 21:20:35 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.5
  GitCommit:        bb71b10fd8f58240ca47fbb579b9d1028eea7c84
 runc:
  Version:          1.0.0-rc6+dev
  GitCommit:        2b18fe1d885ee5083ef9f0838fee39b62d653e30
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Containers: 14
 Running: 0
 Paused: 0
 Stopped: 14
Images: 1536
Server Version: 19.03.1
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.0-7-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.4GiB
Name: MinyArch
ID: XU54:FOVV:H2YM:JJSY:TVQ5:JGS2:67BG:TNDF:545U:WCTM:VPVU:T4OR
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

sveyret commented 4 years ago

I think I've got the same problem, but even without starting a container: as soon as I add ipv6 to Docker's daemon.json and start the docker service, my host becomes barely reachable over IPv6 (I tested the SSH and HTTPS ports).

gardner commented 4 years ago

This happens for me as well on Ubuntu 20.04. It may be related to a container crashing and constantly restarting.

It manifests in Chrome as connection flapping:

[image]

EDIT: after receiving a number of downvotes I removed the full log. Check the comment history if you would like to access it.

VimS commented 3 years ago

Running Ubuntu 20.04, Docker version 19.03.14, build 5eb3275d40 and Google Chrome 87.0.4280.88. Same issue.

I have some containers running via docker-compose with "restart: unless-stopped". When they restart because the process has finished, Chrome throws ERR_NETWORK_CHANGED while the interface is tentative.

jixbo commented 3 years ago

I'm having the same issue on Ubuntu 21.04 and Ubuntu 20.10 with Docker 20.10.6. Connections fail randomly for the containers, and web browsing is affected too: I get ERR_NETWORK_CHANGED in Chrome quite often. Judging by the logs, there seems to be a conflict with NetworkManager trying to handle the Docker interfaces. I tried setting the Docker interfaces to unmanaged in /etc/NetworkManager/NetworkManager.conf, but it didn't change anything.
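For anyone who wants to try the same thing, here is a sketch of the kind of configuration being described. NetworkManager's `[keyfile]` section accepts an `unmanaged-devices` list. The interface patterns and the drop-in file name below are assumptions based on typical Docker interface names, not the settings actually used in this comment.

```shell
#!/bin/bash
# Print a NetworkManager config fragment that marks Docker interfaces as unmanaged.
# The patterns (docker0, veth*, br-*) are assumed, typical Docker interface names.
print_nm_unmanaged_conf() {
    cat <<'EOF'
[keyfile]
unmanaged-devices=interface-name:docker0;interface-name:veth*;interface-name:br-*
EOF
}

# Example usage (hypothetical drop-in path; requires root):
#   print_nm_unmanaged_conf | sudo tee /etc/NetworkManager/conf.d/docker-unmanaged.conf
#   sudo systemctl reload NetworkManager
```

As noted above, this did not resolve the problem in this case, so treat it as a diagnostic step rather than a fix.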

Disabling IPv6 for the whole system fixes the issue, but that is no longer possible in the latest version due to https://github.com/moby/moby/issues/42288.

I don't have any custom bridge; simply running docker produces the error. The logs look pretty much identical with IPv6 enabled and disabled, but with IPv6 disabled the system seems to reach a stable state within a few seconds, whereas with IPv6 enabled it never does and the same entries keep showing up in the logs.

My current workaround is to use the older 20.10.5 version, and disable ipv6 in the kernel.

mzhirnov1 commented 3 years ago

We use a lot of docker containers with IPv6 and Chrome inside. Disabling IPv6 in the kernel is not a solution for us. Are there any other workarounds?

steersbob commented 3 years ago

@mzhirnov1 Setting the "fixed-cidr-v6" flag in daemon.json as described here worked for us.
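For illustration, a minimal sketch of a daemon.json that sets this option. `ipv6` and `fixed-cidr-v6` are real dockerd options, but the subnet below is only a documentation-prefix placeholder and the function name is made up; the exact configuration referred to in this comment is not shown in the thread.

```shell
#!/bin/bash
# Print a daemon.json fragment that enables IPv6 on the default bridge
# with a fixed subnet. The default subnet is a placeholder (assumption),
# pass your own prefix as the first argument.
print_daemon_json() {
    local subnet="${1:-2001:db8:1::/64}"
    cat <<EOF
{
  "ipv6": true,
  "fixed-cidr-v6": "${subnet}"
}
EOF
}

# Example usage (overwrites any existing daemon.json, so merge by hand
# if the file already has other settings; requires root):
#   print_daemon_json "fd00:1234::/64" | sudo tee /etc/docker/daemon.json
#   sudo systemctl restart docker
```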

mzhirnov1 commented 3 years ago

> @mzhirnov1 Setting the "fixed-cidr-v6" flag in daemon.json as described here worked for us.

Do you have problems like: [1372074.839350] IPv6: ADDRCONF(NETDEV_CHANGE): vethecee105: link becomes ready

steersbob commented 3 years ago

Symptoms are as described by OP:

Effective workarounds are to either disable IPv6 on the host through sysctl, or make the changes to daemon.json described above.
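A sketch of the sysctl route, assuming the standard `net.ipv6.conf.<interface>.disable_ipv6` kernel settings. The function name and the drop-in file name are hypothetical, and nothing here is taken from the actual setup described in this comment.

```shell
#!/bin/bash
# Print sysctl settings that disable IPv6 for the given interfaces.
# With no arguments, target all current interfaces ("all") and the
# template for future ones ("default").
print_disable_ipv6_sysctl() {
    [ "$#" -gt 0 ] || set -- all default
    local iface
    for iface in "$@"; do
        printf 'net.ipv6.conf.%s.disable_ipv6 = 1\n' "$iface"
    done
}

# Example usage (hypothetical file name; requires root):
#   print_disable_ipv6_sysctl | sudo tee /etc/sysctl.d/99-disable-ipv6.conf
#   sudo sysctl --system
```

Passing interface names instead, for example `print_disable_ipv6_sysctl eth0`, limits the change to those interfaces.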

We haven't done any in-depth digging through docker / IPv6 network handling to find the root cause. We kind of threw workarounds at the wall until one stuck, and went back to the features and bugs in our own software.

StarpTech commented 2 years ago

Hi, any status update? We experience the exact same situation (PopOS 21.10, docker: 20.10.12) as described by @steersbob. This issue needs a lot more attention.

Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:45:33 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:43:41 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

DanielJoyce commented 1 year ago

Been bothering me for years now. Frustrating. Any workarounds?

universam1 commented 1 year ago

> Been bothering me for years now. Frustrating. Any workarounds?

I switched to Podman for this and other reasons

StarpTech commented 1 year ago

Modifying the didn't help me. But disabling IPv6 for my WLAN interface resolved the issue. If I disable it in the kernel for the whole system, a lot of stuff crashes.

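#!/bin/bash

# Disable IPv6 on the wireless interface only. This takes effect
# immediately but does not persist across reboots.
sudo sysctl -w net.ipv6.conf.wlp8s0.disable_ipv6=1

# Reload the settings from /etc/sysctl.conf.
sudo sysctl -p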

wlp8s0 is my wireless interface. Go through the interfaces listed by ifconfig -a one by one to find out which one is causing this.
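To see the current setting per interface while going through them, the kernel exposes it under /proc/sys/net/ipv6/conf. A minimal sketch; the function name is made up, and the optional directory argument exists only so the function can be pointed at a different tree.

```shell
#!/bin/bash
# Print "<interface> <disable_ipv6 value>" for every interface the kernel
# lists. A value of 1 means IPv6 is disabled on that interface.
list_ipv6_state() {
    local base="${1:-/proc/sys/net/ipv6/conf}"
    local f
    for f in "$base"/*/disable_ipv6; do
        # Skip the unexpanded glob pattern and unreadable entries.
        [ -r "$f" ] || continue
        printf '%s %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}

# Example usage:
#   list_ipv6_state
```

The output also includes the "all" and "default" pseudo-interfaces alongside the real ones.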