docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/

Embedded DNS won't resolve some names on some hosts in same overlay network #944

Open meermanr opened 4 years ago

meermanr commented 4 years ago

Expected behavior

Services created by docker service create should be able to resolve the names of all other services within the same overlay network.
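
For reference, a minimal sketch of what I mean (network and image names here are illustrative, not from my real stack):

$ docker network create --driver overlay --attachable demo_net
$ docker service create --name network_manager --network demo_net nginx:alpine
$ docker run -it --rm --network demo_net alpine nslookup tasks.network_manager
# Expected: the embedded DNS answers with the task IP(s) of network_manager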

Actual behavior

Some nodes in my swarm cluster are unable to resolve some service names in a given overlay network.

For example, using pssh to run a quick experiment on all the nodes in my swarm:

$ pssh -i -h hosts.txt -p10 'docker run -i --rm --network dhw54uswu0kb alpine sh -c "apk --quiet add bind-tools && dig +noall +answer tasks.network_manager"'
[1] 21:31:50 [SUCCESS] nc-b9-1-1
tasks.network_manager.  600     IN      A       172.31.8.159
[2] 21:31:50 [SUCCESS] nc-b9-1-2
tasks.network_manager.  600     IN      A       172.31.8.159
[3] 21:31:54 [SUCCESS] nc-b9-2-3
tasks.network_manager.  600     IN      A       172.31.8.159
[4] 21:31:54 [SUCCESS] nc-b9-10-2
[5] 21:31:54 [SUCCESS] nc-b9-1-3
tasks.network_manager.  600     IN      A       172.31.8.159
[6] 21:31:54 [SUCCESS] nc-b9-10-1
tasks.network_manager.  600     IN      A       172.31.8.159
[7] 21:31:55 [SUCCESS] nc-b9-10-4
tasks.network_manager.  600     IN      A       172.31.8.159
[8] 21:31:56 [SUCCESS] nc-b9-2-4
tasks.network_manager.  600     IN      A       172.31.8.159
[9] 21:31:56 [SUCCESS] nc-b9-10-3
[10] 21:31:56 [SUCCESS] nc-b9-2-1
tasks.network_manager.  600     IN      A       172.31.8.159
[11] 21:31:57 [SUCCESS] nc-b9-3-1
tasks.network_manager.  600     IN      A       172.31.8.159
[12] 21:31:58 [SUCCESS] nc-b9-3-2
tasks.network_manager.  600     IN      A       172.31.8.159
[13] 21:31:58 [SUCCESS] nc-b9-3-4
tasks.network_manager.  600     IN      A       172.31.8.159
[14] 21:31:58 [SUCCESS] nc-b9-3-3
tasks.network_manager.  600     IN      A       172.31.8.159
[15] 21:31:59 [SUCCESS] nc-b9-2-2
tasks.network_manager.  600     IN      A       172.31.8.159
[16] 21:32:00 [SUCCESS] nc-b9-4-2
tasks.network_manager.  600     IN      A       172.31.8.159
[17] 21:32:00 [SUCCESS] nc-b9-4-4
tasks.network_manager.  600     IN      A       172.31.8.159
[18] 21:32:00 [SUCCESS] nc-b9-4-3
tasks.network_manager.  600     IN      A       172.31.8.159
[19] 21:32:01 [SUCCESS] nc-b9-4-1
tasks.network_manager.  600     IN      A       172.31.8.159
[20] 21:32:01 [SUCCESS] nc-b9-5-1
tasks.network_manager.  600     IN      A       172.31.8.159
[21] 21:32:01 [SUCCESS] nc-b9-5-3
tasks.network_manager.  600     IN      A       172.31.8.159
[22] 21:32:02 [SUCCESS] nc-b9-6-1
tasks.network_manager.  600     IN      A       172.31.8.159
[23] 21:32:02 [SUCCESS] nc-b9-5-4
tasks.network_manager.  600     IN      A       172.31.8.159
[24] 21:32:03 [SUCCESS] nc-b9-1-4
tasks.network_manager.  600     IN      A       172.31.8.159
[25] 21:32:03 [SUCCESS] nc-b9-6-2
tasks.network_manager.  600     IN      A       172.31.8.159
[26] 21:32:03 [SUCCESS] nc-b9-6-3
tasks.network_manager.  600     IN      A       172.31.8.159
[27] 21:32:04 [SUCCESS] nc-b9-6-4
tasks.network_manager.  600     IN      A       172.31.8.159
[28] 21:32:04 [SUCCESS] nc-b9-5-2
tasks.network_manager.  600     IN      A       172.31.8.159
[29] 21:32:04 [SUCCESS] nc-b9-7-1
tasks.network_manager.  600     IN      A       172.31.8.159
[30] 21:32:04 [SUCCESS] nc-b9-7-2
tasks.network_manager.  600     IN      A       172.31.8.159
[31] 21:32:05 [SUCCESS] nc-b9-7-3
tasks.network_manager.  600     IN      A       172.31.8.159
[32] 21:32:05 [SUCCESS] nc-b9-7-4
tasks.network_manager.  600     IN      A       172.31.8.159
[33] 21:32:05 [SUCCESS] nc-b9-8-1
tasks.network_manager.  600     IN      A       172.31.8.159
[34] 21:32:07 [SUCCESS] nc-b9-8-3
tasks.network_manager.  600     IN      A       172.31.8.159
[35] 21:32:07 [SUCCESS] nc-b9-8-4
tasks.network_manager.  600     IN      A       172.31.8.159
[36] 21:32:07 [SUCCESS] nc-b9-9-1
tasks.network_manager.  600     IN      A       172.31.8.159
[37] 21:32:07 [SUCCESS] nc-b9-9-2
tasks.network_manager.  600     IN      A       172.31.8.159
[38] 21:32:07 [SUCCESS] nc-b9-9-3
tasks.network_manager.  600     IN      A       172.31.8.159

Note that two of the nodes (nc-b9-10-2 and nc-b9-10-3) did not return an IP address, although dig still exited cleanly.

To get things working again, I've resorted to draining the node, deleting the network, and then making the node available again. So this feels like a missed update / state synchronisation issue to me.
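
Roughly, that recovery sequence looks like this (the node name is one of the affected hosts; removing the swarm-scoped network is only possible once nothing on the node still references it):

$ docker node update --availability drain nc-b9-10-2
$ docker network rm dhw54uswu0kb
$ docker node update --availability active nc-b9-10-2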

Steps to reproduce the behavior

Not sure. It seems to happen after running docker stack deploy repeatedly against the same stack, but with different YAML files each time (so only a subset of the stack is touched at once). It also only seems to happen when the cluster is under high load and generally unresponsive (high CPU usage on multiple hosts, as we've not tuned resource limits yet).

I suspect (but cannot yet prove) that docker service rm followed by docker stack deploy recreating the service before its containers have exited may be causing this. We've set the stop_grace_period to multiple days (simulation workloads which are expensive to restart), and I've anecdotally noticed that swarm loses track of the tasks governing containers that have been signalled but not yet exited.
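
Concretely, the suspect sequence is something like this (the YAML file name is a placeholder; the service carries a multi-day stop_grace_period, so its containers linger long after the rm):

$ docker service rm mpdti_worker_build_pool_android
# old containers keep running out their multi-day stop_grace_period...
$ docker stack deploy -c updated-subset.yml mpdti
# ...while the deploy recreates the service under the same name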

Output of docker version:

(Same on all hosts)

Client: Docker Engine - Community
 Version:           19.03.4
 API version:       1.40
 Go version:        go1.12.10
 Git commit:        9013bf583a
 Built:             Fri Oct 18 15:54:09 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.4
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.10
  Git commit:       9013bf583a
  Built:            Fri Oct 18 15:52:40 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 19
  Running: 19
  Paused: 0
  Stopped: 0
 Images: 9
 Server Version: 19.03.4
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: 64lbc3ok1tqr8j19s2f896rwf
  Is Manager: true
  ClusterID: l0zp5dedokl4v95b5pbr34ax2
  Managers: 3
  Nodes: 38
  Default Address Pool: 172.16.0.0/12
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 10.58.203.60
  Manager Addresses:
   10.58.203.55:2377
   10.58.203.60:2377
   10.58.203.89:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
 runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-66-generic
 Operating System: Ubuntu 18.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 40
 Total Memory: 754.4GiB
 Name: nc-b9-4-1
 ID: 4KOX:EMBW:TZFK:7VWO:VUYE:OBLX:46JW:QWOF:V3IL:RT5H:GNZ4:43OB
 Docker Root Dir: /opt/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.)

Bare-metal installation of Ubuntu 18.04 with only Docker CE and some utilities (tmux, vim, etc.). The hosts have 2x20 CPUs (no hyperthreading) and 768 GiB RAM, so when the cluster is under load there are a lot of processes on a given node competing for attention. I suspect I may need to tune buffer sizes somewhere.
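
For the buffer-size suspicion, a first check might be the usual UDP/netdev sysctls and counters (whether these are actually involved here is pure guesswork on my part):

$ sysctl net.core.rmem_max net.core.rmem_default net.core.netdev_max_backlog
$ netstat -su | grep -i errors
# non-zero UDP receive errors here would suggest the host is dropping datagrams under load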

Every node in the swarm has a bonded network interface made up of 4x NICs as below, so I'm not sure exactly how overlay traffic passes between hosts; there may be more than one path (see the tcpdump sketch after the bond details).

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: REDACTED
Active Aggregator Info:
        Aggregator ID: 4
        Number of ports: 4
        Actor Key: 15
        Partner Key: 41
        Partner Mac Address: REDACTED

Slave Interface: enp94s0f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 8
Permanent HW addr: REDACTED
Slave queue ID: 0
Aggregator ID: 4
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 3
Partner Churned Count: 3
details actor lacp pdu:
    system priority: 65535
    system mac address: REDACTED
    port key: 15
    port priority: 255
    port number: 1
    port state: 61
details partner lacp pdu:
    system priority: 32768
    system mac address: REDACTED
    oper key: 41
    port priority: 32768
    port number: 57
    port state: 61

Slave Interface: enp94s0f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: REDACTED
Slave queue ID: 0
Aggregator ID: 4
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: REDACTED
    port key: 15
    port priority: 255
    port number: 2
    port state: 61
details partner lacp pdu:
    system priority: 32768
    system mac address: REDACTED
    oper key: 41
    port priority: 32768
    port number: 32785
    port state: 61

Slave Interface: enp24s0f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: REDACTED
Slave queue ID: 0
Aggregator ID: 4
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: REDACTED
    port key: 15
    port priority: 255
    port number: 3
    port state: 61
details partner lacp pdu:
    system priority: 32768
    system mac address: REDACTED
    oper key: 41
    port priority: 32768
    port number: 32825
    port state: 61

Slave Interface: enp24s0f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 5
Permanent HW addr: REDACTED
Slave queue ID: 0
Aggregator ID: 4
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 3
Partner Churned Count: 3
details actor lacp pdu:
    system priority: 65535
    system mac address: REDACTED
    port key: 15
    port priority: 255
    port number: 4
    port state: 61
details partner lacp pdu:
    system priority: 32768
    system mac address: REDACTED
    oper key: 41
    port priority: 32768
    port number: 17
    port state: 61
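
To see which path the overlay traffic actually takes, something like this should work (4789 is the "Data Path Port" reported by docker info above; the interface names are the bond and its slaves listed above):

# Overlay traffic between nodes is VXLAN-encapsulated UDP on port 4789
$ sudo tcpdump -nn -i bond0 udp port 4789
# Repeat with -i enp94s0f1 / enp94s0f0 / enp24s0f1 / enp24s0f0 to see which slave carries it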

Reporter's thoughts

As much as anything, I'm looking to learn how to debug this. I've combed through the documentation and researched this as best I can, and I'm fairly confident it is going to happen again.
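
The kind of per-node comparison I have in mind (inside a container the embedded DNS listens on 127.0.0.11, and on a manager the verbose inspect dumps the network's service-discovery state):

# On both a good node and a bad node, query the embedded DNS directly:
$ docker run -it --rm --network dhw54uswu0kb alpine nslookup tasks.network_manager 127.0.0.11

# On a manager, dump what the network's service discovery actually holds:
$ docker network inspect -v dhw54uswu0kb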

meermanr commented 4 years ago

Correction: Draining the node deletes the network itself, but reactivating the node brings the erroneous behaviour right back!

Draining the node, restarting dockerd, and then reactivating it didn't seem to change things for the swarm-launched services, but when I attempted to run my test case again I got a new error:

$ docker run -it --rm --network mpdti_default alpine sh -c 'apk --quiet add bind-tools && dig +noall +answer tasks.network_manager'
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
c9b1b535fdd9: Pull complete
Digest: sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d
Status: Downloaded newer image for alpine:latest
docker: Error response from daemon: failed to get network during CreateEndpoint: network dhw54uswu0kb28hc6crasj0xu not found.

But a second attempt worked:

$ docker run -it --rm --network mpdti_default alpine sh -c 'apk --quiet add bind-tools && dig +noall +answer tasks.network_manager'
tasks.network_manager.  600     IN      A       172.31.8.159

meermanr commented 4 years ago

Network description, in case it helps:

# docker network inspect dhw54uswu0kb28hc6crasj0xu
[
    {
        "Name": "mpdti_default",
        "Id": "dhw54uswu0kb28hc6crasj0xu",
        "Created": "2020-02-26T22:57:33.354757034Z",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.31.0.0/16",
                    "Gateway": "172.31.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "71a921da61e5d318a3646d3069cdf4bdc7f5a4a1d22536b83882acb485872cef": {
                "Name": "angry_austin",
                "EndpointID": "b4b6701ed28f6cc25cbe8925d12d06c15de25079c67b9b78faea68305975855e",
                "MacAddress": "02:42:ac:1f:28:e9",
                "IPv4Address": "172.31.40.233/16",
                "IPv6Address": ""
            },
            "9e34cb5543869c0f843ce21f6714c5af9fafc969b898b1cadfce985b2e2c3c0b": {
                "Name": "mpdti_worker_build_pool_android.66l12d1hqd2ol1i5zi7nbil09.3lu9csnhd5rnu0xvp3f6ctjye",
                "EndpointID": "38bb9fb13d3f9f9ed1ae511b1879c759faaa0a463d313b9b222245f5950d2c44",
                "MacAddress": "02:42:ac:1f:28:e5",
                "IPv4Address": "172.31.40.229/16",
                "IPv6Address": ""
            },
            "lb-mpdti_default": {
                "Name": "mpdti_default-endpoint",
                "EndpointID": "ea1e94211ff1063138e16b1877e5d5f346134b153f45fdea07565f8cf4bb1895",
                "MacAddress": "02:42:ac:1f:28:d5",
                "IPv4Address": "172.31.40.213/16",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4114"
        },
        "Labels": {
            "com.docker.stack.namespace": "mpdti"
        },
        "Peers": [
            {
                "Name": "f151939a89e3",
                "IP": "10.58.203.92"
            },
            {
                "Name": "c62197e7f8a9",
                "IP": "10.58.203.47"
            },
            {
                "Name": "218ff8ebeea6",
                "IP": "10.58.203.55"
            },
            {
                "Name": "9aaf335debf1",
                "IP": "10.58.203.61"
            },
            {
                "Name": "6025bd2955aa",
                "IP": "10.58.203.73"
            },
            {
                "Name": "4d5c23a8859c",
                "IP": "10.58.203.74"
            },
            {
                "Name": "db2091586806",
                "IP": "10.58.203.52"
            },
            {
                "Name": "caf06a766c03",
                "IP": "10.58.203.58"
            },
            {
                "Name": "554ce305eec0",
                "IP": "10.58.203.64"
            },
            {
                "Name": "f289986309b0",
                "IP": "10.58.203.54"
            },
            {
                "Name": "55f67db8b55f",
                "IP": "10.58.203.67"
            },
            {
                "Name": "e6604b988202",
                "IP": "10.58.203.62"
            },
            {
                "Name": "f7a93b694822",
                "IP": "10.58.203.63"
            },
            {
                "Name": "282e8ce82a81",
                "IP": "10.58.203.77"
            },
            {
                "Name": "1e4cb1e5edbc",
                "IP": "10.58.203.91"
            },
            {
                "Name": "6073b6866f73",
                "IP": "10.58.203.39"
            },
            {
                "Name": "d87e894edd21",
                "IP": "10.58.203.69"
            },
            {
                "Name": "0882ec36e09d",
                "IP": "10.58.203.75"
            },
            {
                "Name": "4d5a1ca24529",
                "IP": "10.58.203.53"
            },
            {
                "Name": "ac6107fc6447",
                "IP": "10.58.203.66"
            },
            {
                "Name": "cec4c94db5ce",
                "IP": "10.58.203.56"
            },
            {
                "Name": "78c5f90440a1",
                "IP": "10.58.203.83"
            },
            {
                "Name": "89d3a7eded8a",
                "IP": "10.58.203.60"
            },
            {
                "Name": "d433c8e3bb92",
                "IP": "10.58.203.57"
            },
            {
                "Name": "6004340b87c8",
                "IP": "10.58.203.40"
            },
            {
                "Name": "8146fd246f70",
                "IP": "10.58.203.71"
            },
            {
                "Name": "0fa5e6eae40a",
                "IP": "10.58.203.94"
            },
            {
                "Name": "3e13891b8af5",
                "IP": "10.58.203.86"
            },
            {
                "Name": "df19351d2cc2",
                "IP": "10.58.203.41"
            },
            {
                "Name": "b21a92b35a37",
                "IP": "10.58.203.81"
            },
            {
                "Name": "2e75b279011b",
                "IP": "10.58.203.78"
            },
            {
                "Name": "fbd0e868175d",
                "IP": "10.58.203.50"
            },
            {
                "Name": "c01ac1cef12b",
                "IP": "10.58.203.87"
            },
            {
                "Name": "d021d6194886",
                "IP": "10.58.203.89"
            },
            {
                "Name": "b0ce820e7342",
                "IP": "10.58.203.72"
            },
            {
                "Name": "4b63c3ecb279",
                "IP": "10.58.203.76"
            },
            {
                "Name": "e29eaf6d3bb6",
                "IP": "10.58.203.79"
            },
            {
                "Name": "aa0f360100c9",
                "IP": "10.58.203.90"
            }
        ]
    }
]