docker-archive / classicswarm

Swarm Classic: a container clustering system. Not to be confused with Docker Swarm which is at https://github.com/docker/swarmkit
Apache License 2.0

integration test fails to run docker-in-docker in master #2646

Closed dongluochen closed 7 years ago

dongluochen commented 7 years ago

It's failing consistently.

not ok 1 container affinty
# (from function `retry' in file ./helpers.bash, line 80,
#  from function `wait_until_reachable' in file ./helpers.bash, line 85,
#  from function `start_docker' in file ./helpers.bash, line 222,
#  from function `start_docker_with_busybox' in file ./helpers.bash, line 181,
#  in test file ./affinities.bats, line 11)
#   `start_docker_with_busybox 2' failed
# latest: Pulling from library/busybox
# c366cffde3c9: Pulling fs layer
# 1911ea24d99d: Pulling fs layer
# 1911ea24d99d: Verifying Checksum
# 1911ea24d99d: Download complete
# c366cffde3c9: Verifying Checksum
# c366cffde3c9: Download complete
# c366cffde3c9: Pull complete
# 1911ea24d99d: Pull complete
# Digest: sha256:348432dd709c2cd6ca42e56c2a0d157f611c50c908e14c9bfc1e9cb0ed234871
# Status: Downloaded newer image for busybox:latest
# Command "docker -H 127.0.0.1:5270 info" failed 15 times. Output: Cannot connect to the Docker daemon at tcp://127.0.0.1:5270. Is the docker daemon running?
# Stopping e1c98924648cc90c437659070237154d15790c4df470d34adbd9ae4eea2e8e2f
# Stopping 4c8cf34b9b0aeb252eaf5d1d54e1ae6900b395cec9fb1d43262f62c5257f6902
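
The trace above goes through `retry` and `wait_until_reachable` in helpers.bash before giving up after 15 attempts. For readers who don't have that file at hand, here is a minimal sketch of what such a retry helper might look like. It is a hypothetical reconstruction, not the actual helpers.bash code, and the argument order (attempts, delay, command) is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a retry helper; not the actual helpers.bash code.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
    local attempts="$1" delay="$2"
    shift 2
    local i output
    for ((i = 1; i <= attempts; i++)); do
        # Capture stdout and stderr so the last failure can be reported.
        if output="$("$@" 2>&1)"; then
            printf '%s\n' "$output"
            return 0
        fi
        # Wait between attempts, but not after the final one.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    printf 'Command "%s" failed %d times. Output: %s\n' "$*" "$attempts" "$output" >&2
    return 1
}
```

With a helper like this, the check in the log would correspond to something like `retry 15 1 docker -H 127.0.0.1:5270 info`, where the 1-second delay is a guess.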
dongluochen commented 7 years ago

It looks like some change in current Docker master is causing the dind failure. In the test, the dind containers exit quickly with the error "group docker not found" in the logs.

@aluzzardi @jpetazzo @vieux Do you know if a recent change in Docker master could have introduced this behavior?

# CONTAINER ID                                                       IMAGE                            COMMAND                                                                                                                                                                                                                                                                                                                                                   CREATED             STATUS                              PORTS               NAMES
# ee839608996df606dbd9a3f00f8abe7ed199ce8ac571d5064404503b96d866fb   dockerswarm/dind-master:latest   "/dind sh -c '        rm -f /var/run/docker.pid ;         rm -f /var/run/docker/libcontainerd/docker-containerd.pid ;         rm -f /var/run/docker/libcontainerd/docker-containerd.sock ;         hostname node-1 ;         docker daemon -H 127.0.0.1:5562           -H=unix:///var/run/docker.sock           --storage-driver=aufs                '"   1 seconds ago       Exited (1) Less than a second ago                       node-1
# 069875fee52b611b5d891cc5efc6e62fb15b9a5be26ac0303921dfd276763d5e   dockerswarm/dind-master:latest   "/dind sh -c '        rm -f /var/run/docker.pid ;         rm -f /var/run/docker/libcontainerd/docker-containerd.pid ;         rm -f /var/run/docker/libcontainerd/docker-containerd.sock ;         hostname node-0 ;         docker daemon -H 127.0.0.1:5561           -H=unix:///var/run/docker.sock           --storage-driver=aufs                '"   2 seconds ago       Exited (1) Less than a second ago                       node-0

docker logs

# Command "daemon" is deprecated, and will be removed in Docker 1.16. Please run `dockerd` directly.
# WARN[0000] [!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting --tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]
# group docker not found

docker inspect

# [
# {
#     "Id": "069875fee52b611b5d891cc5efc6e62fb15b9a5be26ac0303921dfd276763d5e",
#     "Created": "2017-03-08T07:00:53.999634442Z",
#     "Path": "/dind",
#     "Args": [
#         "sh",
#         "-c",
#         "        rm -f /var/run/docker.pid ;         rm -f /var/run/docker/libcontainerd/docker-containerd.pid ;         rm -f /var/run/docker/libcontainerd/docker-containerd.sock ;         hostname node-0 ;         docker daemon -H 127.0.0.1:5561           -H=unix:///var/run/docker.sock           --storage-driver=aufs                "
#     ],
#     "State": {
#         "Status": "exited",
#         "Running": false,
#         "Paused": false,
#         "Restarting": false,
#         "OOMKilled": false,
#         "Dead": false,
#         "Pid": 0,
#         "ExitCode": 1,
#         "Error": "",
#         "StartedAt": "2017-03-08T07:00:54.317124655Z",
#         "FinishedAt": "2017-03-08T07:00:54.424365401Z"
#     },
#     "Image": "6f7cf2c342b4abc6a8baa9c3500519954799f5d1d1851ce695105844ca721c39",
#     "ResolvConfPath": "/var/lib/docker/containers/069875fee52b611b5d891cc5efc6e62fb15b9a5be26ac0303921dfd276763d5e/resolv.conf",
#     "HostnamePath": "/var/lib/docker/containers/069875fee52b611b5d891cc5efc6e62fb15b9a5be26ac0303921dfd276763d5e/hostname",
#     "HostsPath": "/var/lib/docker/containers/069875fee52b611b5d891cc5efc6e62fb15b9a5be26ac0303921dfd276763d5e/hosts",
#     "LogPath": "/var/lib/docker/containers/069875fee52b611b5d891cc5efc6e62fb15b9a5be26ac0303921dfd276763d5e/069875fee52b611b5d891cc5efc6e62fb15b9a5be26ac0303921dfd276763d5e-json.log",
#     "Name": "/node-0",
#     "RestartCount": 0,
#     "Driver": "aufs",
#     "ExecDriver": "native-0.2",
#     "MountLabel": "",
#     "ProcessLabel": "",
#     "AppArmorProfile": "",
#     "ExecIDs": null,
#     "HostConfig": {
#         "Binds": null,
#         "ContainerIDFile": "",
#         "LxcConf": [],
#         "Memory": 0,
#         "MemoryReservation": 0,
#         "MemorySwap": 0,
#         "KernelMemory": 0,
#         "CpuShares": 0,
#         "CpuPeriod": 0,
#         "CpusetCpus": "",
#         "CpusetMems": "",
#         "CpuQuota": 0,
#         "BlkioWeight": 0,
#         "OomKillDisable": false,
#         "MemorySwappiness": -1,
#         "Privileged": true,
#         "PortBindings": {},
#         "Links": null,
#         "PublishAllPorts": false,
#         "Dns": null,
#         "DnsOptions": null,
#         "DnsSearch": null,
#         "ExtraHosts": null,
#         "VolumesFrom": null,
#         "Devices": [],
#         "NetworkMode": "host",
#         "IpcMode": "",
#         "PidMode": "",
#         "UTSMode": "",
#         "CapAdd": null,
#         "CapDrop": null,
#         "GroupAdd": null,
#         "RestartPolicy": {
#             "Name": "no",
#             "MaximumRetryCount": 0
#         },
#         "SecurityOpt": null,
#         "ReadonlyRootfs": false,
#         "Ulimits": null,
#         "LogConfig": {
#             "Type": "json-file",
#             "Config": {}
#         },
#         "CgroupParent": "",
#         "ConsoleSize": [
#             0,
#             0
#         ],
#         "VolumeDriver": ""
#     },
#     "GraphDriver": {
#         "Name": "aufs",
#         "Data": null
#     },
#     "Mounts": [
#         {
#             "Name": "49cb2018979c364e0efabd7dd38bb07295ae6c78725da31558fd7bebdffec102",
#             "Source": "/var/lib/docker/volumes/49cb2018979c364e0efabd7dd38bb07295ae6c78725da31558fd7bebdffec102/_data",
#             "Destination": "/usr/local/bin",
#             "Driver": "local",
#             "Mode": "",
#             "RW": true
#         },
#         {
#             "Name": "0c4768db58410501fd1dac8105bd1a0abbb2cd7cac6eae5f46b5c6d4f88bfe59",
#             "Source": "/var/lib/docker/volumes/0c4768db58410501fd1dac8105bd1a0abbb2cd7cac6eae5f46b5c6d4f88bfe59/_data",
#             "Destination": "/var/run",
#             "Driver": "local",
#             "Mode": "",
#             "RW": true
#         },
#         {
#             "Name": "6162d94e2cebf73c35d71ee2e8ba15531ff8383ef7dfdf0bd1409a3aea918805",
#             "Source": "/var/lib/docker/volumes/6162d94e2cebf73c35d71ee2e8ba15531ff8383ef7dfdf0bd1409a3aea918805/_data",
#             "Destination": "/var/lib/docker",
#             "Driver": "local",
#             "Mode": "",
#             "RW": true
#         }
#     ],
#     "Config": {
#         "Hostname": "0144b21d190a",
#         "Domainname": "",
#         "User": "",
#         "AttachStdin": false,
#         "AttachStdout": false,
#         "AttachStderr": false,
#         "Tty": true,
#         "OpenStdin": true,
#         "StdinOnce": false,
#         "Env": [
#             "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
#             "VERSION=dev"
#         ],
#         "Cmd": [
#             "sh",
#             "-c",
#             "        rm -f /var/run/docker.pid ;         rm -f /var/run/docker/libcontainerd/docker-containerd.pid ;         rm -f /var/run/docker/libcontainerd/docker-containerd.sock ;         hostname node-0 ;         docker daemon -H 127.0.0.1:5561           -H=unix:///var/run/docker.sock           --storage-driver=aufs                "
#         ],
#         "Image": "dockerswarm/dind-master:latest",
#         "Volumes": {
#             "/usr/local/bin": {},
#             "/var/lib/docker": {},
#             "/var/run": {}
#         },
#         "WorkingDir": "",
#         "Entrypoint": [
#             "/dind"
#         ],
#         "OnBuild": null,
#         "Labels": {},
#         "StopSignal": "SIGTERM"
#     },
#     "NetworkSettings": {
#         "Bridge": "",
#         "SandboxID": "",
#         "HairpinMode": false,
#         "LinkLocalIPv6Address": "",
#         "LinkLocalIPv6PrefixLen": 0,
#         "Ports": null,
#         "SandboxKey": "",
#         "SecondaryIPAddresses": null,
#         "SecondaryIPv6Addresses": null,
#         "EndpointID": "",
#         "Gateway": "",
#         "GlobalIPv6Address": "",
#         "GlobalIPv6PrefixLen": 0,
#         "IPAddress": "",
#         "IPPrefixLen": 0,
#         "IPv6Gateway": "",
#         "MacAddress": "",
#         "Networks": {
#             "host": {
#                 "EndpointID": "",
#                 "Gateway": "",
#                 "IPAddress": "",
#                 "IPPrefixLen": 0,
#                 "IPv6Gateway": "",
#                 "GlobalIPv6Address": "",
#                 "GlobalIPv6PrefixLen": 0,
#                 "MacAddress": ""
#             }
#         }
#     }
# }
# ]
jpetazzo commented 7 years ago

Completely random suggestion: is there a docker group in /etc/group in that container?

dongluochen commented 7 years ago

Thanks @jpetazzo! I can manually start this dind container and it keeps running. There is no docker group in /etc/group, but the container neither exits nor logs the error "group docker not found". Any idea where I should look?

dchen@vm2:~$ docker ps --no-trunc
CONTAINER ID                                                       IMAGE                                                                                    COMMAND                                                                                                                                                                                                                                                                                                                                           CREATED             STATUS              PORTS               NAMES
d9d3bf5cae894f6b350910b660f72d6571931e8f0cb93d722f5d38c8754f4b6f   dockerswarm/dind-master:latest                                                           "/dind sh -c '       rm -f /var/run/docker.pid ;         rm -f /var/run/docker/libcontainerd/docker-containerd.pid ;         rm -f /var/run/docker/libcontainerd/docker-containerd.sock ;         hostname node-1 ;         dockerd -H 127.0.0.1:5325           -H=unix:///var/run/docker.sock           --storage-driver=aufs               '"   2 minutes ago       Up 2 minutes                            musing_montalcini
dchen@vm2:~$ docker exec -ti d9d3bf5cae89 sh
# ls /etc/group
/etc/group
# grep -i docker /etc/group
#
dchen@vm2:~$ docker logs d9d3bf5cae89
time="2017-03-11T00:22:35.631765475Z" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting --tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]"
time="2017-03-11T00:22:35.633464288Z" level=info msg="libcontainerd: new containerd process, pid: 21"
time="2017-03-11T00:22:36.669281251Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
time="2017-03-11T00:22:36.669623875Z" level=warning msg="Your kernel does not support swap memory limit"
time="2017-03-11T00:22:36.669747394Z" level=warning msg="Your kernel does not support cgroup rt period"
time="2017-03-11T00:22:36.669786932Z" level=warning msg="Your kernel does not support cgroup rt runtime"
time="2017-03-11T00:22:36.669967739Z" level=warning msg="mountpoint for pids not found"
time="2017-03-11T00:22:36.670426717Z" level=info msg="Loading containers: start."
time="2017-03-11T00:22:36.670698293Z" level=warning msg="Running modprobe nf_nat failed with message: ``, error: exec: \"modprobe\": executable file not found in $PATH"
time="2017-03-11T00:22:36.670761902Z" level=warning msg="Running modprobe xt_conntrack failed with message: ``, error: exec: \"modprobe\": executable file not found in $PATH"
time="2017-03-11T00:22:36.707938436Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
time="2017-03-11T00:22:36.731462363Z" level=info msg="Loading containers: done."
time="2017-03-11T00:22:36.732409750Z" level=warning msg="Couldn't run auplink before unmount /var/lib/docker/tmp/docker-aufs-union446366961: exec: \"auplink\": executable file not found in $PATH"
time="2017-03-11T00:22:36.752170270Z" level=warning msg="failed to retrieve docker-init version"
time="2017-03-11T00:22:36.752598259Z" level=info msg="Daemon has completed initialization"
time="2017-03-11T00:22:36.752659542Z" level=info msg="Docker daemon" commit=f0a13eb graphdriver=aufs version=1.14.0-dev
time="2017-03-11T00:22:36.765948993Z" level=info msg="API listen on /var/run/docker.sock"
time="2017-03-11T00:22:36.766096469Z" level=info msg="API listen on 127.0.0.1:5325"
dongluochen commented 7 years ago

It works if I add a docker group in the container. Thanks @tonistiigi. I'm looking at where this was introduced.

diff --git a/test/integration/helpers.bash b/test/integration/helpers.bash
index b41baba..b3c0c63 100644
--- a/test/integration/helpers.bash
+++ b/test/integration/helpers.bash
@@ -211,15 +211,17 @@ function start_docker() {
         rm -f /var/run/docker.pid ; \
         rm -f /var/run/docker/libcontainerd/docker-containerd.pid ; \
         rm -f /var/run/docker/libcontainerd/docker-containerd.sock ; \
+        addgroup docker ; \
         hostname node-$i ; \
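
A possible refinement of the workaround in the diff above is to create the group only when it is missing, so a restarted container doesn't log an addgroup error. The sketch below is hypothetical and not part of helpers.bash: `ensure_group` and the `GROUP_FILE` override are names made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of helpers.bash): create a group only when
# it is missing, so repeated container starts do not print addgroup errors.
# GROUP_FILE is an illustrative override that defaults to /etc/group.
ensure_group() {
    local name="$1"
    local group_file="${GROUP_FILE:-/etc/group}"
    # An /etc/group entry starts with "<name>:".
    if grep -q "^${name}:" "$group_file" 2>/dev/null; then
        return 0
    fi
    addgroup "$name"
}
```

Inside the `sh -c` command string used by `start_docker`, the same idea could be written inline as `grep -q "^docker:" /etc/group || addgroup docker ;`.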
dongluochen commented 7 years ago

https://github.com/docker/docker/pull/30729 requires a docker group to exist in /etc/group, but the dind images do not have this group, which leads to the failure. cc @dmcgowan.

I think the right fix is to add a docker group to the dind images.

tonistiigi commented 7 years ago

@dmcgowan Can we remove the requirement that the group exist before dockerd can start, and only use the group if it has already been created?
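
To make the proposal concrete, the fallback could behave roughly like the sketch below: use the group's GID when the group exists, otherwise warn and fall back to root. This only illustrates the idea in shell. It is not dockerd's actual logic, and `socket_gid` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Illustration of the proposal only; this is not dockerd's actual logic.
# Print the GID to use for the socket: the group's GID when the group
# exists in the group file, otherwise 0 (root) with a warning on stderr.
socket_gid() {
    local name="$1"
    local group_file="${2:-/etc/group}"
    local gid
    # /etc/group fields: name:password:gid:members
    gid="$(awk -F: -v n="$name" '$1 == n { print $3; exit }' "$group_file" 2>/dev/null)"
    if [ -n "$gid" ]; then
        printf '%s\n' "$gid"
    else
        echo "warning: group $name not found, falling back to gid 0" >&2
        printf '0\n'
    fi
}
```

For example, `socket_gid docker` would print the docker group's GID on a host that has the group, and `0` plus a warning in the dind image described above.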

dongluochen commented 7 years ago

docker/docker#31833 fixed this issue.