docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/

Daemon doesn't start when using userns-remap and a custom bridge #740

Open nevinsm opened 5 years ago

nevinsm commented 5 years ago

Expected behavior

The daemon starts with the --bridge=custombridge flag set, so that it uses a custom bridge network, when the --userns-remap and --group flags are also set.

Actual behavior

The daemon doesn't start and errors with: "failed to start daemon: Error initializing network controller: error obtaining controller instance"

Steps to reproduce the behavior

0) Have an Ubuntu 18.04 server install

1) Add a bridge by creating the following YAML file at /etc/netplan/bridges.yaml, then run sudo netplan apply:

network:
  version: 2
  renderer: networkd
  bridges:
    test:
      addresses: [ "10.10.0.1/16" ]
      interfaces: []

2) Create a user for the daemon: sudo adduser test

3) Ensure the subuid and subgid mappings are in place (run these commands as root to add the lines if they don't already exist):

echo "test:1001:1" >> /etc/subuid;
echo "test:231072:65536" >> /etc/subuid;

echo "test:1001:1" >> /etc/subgid;
echo "test:231072:65536" >> /etc/subgid;

4) Create a directory for the daemon in the user's home directory: sudo mkdir /home/test/docker

5) Attempt to start the daemon with:

sudo /usr/bin/dockerd \
        --bridge=test \
        --data-root=/home/test/docker/docker.data \
        --exec-root=/home/test/docker/docker.exec \
        --host=unix:///home/test/docker/docker.sock \
        --pidfile=/home/test/docker/docker.pid \
        --userns-remap=test \
        --group=test
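
Before step 5, it can be worth confirming that the earlier steps actually took effect. The following is a minimal sketch, not part of the original report: bridge_exists and ensure_line are hypothetical helper names, and the check assumes that the kernel exposes a bridge/ directory under /sys/class/net/<name> for bridge devices. The optional second argument to bridge_exists exists only so the check can be pointed at a different directory.

```shell
#!/bin/sh
# Preflight sketch for the reproduction steps above. Both function names are
# hypothetical helpers, not part of Docker or netplan.

# Succeeds if interface $1 exists and is a bridge. $2 optionally overrides
# the sysfs directory (default: /sys/class/net).
bridge_exists() {
    [ -d "${2:-/sys/class/net}/$1/bridge" ]
}

# Appends the exact line $2 to file $1 unless that line is already present,
# so re-running the setup does not duplicate mappings.
ensure_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example (run as root):
#   bridge_exists test || echo "bridge 'test' is missing" >&2
#   for f in /etc/subuid /etc/subgid; do
#       ensure_line "$f" "test:1001:1"
#       ensure_line "$f" "test:231072:65536"
#   done
```

Matching whole lines with grep -xF keeps the check literal, so an entry such as test:1001:1 is not mistaken for a prefix of a longer mapping.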

Output of docker version:

Docker version 19.03.1, build 74b1e89

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: REDACTED
 Images: REDACTED
 Server Version: 19.03.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-55-generic
 Operating System: Ubuntu 18.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 32
 Total Memory: 47.06GiB
 Name: REDACTED
 ID: REDACTED
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.): Physical server

nevinsm commented 5 years ago

I forgot to mention this in the issue, and hopefully it helps narrow things down: if I downgrade to 18.09.8, everything works fine. Thanks for all the great work, and for Docker in general.

thaJeztah commented 5 years ago

/cc @estesp @akihirosuda

estesp commented 5 years ago

I'm curious why userns-remap is affecting daemon startup, as the daemon itself has root privileges in this instance and shouldn't be limited in interacting with bridges, etc. However, there is clearly some interplay, and it sounds like it is related to a change made after 18.09.8. Any chance someone from the libnetwork team can look at the changes to bridge configuration in that window? I'm still puzzled how userns-remap is interacting with that part of daemon startup.

@nevinsm was the downgrade to 18.09.8 (the working case) done on this same server, with the exact same flags passed to the engine?

arkodg commented 5 years ago

@nevinsm it would help if you ran dockerd with -D and shared the debug logs
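
One way to capture those logs is sketched below, reusing the flags from the original report and adding -D. The function name run_dockerd_debug, the DOCKERD_BIN override, and the default log file name are assumptions made for this illustration; only the dockerd flags come from the thread.

```shell
#!/bin/sh
# Sketch: re-run the daemon from the report with debug logging (-D) and save
# everything it prints. run_dockerd_debug and DOCKERD_BIN are hypothetical
# names introduced here; only the dockerd flags come from the report.
run_dockerd_debug() {
    log_file="${1:-docker-debug-logs.txt}"
    # Merge stderr into stdout so that both streams end up in the log file.
    "${DOCKERD_BIN:-/usr/bin/dockerd}" -D \
        --bridge=test \
        --data-root=/home/test/docker/docker.data \
        --exec-root=/home/test/docker/docker.exec \
        --host=unix:///home/test/docker/docker.sock \
        --pidfile=/home/test/docker/docker.pid \
        --userns-remap=test \
        --group=test 2>&1 | tee "$log_file"
}
```

The function needs to be called from a root shell, since the daemon itself requires root. The resulting file can then be attached to the issue.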

nevinsm commented 5 years ago

@estesp yup, the downgrade to 18.09.8 was performed on the same server with the same flags passed to the engine.

@arkodg I can get some debug logs tomorrow most likely.

Sorry for the delay in responding. I will see about tweaking my notification settings so that I see updates from GitHub sooner.

nevinsm commented 5 years ago

@estesp @arkodg Sorry for the delay, it took a while before I was able to pull that machine back out of production. Here are the debug logs: docker-debug-logs.txt

arkodg commented 5 years ago

@nevinsm you're most likely hitting https://github.com/moby/moby/issues/39608