docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/

Container unable to connect to FQDN hostname of Docker Host from service container. #1289

Status: Open. mottati opened this issue 3 years ago.

mottati commented 3 years ago

TCP connections from a Swarm-hosted container to the FQDN hostname of the Docker host fail. Connections from the same container to the same host succeed when they use the host's IP address or a CNAME.

Expected behavior

Connecting via the FQDN hostname of the Docker host should succeed.

Actual behavior

The connection to the FQDN hostname is refused.

Steps to reproduce the behavior

Environment Info

From the Docker host, exec into a container and, from that container, open an ssh connection back to the Docker host. Attempt this connection three ways: first using the IP address of the Docker host, then using the CNAME that refers to the Docker host, and finally using the FQDN of the Docker host.

The first two connection attempts succeed; the third fails.

Connect with IP address

[michael.ottati@michael.skylab.eng.pdx.wd ~]$ docker exec -it $(docker ps -q -f name=teamcity-agent.1)  ssh michael.ottati@10.96.32.175 -o ConnectTimeout=1 -o ConnectionAttempts=1  hostname
s-mxq80106bp.sys.az1.eng.pdx.wd

Connect with CNAME

[michael.ottati@michael.skylab.eng.pdx.wd ~]$ docker exec -it $(docker ps -q -f name=teamcity-agent.1)  ssh michael.ottati@michael.skylab.eng.pdx.wd -o ConnectTimeout=1 -o ConnectionAttempts=1  hostname
s-mxq80106bp.sys.az1.eng.pdx.wd

Connect with Docker Host FQDN

[michael.ottati@michael.skylab.eng.pdx.wd ~]$ docker exec -it $(docker ps -q -f name=teamcity-agent.1)  ssh michael.ottati@s-mxq80106bp.sys.az1.eng.pdx.wd -o ConnectTimeout=1 -o ConnectionAttempts=1  hostname
ssh: connect to host s-mxq80106bp.sys.az1.eng.pdx.wd port 22: Connection refused

The only connection that is refused is the attempt to connect to the FQDN of the Docker host; every other name that resolves to the same IP address works. The network layout is included below. Note that we use an overlay network named "local" to connect our disparate stacks. We do this so that connection attempts to .local names do not leak out to the internet when the target container does not exist. See: https://en.wikipedia.org/wiki/.local

We began to see this issue when we upgraded our Docker version from 19.03.11 to 20.10.8.

Output of docker network ls:

[michael.ottati@michael.skylab.eng.pdx.wd ~]$ docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
b7f421ae9a0c   bridge            bridge    local
576e74b6ee41   docker_gwbridge   bridge    local
37eac575a2ce   host              host      local
p5ci9cgwqtwl   ingress           overlay   swarm
memilzpqbfcx   local             overlay   swarm
0a186c99e373   none              null      local

Output of docker version:

Client: Docker Engine - Community
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:55:49 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:54:13 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.8
  GitCommit:        7eba5930496d9bbe375fdf71603e610ad737d2b2
 runc:
  Version:          1.0.0
  GitCommit:        v1.0.0-0-g84113ee
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.6.1-docker)
  scan: Docker Scan (Docker Inc., v0.8.0)

Server:
 Containers: 17
  Running: 17
  Paused: 0
  Stopped: 0
 Images: 83
 Server Version: 20.10.8
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: qo2y54sfomfuwze4tnrtvdu5z
  Is Manager: true
  ClusterID: mmf47po18mrety28qkm7wcp8l
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 10.96.32.175
  Manager Addresses:
   10.96.32.175:2377
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7eba5930496d9bbe375fdf71603e610ad737d2b2
 runc version: v1.0.0-0-g84113ee
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-1160.25.1.el7.x86_64
 Operating System: CentOS Linux 7 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 56
 Total Memory: 503.8GiB
 Name: s-mxq80106bp.sys.az1.eng.pdx.wd
 ID: 43EI:4PVE:XUTS:RKMN:RBY5:4A4Q:FDVC:YEDX:WF55:CLPG:FHI3:X7CU
 Docker Root Dir: /data/var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  artifactory-az.services.wd:8080
  127.0.0.0/8
 Live Restore Enabled: false
 Default Address Pools:
   Base: 172.16.0.0/12, Size: 23


mottati commented 3 years ago

I did a little more research on this. It appears that our use of the network name "local" somehow triggers a different code path within Docker. To test this, I ran three more tests, similar to the ones in the bug report.

Before running these tests, I created a second overlay network in exactly the same way the "local" network was created. Here is how it was created and what the network list looked like afterward.

$ docker network create -d overlay --attachable overlay-2
ujimx2wmy4g5806y9yxpc9eqq
$ docker network ls
NETWORK ID     NAME              DRIVER    SCOPE
9c44f239b788   bridge            bridge    local
dde755403625   docker_gwbridge   bridge    local
857e6f415c0d   host              host      local
kybp70o5higc   ingress           overlay   swarm
z5r05znqats7   local             overlay   swarm
cc1a68b2c2a3   none              null      local
ujimx2wmy4g5   overlay-2         overlay   swarm

Test 1: Run the container with the default network assignment and connect back to the Docker host

$ docker run -it  ubuntu-container ssh s-mxq80106bp.sys.az1.eng.pdx.wd
The authenticity of host 's-mxq80106bp.sys.az1.eng.pdx.wd (10.96.32.175)' can't be established.
ECDSA key fingerprint is SHA256:C47QAI27GeAx+M3/GxAGMoSi5RlFyiRtpFvdRoERXWk.
Are you sure you want to continue connecting (yes/no/[fingerprint])? no

Test 2: Repeat Test 1 using the newly created overlay-2 network

$ docker run -it --network overlay-2  ubuntu-container ssh s-mxq80106bp.sys.az1.eng.pdx.wd
The authenticity of host 's-mxq80106bp.sys.az1.eng.pdx.wd (10.96.32.175)' can't be established.
ECDSA key fingerprint is SHA256:C47QAI27GeAx+M3/GxAGMoSi5RlFyiRtpFvdRoERXWk.
Are you sure you want to continue connecting (yes/no/[fingerprint])? no

Test 3: Repeat Test 1 using the "local" network

$ docker run -it --network local  ubuntu-container ssh s-mxq80106bp.sys.az1.eng.pdx.wd
ssh: connect to host s-mxq80106bp.sys.az1.eng.pdx.wd port 22: Connection refused

As shown above, only the third test fails, which leads me to suspect that some special behavior is associated with an overlay network named "local".

The ubuntu-container used in this test was created from a jetbrains/teamcity-agent:2021.1.2 base.

mottati commented 3 years ago

We can work around this issue by passing the --add-host option to docker run, but that should not be necessary.

Test 4: Same as Test 3 above with the addition of the --add-host option.

$ docker run --add-host s-mxq80106bp.sys.az1.eng.pdx.wd:10.96.32.175 -it --network local  ubuntu-container ssh s-mxq80106bp.sys.az1.eng.pdx.wd
The authenticity of host 's-mxq80106bp.sys.az1.eng.pdx.wd (10.96.32.175)' can't be established.
ECDSA key fingerprint is SHA256:C47QAI27GeAx+M3/GxAGMoSi5RlFyiRtpFvdRoERXWk.
Are you sure you want to continue connecting (yes/no/[fingerprint])?