docker / cli

The Docker CLI
Apache License 2.0
4.89k stars 1.92k forks source link

Docker commands take 1 minute to timeout if context endpoint is unreachable #1739

Closed kinghuang closed 4 years ago

kinghuang commented 5 years ago

Description

The upcoming context feature seem to cause CLI commands to pause for a 1 minute timeout if the current context's endpoint is not reachable.

Some of my Docker endpoints are only reachable via a VPN connection. If the current context is for an endpoint that I cannot currently reach, every command takes an extra minute (60 seconds) to complete.

Steps to reproduce the issue:

  1. Create a context for an unreachable endpoint.
  2. Use that context.
  3. Run CLI commands like docker context ls or docker info.

Describe the results you received:

When the endpoint is unreachable, commands take an extra minute to complete.

In this example, when I switch to psc-prod-gen without the requisite network access, Docker CLI commands go through a 1 minute timeout. Even switching back to the default context for my local Docker config takes a minute.

▸ ~ time docker context ls
NAME                DESCRIPTION                               DOCKER ENDPOINT               KUBERNETES ENDPOINT                ORCHESTRATOR
default *           Current DOCKER_HOST based configuration   unix:///var/run/docker.sock   https://localhost:6443 (default)   swarm
psc-dev-gen         PSC Development Cluster                   tcp://hidden:2376                                           
psc-onboard-gen     PSC Onboard Cluster                       tcp://hidden:2376                                          
psc-prod-gen        PSC Production Cluster                    tcp://hidden:2376                                          
psc-staging-gen     PSC Staging Cluster                       tcp://hidden:2376                                           
psc-uat-gen         PSC UAT Cluster                           tcp://hidden:2376                                          
/Users/king.huang/go/src/github.com/docker/cli/build/docker-darwin-amd64  ls  0.03s user 0.01s system 101% cpu 0.047 total

▸ ~ time docker context use psc-prod-gen
psc-prod-gen
Current context is now "psc-prod-gen"
/Users/king.huang/go/src/github.com/docker/cli/build/docker-darwin-amd64  use  0.04s user 0.02s system 94% cpu 0.060 total

▸ ~ time docker context ls
NAME                DESCRIPTION                               DOCKER ENDPOINT           KUBERNETES ENDPOINT   ORCHESTRATOR
default             Current DOCKER_HOST based configuration                                                   
psc-dev-gen         PSC Development Cluster                   tcp://hidden:2376                          
psc-onboard-gen     PSC Onboard Cluster                       tcp://hidden:2376                         
psc-prod-gen *      PSC Production Cluster                    tcp://hidden:2376                         
psc-staging-gen     PSC Staging Cluster                       tcp://hidden:2376                          
psc-uat-gen         PSC UAT Cluster                           tcp://hidden:2376                         
/Users/king.huang/go/src/github.com/docker/cli/build/docker-darwin-amd64  ls  0.03s user 0.02s system 0% cpu 1:00.05 total

▸ ~ time docker context use default
default
Current context is now "default"
/Users/king.huang/go/src/github.com/docker/cli/build/docker-darwin-amd64  use  0.03s user 0.02s system 0% cpu 1:00.05 total

▸ ~ time docker context ls
NAME                DESCRIPTION                               DOCKER ENDPOINT               KUBERNETES ENDPOINT                ORCHESTRATOR
default *           Current DOCKER_HOST based configuration   unix:///var/run/docker.sock   https://localhost:6443 (default)   swarm
psc-dev-gen         PSC Development Cluster                   tcp://hidden:2376                                           
psc-onboard-gen     PSC Onboard Cluster                       tcp://hidden:2376                                          
psc-prod-gen        PSC Production Cluster                    tcp://hidden:2376                                          
psc-staging-gen     PSC Staging Cluster                       tcp://hidden:2376                                           
psc-uat-gen         PSC UAT Cluster                           tcp://hidden:2376                                          
/Users/king.huang/go/src/github.com/docker/cli/build/docker-darwin-amd64  ls  0.03s user 0.01s system 94% cpu 0.048 total

Describe the results you expected:

Commands should fail immediately if the endpoint is unreachable.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:           19.03.0-dev
 API version:       1.39 (downgraded from 1.40)
 Go version:        go1.11.5
 Git commit:        81ac432c
 Built:             Tue Mar 12 22:57:30 2019
 OS/Arch:           darwin/amd64
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          18.09.3
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       774a1f4
  Built:            Thu Feb 28 06:40:58 2019
  OS/Arch:          linux/amd64
  Experimental:     true

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 16
  Running: 1
  Paused: 0
  Stopped: 15
 Images: 315
 Server Version: 18.09.3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: ilntgb4sbycucyl3iiva1wk3p
  Is Manager: true
  ClusterID: wsz99afbhy169obu69h1eq7z0
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.65.3
  Manager Addresses:
   192.168.65.3:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e
 runc version: 6635b4f0c6af3810594d2770f662f34ddc15b40d
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.9.125-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 7.786GiB
 Name: docker-desktop
 ID: Z6TC:PPXT:NO7F:M5SJ:E6D7:R4PM:3A6I:BZLQ:UTDJ:KPCE:6YKD:XI6X
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 56
  Goroutines: 200
  System Time: 2019-03-15T17:29:19.1958634Z
  EventsListeners: 3
 HTTP Proxy: gateway.docker.internal:3128
 HTTPS Proxy: gateway.docker.internal:3129
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

Additional environment details (AWS, VirtualBox, physical, etc.):

simonferquel commented 5 years ago

I suppose the docker info case makes sense to wait for the timeout (it is doing an API call), but context ls/context use etc, should not.

@thaJeztah I will have a look at it

simonferquel commented 5 years ago

I just looked at it, and it comes from the early call to initializeFromClient in the CLI Initialize function. Actually the problem seems to be present in 18.09 as well: https://github.com/docker/cli/blob/18.09/cli/command/cli.go#L197.

I suppose I can work on a fix for the upcoming version, but it might be difficult to backport.