docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/

Docker inspect hangs forever #397

Open Routhinator opened 6 years ago

Routhinator commented 6 years ago

Expected behavior

Docker inspect returns information, allowing docker-compose to work

Actual behavior

Docker inspect hangs forever and causes an HTTP timeout

Steps to reproduce the behavior

I've been getting this off and on since January of this year, but not on Mac: on Ubuntu. First on 16.04, then on 17.04, 17.10, and now on 18.04 - and my verbose output hangs at exactly the same spot as the reports on the Docker for Mac issue, during container inspection. In my case this is limited to desktop Linux, and some indicators in that issue suggest it is triggered by the Mac going to sleep, which lines up with my experience: when Ubuntu goes to sleep and comes back, the issue appears. Rebooting the OS seems to be the only workaround, as restarting the Docker daemon does not resolve the issue.

Output of docker version:

Client:
 Version:           18.06.0-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        0ffa825
 Built:             Wed Jul 18 19:09:54 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.0-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       0ffa825
  Built:            Wed Jul 18 19:07:56 2018
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Containers: 218
 Running: 0
 Paused: 0
 Stopped: 218
Images: 2
Server Version: 18.06.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d64c661f1d51c48782c9cec8fda7604785f93587
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-30-generic
Operating System: Ubuntu 18.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.53GiB
Name: crouth-Latitude-5480
ID: PESF:MXJF:NVGR:Q25S:65JB:V4QJ:WIVC:ALQF:4YPF:MXYC:27YK:5YUU
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.)

Tested and reproduced on Ubuntu Desktop 16.04, 17.04, 17.10 and 18.04

This is the exact same issue as seen on Mac desktops: https://github.com/docker/for-mac/issues/2378

docker-compose --verbose run playbook -i inventory/global/ playbooks/dev-jasper-reports.yml
compose.config.config.find: Using configuration files: ./docker-compose.yml
docker.utils.config.find_config_file: Trying paths: ['/home/crouth/.docker/config.json', '/home/crouth/.dockercfg']
docker.utils.config.find_config_file: Found file at path: /home/crouth/.docker/config.json
docker.auth.load_config: Found 'auths' section
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/version HTTP/1.1" 200 543
compose.cli.command.get_client: docker-compose version 1.21.2, build a133471
docker-py version: 3.3.0
CPython version: 3.6.5
OpenSSL version: OpenSSL 1.0.1t  3 May 2016
compose.cli.command.get_client: Docker base_url: http+docker://localhost
compose.cli.command.get_client: Docker version: Platform={'Name': ''}, Components=[{'Name': 'Engine', 'Version': '18.06.0-ce', 'Details': {'ApiVersion': '1.38', 'Arch': 'amd64', 'BuildTime': '2018-07-18T19:07:56.000000000+00:00', 'Experimental': 'false', 'GitCommit': '0ffa825', 'GoVersion': 'go1.10.3', 'KernelVersion': '4.15.0-30-generic', 'MinAPIVersion': '1.12', 'Os': 'linux'}}], Version=18.06.0-ce, ApiVersion=1.38, MinAPIVersion=1.12, GitCommit=0ffa825, GoVersion=go1.10.3, Os=linux, Arch=amd64, KernelVersion=4.15.0-30-generic, BuildTime=2018-07-18T19:07:56.000000000+00:00
compose.cli.verbose_proxy.proxy_callable: docker inspect_network <- ('jasperreportsservers_default')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/networks/jasperreportsservers_default HTTP/1.1" 404 61
compose.cli.verbose_proxy.proxy_callable: docker inspect_network <- ('jasperreports-servers_default')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/networks/jasperreports-servers_default HTTP/1.1" 200 862
compose.cli.verbose_proxy.proxy_callable: docker inspect_network -> {'Attachable': True,
 'ConfigFrom': {'Network': ''},
 'ConfigOnly': False,
 'Containers': {'35ef0763b3a38a776409ef8d426f9fd1eb71549cb13c4166c638d33b7c77bd34': {'EndpointID': '46aaa4306520341b9d1f3f238ceebca2d1aa9f9b00bf9c2b221fa6313439ebc7',
                                                                                     'IPv4Address': '172.18.0.2/16',
                                                                                     'IPv6Address': '',
                                                                                     'MacAddress': '02:42:ac:12:00:02',
                                                                                     'Name': 'jasperreports-servers_playbook_run_112'}},
 'Created': '2018-06-27T13:27:42.910819198-07:00',
 'Driver': 'bridge',
...
compose.cli.verbose_proxy.proxy_callable: docker inspect_image <- ('dockerhub.alea.ca/docker/ansible')
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/images/dockerhub.alea.ca/docker/ansible/json HTTP/1.1" 200 None
compose.cli.verbose_proxy.proxy_callable: docker inspect_image -> {'Architecture': 'amd64',
 'Author': '',
 'Comment': '',
 'Config': {'ArgsEscaped': True,
            'AttachStderr': False,
            'AttachStdin': False,
            'AttachStdout': False,
            'Cmd': None,
            'Domainname': '',
            'Entrypoint': ['ansible-playbook'],
...
compose.cli.verbose_proxy.proxy_callable: docker containers <- (all=True, filters={'label': ['com.docker.compose.project=jasperreports-servers', 'com.docker.compose.service=playbook', 'com.docker.compose.oneoff=True']})
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/containers/json?limit=-1&all=1&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Djasperreports-servers%22%2C+%22com.docker.compose.service%3Dplaybook%22%2C+%22com.docker.compose.oneoff%3DTrue%22%5D%7D HTTP/1.1" 200 None
compose.cli.verbose_proxy.proxy_callable: docker containers -> (list with 112 items)
compose.cli.verbose_proxy.proxy_callable: docker inspect_container <- ('35ef0763b3a38a776409ef8d426f9fd1eb71549cb13c4166c638d33b7c77bd34')
ERROR: compose.cli.errors.log_timeout_error: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
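
For reference, the timeout the error message mentions is just an environment variable read by docker-compose; with a daemon-side hang like this it only postpones the failure, since the inspect call never returns:

```
# raises the client-side timeout only; the hung docker inspect still never completes
COMPOSE_HTTP_TIMEOUT=300 docker-compose --verbose run playbook -i inventory/global/ playbooks/dev-jasper-reports.yml
```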
cpuguy83 commented 6 years ago

Can you grab a stack trace from the daemon?

curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2
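
If it helps, the dump can be captured to a file (as later comments in this thread do) and attached or shared as a gist:

```
curl --unix-socket /var/run/docker.sock 'http://./debug/pprof/goroutine?debug=2' > docker_pprof.out
```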

mathroc commented 6 years ago

I have the same problem, here is the stack trace https://pastebin.com/raw/pBNxVRaR

It's getting quite painful and it seems to happen more and more often (which is why I'm starting to look into this).

This is also for a container created by docker-compose; in my case the container did not get started:

docker ps -a -f name=phpunit
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
ced5781314bc        v1_phpunit          "sh"                About an hour ago   Created                                 v1_phpunit_run_1

and since docker-compose tries to reuse the existing container, this service is locked until Docker is restarted (which often requires a computer reboot)

cpuguy83 commented 6 years ago

@mathroc Thanks!

Looks like this is stuck waiting on containerd for some reason. Can you pull the stack trace from containerd as well?

Should be able to get it with (assuming you are using docker to manage containerd):

docker-containerd-ctr pprof --debug-socket /run/docker/containerd/docker-containerd-debug.sock goroutines
mathroc commented 6 years ago

@cpuguy83 I have one right now, but I don't have the docker-containerd-ctr binary; is there an alternative to that?

(I installed docker-ce and containerd via portage on gentoo)

halcyon22 commented 6 years ago

Similar problem here, but using swarm and not compose.

I have a swarm service that is not listed by docker service ls, but the container is listed by docker ps. Trying to inspect it hangs indefinitely.

4144e40b25f5        docker.example.com:5000/admin:2219              "/bin/bash /admin-do…"   6 days ago          Up 6 days (unhealthy)                       admin-service.osn1h8ctn3xeoacs9k42dxzi7.jmdgzie4mnvy475sxytmj5fse

$ docker info

Containers: 11
 Running: 11
 Paused: 0
 Stopped: 0
Images: 13
Server Version: 18.06.0-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: osn1h8ctn3xeoacs9k42dxzi7
 Is Manager: true
 ClusterID: nzrxgungmmv3qr2ftg24htxg4
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.64.225.104
 Manager Addresses:
  10.64.225.104:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d64c661f1d51c48782c9cec8fda7604785f93587
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-862.3.3.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.65GiB
Name: [hostname]
ID: WPK6:BMJC:ADNC:NUNT:VBYK:2VFV:BGLR:F4QV:5RFR:BNJ6:5KLY:N2A7
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

$ curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2
https://gist.githubusercontent.com/halcyon22/0470948eccb9dd285098e41d1ced4e1a/raw/315dc7863f20ac04ccdce26dc41a29f93cf5a392/docker_pprof.out

$ docker-containerd-ctr pprof --debug-socket /run/docker/containerd/docker-containerd-debug.sock goroutines
https://gist.githubusercontent.com/halcyon22/b265cc89ed3fdabb11b1951169c728df/raw/90c00bc52124b7128963270f07dafd8b00a40c22/containerd_pprof.out

lindycoder commented 6 years ago

Hello, we have the same problem here. This is a year-old version of Docker, but it might give some extra clues.

One interesting thing in my setup: dockerd has been up for 5 weeks and the containers up for 3-5 weeks. We have 8 running containers, 3 of which are currently hanging on docker inspect; the others are fine. The 3 hanging containers are running Python behind WSGI, and some of the fine containers have the same setup.

Here's our info. Docker version:

Client:
Version:    17.12.1-ce
API version:    1.35
Go version:    go1.9.4
Git commit:    7390fc6
Built:    Tue Feb 27 22:17:40 2018
OS/Arch:    linux/amd64

Server:
Engine:
 Version:    17.12.1-ce
 API version:    1.35 (minimum version 1.12)
 Go version:    go1.9.4
 Git commit:    7390fc6
 Built:    Tue Feb 27 22:16:13 2018
 OS/Arch:    linux/amd64
 Experimental:    false

docker info

Containers: 9
 Running: 9
 Paused: 0
 Stopped: 0
Images: 8
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-127-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.906GiB
Name: node02
ID: MLF2:BNI4:PH4C:U5K5:F3JD:SY26:QQ75:ULCL:LIVN:4WA5:ZU3U:CVZI
Docker Root Dir: /data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

docker pprof: https://gist.githubusercontent.com/lindycoder/9dd963bcaee8682d13e43544fcaaee25/raw/4799f77f203dfb13916218e049dc903c66616b75/docker_pprof.out

containerd pprof: https://gist.githubusercontent.com/lindycoder/9dd963bcaee8682d13e43544fcaaee25/raw/4799f77f203dfb13916218e049dc903c66616b75/containerd_pprof.out

Since the containers hang when inspecting but are still running fine, I might leave them like this for a while, so if there is any extra information I could provide to help find a solution, please tell me!

mathroc commented 6 years ago

In case it helps, here is the output of curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2 > docker_pprof.out: https://pastebin.com/raw/sMN1fxZe

cpuguy83 commented 6 years ago

@halcyon22

This looks like it's stuck waiting on a Kill call to the containerd shim (the process that manages your container process).

```
goroutine 2339 [select, 9557 minutes]:
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).dispatch(0xc42067a2a0, 0x5577ba8012a0, 0xc4206ebe00, 0xc420550200, 0xc420287e40, 0x0, 0x0)
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:102 +0x24c
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).Call(0xc42067a2a0, 0x5577ba8012a0, 0xc4206ebe00, 0x5577ba0aca84, 0x25, 0x5577ba07fb17, 0x4, 0x5577ba71b5a0, 0xc420287e00, 0x5577ba75e460, ...)
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:73 +0x15d
github.com/containerd/containerd/linux/shim/v1.(*shimClient).Kill(0xc4201b4108, 0x5577ba8012a0, 0xc4206ebe00, 0xc420287e00, 0x5577ba6c5b20, 0x0, 0x5577ba8012a0)
    /go/src/github.com/containerd/containerd/linux/shim/v1/shim.pb.go:1801 +0xbf
github.com/containerd/containerd/linux.(*Task).Kill(0xc42046e7e0, 0x5577ba8012a0, 0xc4206ebe00, 0x550000000009, 0xc4206ebe00, 0x5577ba80e5a0)
    /go/src/github.com/containerd/containerd/linux/task.go:183 +0x97
github.com/containerd/containerd/services/tasks.(*local).Kill(0xc42020d500, 0x7ff88b93a000, 0xc4206ebe00, 0xc4206ebe90, 0x0, 0x0, 0x0, 0x5577ba80e640, 0xc4206bc958, 0x3)
    /go/src/github.com/containerd/containerd/services/tasks/local.go:349 +0x137
github.com/containerd/containerd/services/tasks.(*service).Kill(0xc420393ce0, 0x7ff88b93a000, 0xc4206ebe00, 0xc4206ebe90, 0xc420393ce0, 0x5577ba0afec8, 0x4)
    /go/src/github.com/containerd/containerd/services/tasks/service.go:100 +0x68
github.com/containerd/containerd/api/services/tasks/v1._Tasks_Kill_Handler.func1(0x7ff88b93a000, 0xc4206ebe00, 0x5577ba7167c0, 0xc4206ebe90, 0xc4201b3a90, 0x68, 0x5577ba78cc00, 0xc4206bca00)
    /go/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:714 +0x88
github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.UnaryServerInterceptor(0x7ff88b93a000, 0xc4206ebe00, 0x5577ba7167c0, 0xc4206ebe90, 0xc420287ac0, 0xc420287b00, 0xc4206bca70, 0x5577b947789a, 0x70, 0x5577ba78cc00)
    /go/src/github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server.go:29 +0xa6
github.com/containerd/containerd/api/services/tasks/v1._Tasks_Kill_Handler(0x5577ba797d20, 0xc420393ce0, 0x7ff88b93a000, 0xc4206ebe00, 0xc4200c3960, 0x5577ba7df250, 0x0, 0x0, 0xc42023350c, 0x0)
    /go/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:716 +0x169
github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC(0xc42049c420, 0x5577ba808580, 0xc420093800, 0xc420380dc0, 0xc420440c30, 0x5577baea89b0, 0x0, 0x0, 0x0)
    /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:920 +0x84a
github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).handleStream(0xc42049c420, 0x5577ba808580, 0xc420093800, 0xc420380dc0, 0x0)
    /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:1142 +0x131a
github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc42003c070, 0xc42049c420, 0x5577ba808580, 0xc420093800, 0xc420380dc0)
    /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:637 +0xa1
created by github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
    /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:635 +0xa3
```

ping @crosbymichael @dmcgowan on this one.

@mathroc Any chance you can get the containerd output as well? I'm not sure how docker is packaged on Gentoo... maybe just a ctr binary is available, in which case sockets and such may be in different locations.
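
If only the stock ctr binary is available, something along these lines may work; the debug socket path below is an assumption, and the endpoint may need to be enabled in containerd's config on distro packages:

```
# assumes containerd's debug socket is exposed at this path; adjust to your setup
ctr pprof --debug-socket /run/containerd/debug.sock goroutines
```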

@lindycoder There are some known deadlocks in containerd and docker in 17.12, can you upgrade to a newer version? 18.06 is the most recent release.

mathroc commented 6 years ago

@cpuguy83 you're right, there's a ctr binary; here is the output: https://pastebin.com/raw/FpGwmfq7

halcyon22 commented 6 years ago

Thanks @cpuguy83 - with that information I stopped the Docker service, killed containerd, brought the service back up again, and the problem didn't reappear. I thought I had rebooted the server prior to adding my comment, but maybe that hadn't worked.
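
For anyone else in the same spot, a rough sketch of that recovery sequence on a systemd host where Docker manages containerd (process names differ on other packagings, and running containers will be interrupted):

```
sudo systemctl stop docker
sudo pkill -9 -x docker-containerd   # exact process-name match, so the shims are left alone
sudo systemctl start docker
```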

mhubig commented 5 years ago

I'm seeing the same issue here. Right now I'm trying to update all systems to the most recent version and check whether the issue is resolved...

whalebot-helmsman commented 5 years ago

Same problem on an up-to-date Docker version. As a possible workaround you can locate the processes related to the problem container by its ID and kill them.
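
A hedged sketch of that approach; the container name is a placeholder, and docker ps still responds even while inspect hangs, as noted elsewhere in this thread:

```
CID=$(docker ps --no-trunc -q -f name=my-container)   # 'my-container' is a placeholder
ps -ef | grep "$CID"                                  # locate the containerd-shim / container PIDs
sudo kill -9 <shim-pid>                               # SIGKILL; a plain SIGTERM may be ignored
```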

Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:24:58 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:23:24 2018
  OS/Arch:          linux/amd64
  Experimental:     false
Containers: 10
 Running: 7
 Paused: 0
 Stopped: 3
Images: 82
Server Version: 18.06.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: nvidia runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 apparmor
Kernel Version: 4.4.0-141-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 62.78GiB
Name: *
ID: I6UV:RJTS:L2XD:3QRG:OW7C:HVBG:34GL:IBFP:LXE7:XX4Q:3W6A:JZAT
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
calebmayeux commented 5 years ago

We experienced the exact same issue with certain containers of ours, across various Docker versions from 18.03 to 18.09.2 and various kernels from CentOS 7.2 to 7.6. We ended up finding that the issue for us only occurred with containers that used NFS-type Docker volumes, and our NFS daemon had somehow gotten into a state where mounts would hang indefinitely. Restarting the NFS daemon fixed our issue and mounts succeeded again. Docker should probably have a timeout on NFS mount attempts and allow users to do things like delete/inspect containers and remove Docker volumes for misbehaving NFS mounts.

Would be curious to know if other people on this thread experiencing this issue are using NFS volumes as well, or if there are separate root causes for this issue.
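
For anyone checking whether they are in the same NFS situation, a hedged sketch; the volume name is a placeholder, and the NFS service name varies by distro (nfs-server on RHEL/CentOS, nfs-kernel-server on Debian/Ubuntu):

```
# on the Docker host: a hung NFS-backed volume typically shows up as a stat that never returns
timeout 5 stat /var/lib/docker/volumes/<volume-name>/_data || echo "volume mount appears hung"

# on the NFS server (unit name assumed; adjust for your distro)
sudo systemctl restart nfs-server
```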

wimfabri commented 5 years ago

We got this problem on 18.06.1-ce, no NFS used, using docker-compose

blalor commented 5 years ago

We're also having this problem, 18.06.3-ce, using Nomad 0.8.7 on Ubuntu 14.04. Two instances exhibited this problem within minutes of one another a week ago, and another today.

dockerd-pprof.log docker-containerd-ctr.log

docker ps shows

root@nomad-client-blue-10-181-2-40:~# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                                      NAMES
59ef6eb37a06        8df826da855b        "/usr/bin/init.sh bu…"   29 minutes ago      Up 29 minutes       10.181.2.40:22386->9292/tcp, 10.181.2.40:22386->9292/udp   frontend-9fd92404-4aaf-0dd1-26e9-84c74e5e3ddd
b97162768a9f        1b6c25ab9e92        "/usr/bin/init.sh bu…"   11 hours ago        Up 11 hours         10.181.2.40:25345->9292/tcp, 10.181.2.40:25345->9292/udp   frontend-b094a318-4ce3-4f37-bb63-72be91cedb0f

and the process tree shows:

 1055 ?        Ssl  116:04 /usr/bin/dockerd --raw-logs
 1978 ?        Ssl  196:19  \_ docker-containerd --config /var/run/docker/containerd/containerd.toml
12136 ?        Sl     0:01  |   \_ docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/b97162768a9f54d0a289cd50
11069 ?        Sl     0:00  |   \_ docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/59ef6eb37a0661c4386f66a0
11087 ?        Ssl    0:00  |       \_ puma 3.12.0 (tcp://0.0.0.0:9292) [app]
11153 ?        Sl     3:41  |           \_ …
12117 ?        Sl     0:00  \_ /usr/bin/docker-proxy -proto tcp -host-ip 10.181.2.40 -host-port 25345 -container-ip 172.17.0.2 -container-port 9292
12130 ?        Sl     0:00  \_ /usr/bin/docker-proxy -proto udp -host-ip 10.181.2.40 -host-port 25345 -container-ip 172.17.0.2 -container-port 9292
11047 ?        Sl     0:00  \_ /usr/bin/docker-proxy -proto tcp -host-ip 10.181.2.40 -host-port 22386 -container-ip 172.17.0.5 -container-port 9292
11061 ?        Sl     0:00  \_ /usr/bin/docker-proxy -proto udp -host-ip 10.181.2.40 -host-port 22386 -container-ip 172.17.0.5 -container-port 9292

Note that docker-containerd-shim for b97162768a9f has no child processes. I can docker inspect 59ef6eb37a06, but docker inspect b97162768a9f hangs.

A simple kill 12136 (of the docker-containerd-shim process) doesn't work, I have to kill -9. Even afterwards, docker inspect b97162768a9f still hangs. Only after I stop and start the docker service am I able to inspect both containers.
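
Roughly, the sequence described above (PIDs and container ID are from this host's output; Ubuntu 14.04 uses upstart, hence service rather than systemctl):

```
sudo kill 12136                 # SIGTERM: the shim ignores it
sudo kill -9 12136              # SIGKILL works, but docker inspect b97162768a9f still hangs
sudo service docker stop && sudo service docker start
docker inspect b97162768a9f     # only now do both containers inspect cleanly
```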

The output from docker-containerd-ctr shows a couple of goroutines that have been alive from the time the "stuck" container was asked to stop:

goroutine 36642793 [select, 26 minutes]:
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).run(0xc4205569c0)
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:171 +0x309
created by github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.NewClient
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:46 +0x147

goroutine 36642794 [IO wait, 26 minutes]:
internal/poll.runtime_pollWait(0x7fb457e2c950, 0x72, 0xc42064cb80)
    /usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4200eab18, 0x72, 0xffffffffffffff00, 0x558dcff4ef20, 0x558dd0615a98)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9d
internal/poll.(*pollDesc).waitRead(0xc4200eab18, 0xc420df3000, 0x1000, 0x1000)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc4200eab00, 0xc420df3000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:157 +0x17f
net.(*netFD).Read(0xc4200eab00, 0xc420df3000, 0x1000, 0x1000, 0xc42064cc68, 0x558dcec21010, 0xc420502a80)
    /usr/local/go/src/net/fd_unix.go:202 +0x51
net.(*conn).Read(0xc4209141b0, 0xc420df3000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:176 +0x6c
bufio.(*Reader).Read(0xc420556720, 0xc4209129e0, 0xa, 0xa, 0x0, 0x1, 0x0)
    /usr/local/go/src/bufio/bufio.go:216 +0x23a
io.ReadAtLeast(0x558dcff4ba60, 0xc420556720, 0xc4209129e0, 0xa, 0xa, 0xa, 0xc420425a00, 0x0, 0xc420870b40)
    /usr/local/go/src/io/io.go:309 +0x88
io.ReadFull(0x558dcff4ba60, 0xc420556720, 0xc4209129e0, 0xa, 0xa, 0xc42064cf48, 0x2, 0x2)
    /usr/local/go/src/io/io.go:327 +0x5a
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.readMessageHeader(0xc4209129e0, 0xa, 0xa, 0x558dcff4ba60, 0xc420556720, 0xc42064cf48, 0x2, 0x2, 0xc42064cfa8)
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/channel.go:38 +0x62
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*channel).recv(0xc4209129c0, 0x558dcff65b40, 0xc420040080, 0x0, 0xc420590002, 0xc4206ea1c0, 0x1b8, 0x1b8, 0x0, 0x0)
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/channel.go:86 +0x6f
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).run.func1(0xc420870ba0, 0xc4205569c0, 0xc4208da170, 0xc420870b40)
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:144 +0x90
created by github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).run
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:139 +0x13f

goroutine 36642792 [syscall, 26 minutes]:
syscall.Syscall6(0xf7, 0x1, 0x2b3d, 0xc4202afde0, 0x1000004, 0x0, 0x0, 0x558dcf06504a, 0xc4200e22c0, 0x558dcff6cea0)
    /usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5
os.(*Process).blockUntilWaitable(0xc420b08510, 0x558dcfd29be0, 0xc4208da030, 0x0)
    /usr/local/go/src/os/wait_waitid.go:31 +0x9a
os.(*Process).wait(0xc420b08510, 0x558dcff65bc0, 0xc420c7b350, 0x0)
    /usr/local/go/src/os/exec_unix.go:22 +0x3e
os.(*Process).Wait(0xc420b08510, 0x48, 0xc4202aff70, 0x558dcebf37fb)
    /usr/local/go/src/os/exec.go:123 +0x2d
os/exec.(*Cmd).Wait(0xc4205f8840, 0x558dcec25191, 0xc4202d4210)
    /usr/local/go/src/os/exec/exec.go:461 +0x5e
github.com/containerd/containerd/linux/shim/client.WithStart.func1.2(0xc4205f8840, 0xc420e2f220)
    /go/src/github.com/containerd/containerd/linux/shim/client/client.go:78 +0x2d
created by github.com/containerd/containerd/linux/shim/client.WithStart.func1
    /go/src/github.com/containerd/containerd/linux/shim/client/client.go:77 +0x307

goroutine 34853989 [select, 26 minutes]:
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).run(0xc4204e7320)
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:171 +0x309
created by github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.NewClient
    /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:46 +0x147
cjellick commented 5 years ago

I believe we are seeing this as well. It seems to be tied to situations where containers are getting OOM-killed or otherwise are in restart loops.

I see two long running calls to containerd:

``` goroutine 1374264 [select, 689 minutes]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).waitOnHeader(0xc42a1f42d0, 0x10, 0xc423b152f0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:222 +0x101 github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).RecvCompress(0xc42a1f42d0, 0x555558c3fbb0, 0xc423b153b0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:233 +0x2d github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg(0xc42a1f8340, 0x555558aa9b20, 0xc4247840c8, 0x0, 0x0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:515 +0x63b github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg(0xc42a1f6280, 0x555558aa9b20, 0xc4247840c8, 0x0, 0x0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:395 +0x45 github.com/docker/docker/vendor/google.golang.org/grpc.invoke(0x555558c74680, 0xc42a5c70e0, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, 0xc420648300, 0xc4208f4900, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:83 +0x185 github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary(0x55555802962e, 0x4, 0x555558c74600, 0xc42003e018, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:35 +0xf6 github.com/docker/docker/vendor/github.com/containerd/containerd.(namespaceInterceptor).(github.com/docker/docker/vendor/github.com/containerd/containerd.unary)-fm(0x555558c74600, 0xc42003e018, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, 0xc420648300, 0x555558c3fc20, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:51 +0xf6 github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke(0xc420648300, 0x555558c74600, 0xc42003e018, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, 0x0, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:35 +0x10b github.com/docker/docker/vendor/google.golang.org/grpc.Invoke(0x555558c74600, 0xc42003e018, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, 0xc420648300, 0x0, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:60 +0xc3 github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Get(0xc4247840c0, 0x555558c74600, 0xc42003e018, 0xc42c811220, 0x0, 0x0, 0x0, 0xc426ebf7b8, 0x555556dde1ff, 0x555558c74680) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:448 +0xd4 github.com/docker/docker/vendor/github.com/containerd/containerd.(*process).Status(0xc42453a510, 0x555558c74600, 0xc42003e018, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x40, ...) 
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/process.go:220 +0x119 github.com/docker/docker/vendor/github.com/containerd/containerd.(*process).Delete(0xc42453a510, 0x555558c74600, 0xc42003e018, 0x0, 0x0, 0x0, 0x40, 0xc42c097680, 0x40) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/process.go:196 +0x119 github.com/docker/docker/libcontainerd.(*client).processEvent.func1() /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/client_daemon.go:713 +0x212 github.com/docker/docker/libcontainerd.(*queue).append.func1(0xc42c097601, 0xc42a7b0420, 0xc421f6ba00, 0xc42a6483c0, 0xc4202f46e8, 0xc42c097640, 0x40) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/queue.go:26 +0x3a created by github.com/docker/docker/libcontainerd.(*queue).append /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/queue.go:22 +0x196 ... goroutine 1377768 [select, 688 minutes]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).waitOnHeader(0xc426a2ec30, 0x10, 0xc42a3b2bd0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:222 +0x101 github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).RecvCompress(0xc426a2ec30, 0x555558c3fbb0, 0xc42a3b2c90) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:233 +0x2d github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg(0xc42247f380, 0x555558b22200, 0x55555986b588, 0x0, 0x0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:515 +0x63b github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg(0xc420408300, 0x555558b22200, 0x55555986b588, 0x0, 0x0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:395 +0x45 github.com/docker/docker/vendor/google.golang.org/grpc.invoke(0x555558c74680, 0xc426217a40, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, 0xc420648300, 0xc4208f4900, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:83 +0x185 github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary(0x55555802962e, 0x4, 0x555558c74600, 0xc42003e018, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:35 +0xf6 github.com/docker/docker/vendor/github.com/containerd/containerd.(namespaceInterceptor).(github.com/docker/docker/vendor/github.com/containerd/containerd.unary)-fm(0x555558c74600, 0xc42003e018, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, 0xc420648300, 0x555558c3fc20, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:51 +0xf6 github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke(0xc420648300, 0x555558c74600, 0xc42003e018, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, 0x0, ...) 
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:35 +0x10b github.com/docker/docker/vendor/google.golang.org/grpc.Invoke(0x555558c74600, 0xc42003e018, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, 0xc420648300, 0x0, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:60 +0xc3 github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Exec(0xc4285e0ae8, 0x555558c74600, 0xc42003e018, 0xc428567140, 0x0, 0x0, 0x0, 0xa3, 0xc428566f60, 0x0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:475 +0xd4 github.com/docker/docker/vendor/github.com/containerd/containerd.(*task).Exec(0xc423993e90, 0x555558c74600, 0xc42003e018, 0xc428dc2bc0, 0x40, 0xc42247e8f0, 0xc428566f60, 0x0, 0x0, 0x0, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/task.go:333 +0x2ba github.com/docker/docker/libcontainerd.(*client).Exec(0xc4202f4690, 0x555558c74600, 0xc42003e018, 0xc421cdef40, 0x40, 0xc428dc2bc0, 0x40, 0xc42247e8f0, 0x0, 0xc4234531f0, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/client_daemon.go:367 +0x3f1 github.com/docker/docker/daemon.(*Daemon).ContainerExecStart(0xc4204828c0, 0x555558c74600, 0xc42003e018, 0xc4228250f1, 0x40, 0x7ffff7f0f370, 0xc42093f0c0, 0x555558c510a0, 0xc42093f280, 0x555558c510a0, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/exec.go:251 +0xb4f github.com/docker/docker/api/server/router/container.(*containerRouter).postContainerExecStart(0xc420d71fc0, 0x555558c74680, 0xc425c77080, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x0, 0x0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/exec.go:125 +0x34b github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.postContainerExecStart)-fm(0x555558c74680, 0xc425c77080, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x555556bcee3c, 0x555558aba440) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:59 +0x6b github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1(0x555558c74680, 0xc425c77080, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x555558c74680, 0xc425c77080) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26 +0xda github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1(0x555558c74680, 0xc425c77050, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x0, 0xc42b14ba58) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62 +0x401 github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1(0x555558c74680, 0xc425c77050, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x555558c74680, 0xc425c77050) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59 +0x7ab github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1(0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141 +0x19a 
net/http.HandlerFunc.ServeHTTP(0xc4212fed40, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00) /usr/local/go/src/net/http/server.go:1947 +0x46 github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP(0xc4217185f0, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103 +0x228 github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP(0xc420edc910, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29 +0x72 net/http.serverHandler.ServeHTTP(0xc4204365b0, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00) /usr/local/go/src/net/http/server.go:2694 +0xbe net/http.(*conn).serve(0xc420b67ea0, 0x555558c745c0, 0xc426d72a40) /usr/local/go/src/net/http/server.go:1830 +0x653 created by net/http.(*Server).Serve /usr/local/go/src/net/http/server.go:2795 +0x27d ```

docker info:

```
Containers: 206
 Running: 165
 Paused: 0
 Stopped: 41
Images: 112
Server Version: 18.06.2-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-862.6.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 56
Total Memory: 503.6GiB
Docker Root Dir: /mongodb/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
```

Seems like the same issue, @cpuguy83? Any further guidance?

cjellick commented 5 years ago

This seems closely related to https://github.com/moby/moby/issues/38064

timchenxiaoyu commented 5 years ago

Docker 18.09-ce also has this problem.

ghost commented 5 years ago

Seems like the bug is also present in 19.03.1

~# docker version
Client: Docker Engine - Community
 Version:           19.03.1
 API version:       1.40
 Go version:        go1.12.5
 Git commit:        74b1e89
 Built:             Thu Jul 25 21:22:03 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89
  Built:            Thu Jul 25 21:20:35 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Stack trace: https://plik.himbeere.pw/file/yw4XSErMAEMxG9wD/wUQJrxyVYWsRGzhp/stacktrace

Example:

# time timeout 10 docker rm -f dev_admin_php-fpm_1

real    0m10.004s
user    0m0.011s
sys     0m0.024s
# time timeout 10 docker inspect dev_admin_php-fpm_1

real    0m10.008s
user    0m0.013s
sys     0m0.024s

Restarting the docker daemon fixed it.

glaux commented 5 years ago

I can confirm that this issue is present on 19.03.1 on Ubuntu 18.04.2 LTS.

Does anyone have a workaround that only impacts the offending container/service? I have only been able to fix this issue by rebooting the server, as the system hangs when trying to restart the docker daemon (sudo systemctl restart docker, sudo service docker start).

This is naturally not very good for the uptime of my services :(

The machine should have plenty of resources; memory usage is around 5-10% and system load is usually low as well.

buck2202 commented 5 years ago

Happening with extremely high frequency to me on multiple ubuntu machines running 19.03.1. docker inspect is an essential part of the scripts that manage my short-lived containers, so this is a breaking issue for me. My last heavy use of docker was on 18.09.x, and I never saw this issue across tens of thousands of containers and many machines.

No NFS, not using docker compose, and the containers that inspect hangs on were not otherwise misbehaving.

edit to add: In an attempt to work around the issue, I made a shell script with permissions to read the docker/containers folder and manually pull the fields that I need from the config.v2.json file. I'm somewhat worried about reading it while it's in the process of being rewritten or edited, but that's the only way I could think of to eliminate calls to inspect. But... it turns out that I have other breaking issues with 19.03.1 and have reverted to 18.09.
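
A minimal sketch of that kind of workaround, assuming root access to Docker's state directory; config.v2.json is an internal, undocumented format, the field names below are best-effort guesses from an 18.09/19.03-era layout, and the file can be mid-rewrite when read:

```
#!/bin/sh
# usage: ./inspect-lite.sh <container-name>   (hypothetical helper, not part of Docker)
CID=$(docker ps --no-trunc -q -f "name=$1")
sudo jq '{Name: .Name, Pid: .State.Pid, Running: .State.Running}' \
    "/var/lib/docker/containers/$CID/config.v2.json"
```

Reading the on-disk file sidesteps the wedged API call entirely, which is why it can keep working while docker inspect hangs.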

cpuguy83 commented 5 years ago

@luminouspanda if you can put your stack trace in a site not flagged as a security issue, such as gist, then we can review it.

Everyone else experiencing problems on 19.03: there seems to be a regression (at least empirically, based on the number of reports I've seen). Can you please post the stack trace from affected systems?

curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2

buck2202 commented 5 years ago

@cpuguy83: not sure if this will be useful since, like I said, I have other issues with 19.03 (will create a new bug report... but, in short, docker never registers that a container has exited after its referenced PID quits). But I'm including this anyway in the hope that it will be useful.

docker version (experimental is enabled because I use checkpoints):

Client: Docker Engine - Community
 Version:           19.03.1
 API version:       1.40
 Go version:        go1.12.5
 Git commit:        74b1e89
 Built:             Thu Jul 25 21:21:05 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89
  Built:            Thu Jul 25 21:19:41 2019
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

docker info:

Client:
 Debug Mode: false

Server:
 Containers: 7
  Running: 7
  Paused: 0
  Stopped: 0
 Images: 10
 Server Version: 19.03.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.0.0-25-generic
 Operating System: Linux Mint 19
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.37GiB
 Name: thinkpad
 ID: W7ON:HPJN:2SGV:LS7L:VBOZ:RTGK:JN46:HTF7:N4KW:H4PU:24AH:E3OK
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: true

WARNING: No swap limit support

output of curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2 with a hung docker container inspect: https://gist.github.com/buck2202/8058fda098f992c4a35bc181f27bcbf3

glaux commented 5 years ago

The issue seems more correlated with the time of day than anything else; the last three days I've seen it around 3:30 pm, but that must surely be a coincidence?

Anyway, here are the output of curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2:

https://gist.github.com/glaux/6562f669885929dcb26ef668d2fcb71c

This time I had run composer require in a php container when I got an error about memory exhaustion (I have only a few containers running on a small dev server):

Fatal error: Allowed memory size of 1610612736 bytes exhausted (tried to allocate 4096 bytes) in phar:///usr/local/bin/composer/src/Composer/DependencyResolver/Solver.php on line 223

Check https://getcomposer.org/doc/articles/troubleshooting.md#memory-limit-errors for more info on how to handle out of memory errors.

When running docker-compose up I get the message:

ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

That particular error is discussed here: https://github.com/docker/compose/issues/5620 which links to this thread.

Docker version:

$ docker version
Client: Docker Engine - Community
 Version:           19.03.1
 API version:       1.40
 Go version:        go1.12.5
 Git commit:        74b1e89
 Built:             Thu Jul 25 21:21:05 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89
  Built:            Thu Jul 25 21:19:41 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
cpuguy83 commented 5 years ago

This looks like a deadlock waiting on data back from containerd. I know I've seen some things come across regarding this.

Unfortunately the debug socket for containerd is no longer available from a default docker install so tracking down the cause is a bit harder :(

goroutine 562400 [select, 3 minutes]:
github.com/docker/docker/vendor/google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader(0xc001a08000, 0x8, 0xc00134b880)
    /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/internal/transport/transport.go:267 +0xce
github.com/docker/docker/vendor/google.golang.org/grpc/internal/transport.(*Stream).RecvCompress(...)
    /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/internal/transport/transport.go:278
github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg(0xc00134b880, 0x55b7e354e780, 0xc0013adf40, 0x0, 0xc00258cae0, 0x0)
    /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:856 +0x70e
github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg.func1(0xc00134b880, 0x0, 0x0)
    /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:721 +0x48
github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).withRetry(0xc000e73320, 0xc00258cae8, 0xc00258cad8, 0xc001a87810, 0x0)
    /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:577 +0x7e
github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg(0xc000e73320, 0x55b7e354e780, 0xc0013adf40, 0x0, 0x0)
    /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:720 +0x102
github.com/docker/docker/vendor/google.golang.org/grpc.invoke(0x55b7e3705060, 0xc002cc0030, 0x55b7e2651589, 0x2a, 0x55b7e354b960, 0xc00269df80, 0x55b7e354e780, 0xc0013adf40, 0xc000176dc0, 0xc000891520, ...)
    /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:73 +0x143
github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary(0x55b7e25f7f49, 0x4, 0x55b7e3704fe0, 0xc000052038, 0x55b7e2651589, 0x2a, 0x55b7e354b960, 0xc00269df80, 0x55b7e354e780, 0xc0013adf40, ...)
    /go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:35 +0xf4
github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke(0xc000176dc0, 0x55b7e3704fe0, 0xc000052038, 0x55b7e2651589, 0x2a, 0x55b7e354b960, 0xc00269df80, 0x55b7e354e780, 0xc0013adf40, 0x0, ...)
    /go/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:35 +0x10c
github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Delete(0xc000db26a8, 0x55b7e3704fe0, 0xc000052038, 0xc00269df80, 0x0, 0x0, 0x0, 0xed4e7589b, 0x0, 0x0)
    /go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:1315 +0xd4
github.com/docker/docker/vendor/github.com/containerd/containerd.(*task).Delete(0xc000a12570, 0x55b7e3704fe0, 0xc000052038, 0x0, 0x0, 0x0, 0x4, 0x55b7e373b2e0, 0xc000a12570)
    /go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/task.go:312 +0x221
github.com/docker/docker/libcontainerd/remote.(*client).DeleteTask(0xc0008888a0, 0x55b7e3704fe0, 0xc000052038, 0xc000d236c0, 0x40, 0x0, 0x55b7e32203e0, 0x55b7e0ce4275, 0x196, 0xc00048200b, ...)
    /go/src/github.com/docker/docker/libcontainerd/remote/client.go:438 +0xd6
github.com/docker/docker/daemon.(*Daemon).ProcessEvent(0xc00000c5a0, 0xc001a40b80, 0x40, 0x55b7e25f7d1d, 0x4, 0xc001a40b80, 0x40, 0xc001a40bc0, 0x40, 0x3b6b, ...)
    /go/src/github.com/docker/docker/daemon/monitor.go:54 +0x253
github.com/docker/docker/libcontainerd/remote.(*client).processEvent.func1()
    /go/src/github.com/docker/docker/libcontainerd/remote/client.go:634 +0x109
github.com/docker/docker/libcontainerd/queue.(*Queue).Append.func1(0xc001a40b00, 0x0, 0xc001514480, 0xc0016fbc80, 0xc0008888e0, 0xc001a40b80, 0x40)
    /go/src/github.com/docker/docker/libcontainerd/queue/queue.go:28 +0x3a
created by github.com/docker/docker/libcontainerd/queue.(*Queue).Append
    /go/src/github.com/docker/docker/libcontainerd/queue/queue.go:24 +0x186
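
Regarding the note above that the containerd debug socket is no longer exposed by a default install: on hosts where containerd runs as its own service it can usually be re-enabled through config.toml. A hedged sketch, with paths and section names taken from containerd 1.2-era defaults; verify against your install before applying:

```
# append only if /etc/containerd/config.toml has no [debug] section yet
sudo tee -a /etc/containerd/config.toml <<'EOF'
[debug]
  address = "/run/containerd/debug.sock"
  level = "debug"
EOF
sudo systemctl restart containerd        # may briefly disrupt container management
ctr pprof --debug-socket /run/containerd/debug.sock goroutines
```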
cpuguy83 commented 5 years ago

https://github.com/containerd/containerd/pull/3540 seems promising.

buck2202 commented 5 years ago

Still testing, but it seems like the combination of docker 19.03.3 and containerd.io 1.2.10-2 may have resolved this issue for me.

The PR @cpuguy83 referenced above (containerd/containerd#3540) was backported to containerd 1.2.8 by containerd/containerd#3561, but as far as I can tell, the download.docker.com repository for bionic wasn't updated to include this patch until 1.2.10-2 within the past week or so.
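
To check whether a given Ubuntu/Debian host already has the patched package from download.docker.com, and to pull it in if not, something like this should work (assuming the Docker apt repository is already configured):

```
apt-cache policy containerd.io                                    # look for 1.2.10-2 or newer
sudo apt-get update && sudo apt-get install --only-upgrade containerd.io
sudo systemctl restart containerd docker                          # stops running containers unless live-restore is enabled
```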

frankui commented 4 years ago

We also hit a similar issue on 18.09.9 docker-ce with containerd 1.2.10 and kernel 4.19.0-041900-generic. Not all containers, but some of them, hit the hang.

The container process and shim process are alive, as pstree shows, but docker inspect/exec/stop/kill hang even though docker ps can still find the container.

docker ps can still find the container

cloud-user@hm6-opshub-aio:~$ docker info | grep -iE "Version|runc|containerd|shim"
WARNING: No swap limit support
Server Version: 18.09.9
Runtimes: runc
Default Runtime: runc
containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
init version: fec3683
Kernel Version: 4.19.0-041900-generic
cloud-user@hm6-opshub-aio:~$
cloud-user@hm6-opshub-aio:$ docker ps | grep zookeeper-0
7d1dbb28dac3 d43be510ec44 "/custom-entrypoint.…" 7 hours ago Up 7 hours k8s_zookeeper_zookeeper-0_opshub-data_ca906dcf-c623-460a-9271-1389b09912b0_1
fb0855d4e2e6 k8s.gcr.io/pause:3.1 "/pause" 46 hours ago Up 46 hours k8s_POD_zookeeper-0_opshub-data_ca906dcf-c623-460a-9271-1389b09912b0_0

docker inspect hangs

cloud-user@hm6-opshub-aio:$ docker inspect 7d1dbb28dac3
^C
cloud-user@hm6-opshub-aio:$

shim and container process still alive:

cloud-user@hm6-opshub-aio:$ ps -ef | grep 7d1dbb28dac3
cloud-u+ 5797 2133 0 00:31 pts/0 00:00:00 grep --color=auto 7d1dbb28dac3
root 20154 1640 0 Nov11 ? 00:00:03 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/7d1dbb28dac3216b2b5cbafa5c6b9dbb4a6565e56d9bd8c6f060ba7487b55501 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
cloud-user@hm6-opshub-aio:~$ pstree -Asp 20154
systemd(1)---containerd(1640)---containerd-shim(20154)-+-custom-entrypoi(25587)---java(17248)-+-{java}(17702)
                                                       |                                      |-{java}(18065)
                                                       |                                      |-{java}(6038)
                                                       |                                      |-{java}(14587)
                                                       |                                      |-{java}(14793)
                                                       |                                      |-{java}(15040)
                                                       |                                      |-{java}(19281)
                                                       |                                      |-{java}(19301)
                                                       |                                      |-{java}(11909)
                                                       |                                      |-{java}(12716)
                                                       |                                      |-{java}(13990)
                                                       |                                      |-{java}(14318)
                                                       |                                      |-{java}(15122)
                                                       |                                      |-{java}(15182)
                                                       |                                      |-{java}(22772)
                                                       |                                      |-{java}(26839)
                                                       |                                      |-{java}(27524)
                                                       |                                      |-{java}(18492)
                                                       |                                      |-{java}(18010)
                                                       |                                      |-{java}(18022)
                                                       |                                      |-{java}(18490)
                                                       |                                      |-{java}(20045)
                                                       |                                      |-{java}(20113)
                                                       |                                      |-{java}(9862)
                                                       |                                      |-{java}(7946)
                                                       |                                      |-{java}(18670)
                                                       |                                      |-{java}(19502)
                                                       |                                      |-{java}(19504)
                                                       |                                      |-{java}(22857)
                                                       |                                      |-{java}(22858)
                                                       |                                      |-{java}(23127)
                                                       |                                      |-{java}(23178)
                                                       |                                      |-{java}(1632)
                                                       |                                      |-{java}(1602)
                                                       |                                      |-{java}(15553)
                                                       |                                      |-{java}(19236)
                                                       |                                      |-{java}(22269)
                                                       |                                      |-{java}(503)
                                                       |                                      |-{java}(18600)
                                                       |                                      |-{java}(22767)
                                                       |                                      |-{java}(20046)
                                                       |                                      |-{java}(20061)
                                                       |                                      |-{java}(20129)
                                                       |                                      |-{java}(26671)
                                                       |                                      |-{java}(17418)
                                                       |                                      |-{java}(18604)
                                                       |                                      |-{java}(19293)
                                                       |                                      |-{java}(19302)
                                                       |                                      |-{java}(19487)
                                                       |                                      |-{java}(22024)
                                                       |                                      |-{java}(7675)
                                                       |                                      |-{java}(17292)
                                                       |                                      |-{java}(32502)
                                                       |                                      |-{java}(18581)
                                                       |                                      |-{java}(29362)
                                                       |                                      |-{java}(27718)
                                                       |                                      |-{java}(29804)
                                                       |                                      |-{java}(4248)
                                                       |                                      |-{java}(17326)
                                                       |                                      |-{java}(17339)
                                                       |                                      |-{java}(17350)
                                                       |                                      |-{java}(17449)
                                                       |                                      |-{java}(1625)
                                                       |                                      |-{java}(1638)
                                                       |                                      |-{java}(22305)
                                                       |                                      |-{java}(19299)
                                                       |                                      |-{java}(19314)
                                                       |                                      |-{java}(19363)
                                                       |                                      |-{java}(13804)
                                                       |                                      |-{java}(13848)
                                                       |                                      |-{java}(13964)
                                                       |                                      |-{java}(14855)
                                                       |                                      |-{java}(7284)
                                                       |                                      |-{java}(29541)
                                                       |                                      `-{java}(30103)
                                                       |-{containerd-shim}(20261)
                                                       |-{containerd-shim}(20385)
                                                       |-{containerd-shim}(20438)
                                                       |-{containerd-shim}(20769)
                                                       |-{containerd-shim}(21254)
                                                       |-{containerd-shim}(21839)
                                                       |-{containerd-shim}(22173)
                                                       |-{containerd-shim}(22238)
                                                       |-{containerd-shim}(30808)
                                                       `-{containerd-shim}(3440)
cloud-user@hm6-opshub-aio:~$ 
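
For hangs like this, a goroutine stack dump from the daemon is usually the most useful thing to attach: dockerd dumps its goroutines when it receives SIGUSR1. A minimal sketch; the dump file location is an assumption, so check the daemon log for the exact path on your install:

# ask dockerd for a full goroutine stack dump; it shows which lock or containerd call inspect is stuck on
sudo kill -USR1 $(pidof dockerd)

# the daemon logs where it wrote the dump; on many installs it lands under /var/run/docker/
sudo journalctl -u docker --since "5 minutes ago" | grep -i goroutine
sudo ls /var/run/docker/goroutine-stacks-*.log 2>/dev/null

# containerd also dumps its goroutines to its own log on SIGUSR1
sudo kill -USR1 $(pidof containerd)
sudo journalctl -u containerd --since "5 minutes ago"
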
zionwu commented 4 years ago

@cpuguy83, I am using docker 18.09.8 with containerd 1.2.6. If I want to resolve this issue by upgrading containerd to 1.2.8, can I just replace the containerd binary in my machine?

hakman commented 4 years ago

@zionwu 18.09.8 comes with a separate containerd package. You can just download and install the latest 1.2.10 rpm/deb from https://download.docker.com/linux/.

You can also just replace the containerd binaries with the ones from the latest static package; you will need to replace all binaries except the Docker ones: https://download.docker.com/linux/static/stable/x86_64/docker-19.03.5.tgz
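
A rough sketch of that static-binary route, assuming the layout of the 19.03.5 bundle (check the extracted file list before copying, and stop the services first):

curl -fsSLO https://download.docker.com/linux/static/stable/x86_64/docker-19.03.5.tgz
tar xzf docker-19.03.5.tgz

sudo systemctl stop docker containerd

# replace only the containerd/runc side; leave the docker CLI and dockerd untouched
sudo cp docker/containerd docker/containerd-shim docker/ctr docker/runc /usr/bin/

sudo systemctl start containerd docker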

zionwu commented 4 years ago

@hakman, thank you for the help. I upgraded containerd to 1.2.10 but the issue still exists. Do you have any suggestions?

hakman commented 4 years ago

@zionwu My only other suggestion is to also try the latest Docker 19.03.5.

rearden-steel commented 4 years ago

Any news here? Same problem with 19.03.7 and Debian 9.

cutecycle commented 4 years ago

It is happening to me with redis:5.0-alpine. docker-compose down times out, inspect hangs, and I have to restart the daemon and kill the container immediately. ps shows the container's process still there, even after the container itself has received SIGTERM.

I thought it was a zombie reaping issue, but I'm now unsure. Entrypoint should be PID 1, right?

The best I can come up with is that the container is handling SIGSTOP/SIGTERM on its own (redis has its own signal handler), so Docker thinks the container has been killed while the process itself has not been (see the sketch after the ps output below).

date -u; cat docker-compose.yml | grep redis | grep sha; docker exec -it api-redis ps
Wed Feb 26 16:32:13 UTC 2020
    image: "redis@sha256:ee13953704783b284c080b5b0abe4620730728054f5c19e9488d7a97ecd312c5"
PID   USER     TIME  COMMAND
    1 redis     5:47 redis-server --appendonly yes --loglevel verbose
   32 root      0:00 ps
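
If the PID 1 / signal-handling theory is what's being tested, a couple of hedged checks; --init is just the standard Docker flag that puts an init process in front of the entrypoint, not something specific to this issue:

# after a timed-out `docker-compose down`, see whether redis-server is still alive on the host
ps -eo pid,ppid,stat,cmd | grep '[r]edis-server'

# if it is, send SIGTERM to it directly and watch whether it exits on its own
sudo kill -TERM <redis-server-pid>

# to rule out PID-1 signal handling, try the same image with an init process in front
docker run -d --init --name redis-init-test redis:5.0-alpine redis-server --appendonly yes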

Docker version:

Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea838
 Built:             Wed Nov 13 07:29:52 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.12
  Git commit:       633a0ea838
  Built:            Wed Nov 13 07:28:22 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Docker-compose:

docker-compose -v
docker-compose version 1.23.2, build 1110ad01

Environment (AWS EC2, AMI ami-06d51e91cea0dac8d):

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.3 LTS
Release:    18.04
Codename:   bionic

timchenxiaoyu commented 4 years ago

Can CephFS being down lead to docker inspect hanging forever? How do I solve this problem?
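
A dead CephFS (or any hung network mount) can leave the container's processes, or daemon threads doing I/O on it, stuck in uninterruptible sleep, which looks exactly like inspect/stop/kill hanging. A quick, hedged check for processes stuck in D state (the ceph grep is just an example for this setup):

# list processes in uninterruptible sleep (state D) and what they are blocked on
ps axo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'

# see whether the ceph mount is still listed (this can itself block if the mount is dead)
grep ceph /proc/mounts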

pw-ai commented 4 years ago

docker-compose hanging just started happening for me. If I include a Dockerfile, it hangs. The following docker-compose.yml works (note that the Dockerfile build is commented out):

version: '3.5'
services:
  tensorflow-notebook:
    container_name: tensorflow-notebook-container
    image: jupyter/tensorflow-notebook:latest
    # build:
    #  context: .
    #  dockerfile:    Dockerfile
    volumes:
      - ./data:/home/jovyan
    ports:
      - 8888:8888
    network_mode: host
    restart:  unless-stopped

However, once I change it to build from the Dockerfile, it hangs:

version: '3.5'
services:
  tensorflow-notebook:
    container_name: tensorflow-notebook-container
    # image: jupyter/tensorflow-notebook:latest
    build:
      context: .
      dockerfile:    Dockerfile
    volumes:
      - ./data:/home/jovyan
    ports:
      - 8888:8888
    network_mode: host
    restart:  unless-stopped

It doesn't matter what's in the Dockerfile; as long as there's a reference to a Dockerfile, it seems to hang.

docker-compose version 1.26.0, build d4451659
Docker version 19.03.11, build 42e35e61f3
Ubuntu 18.04

I've already tried restarting the machine and restarting dockerd. Maybe the problem has been there for a while, I don't know. This is the only container that needs a Dockerfile; the others come directly from images. I had to rebuild (haven't done that for at least a couple of months) after changing the Dockerfile, with docker-compose down --rmi all -v followed by docker-compose up -d.

thaJeztah commented 4 years ago

It doesn't matter what's in the Dockerfile; as long as there's a reference to a Dockerfile, it seems to hang.

I see you also have a data directory in your current location. When doing a docker build, docker sends the build context to the daemon, which means that (unless excluded by a .dockerignore file) all files and directories in the build context have to be transferred, even if your Dockerfile doesn't actually need them. If there are lots of files in that path, this can cause a long delay (and a lot of data to be sent).
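
For example, a .dockerignore next to the compose file that excludes the bind-mounted data directory keeps the context small; the entries below are assumptions based on the layout shown above:

# create a .dockerignore so the ./data bind mount is not shipped as build context
cat > .dockerignore <<'EOF'
data/
.git/
EOF

# sanity check: time the build again and compare
time docker-compose build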

baaart commented 3 years ago

I have the same issue on Docker 19.03.13, Docker for Mac, on an almost fresh install of the system. It was working normally for a month, then suddenly started to hang on every docker-compose command. After investigation it seems to hang on docker inspect while doing compose operations. What I've tried to fix this:

  1. Restart docker
  2. Update docker for mac
  3. restart system

None of the above helped in any way; everything is still hanging. I'm not sure if I should post this in docker/for-mac, as I have exactly the same issues as described here, so it doesn't look system dependent. Does anyone have a working solution for this? Should I uninstall and downgrade? I need a working Docker for work.

ten-lac commented 3 years ago

I'm running into the same set of symptoms as described by various individuals in this thread. This is my setup.

Client:
 Version:           19.03.6-ce
 API version:       1.40
 Go version:        go1.13.4
 Git commit:        369ce74
 Built:             Fri May 29 04:01:26 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.6-ce
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.4
  Git commit:       369ce74
  Built:            Fri May 29 04:01:57 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.1
  GitCommit:        c623d1b36f09f8ef6536a057bd658b3aa8632828
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

The errors I see:

Dec 18 18:30:08 ip-10-15-125-14.ec2.internal dockerd[3989]: time="2020-12-18T18:30:08.048544559Z" level=error msg="ebf3404977b8235dd09ff9ace3dacdd34c1471fa957907ce49047cb7ee6c01a8 cleanup: failed to delete container from containerd: no such container"
Dec 18 18:30:08 ip-10-15-125-14.ec2.internal dockerd[3989]: time="2020-12-18T18:30:08.048583023Z" level=error msg="Handler for POST /containers/ebf3404977b8235dd09ff9ace3dacdd34c1471fa957907ce49047cb7ee6c01a8/start returned error: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: read init-p: connection reset by peer: unknown"

and

Dec 21 14:00:08 ip-10-15-125-14.ec2.internal dockerd[3989]: time="2020-12-21T14:00:08.109472004Z" level=error msg="a7e3f489bb900db0b6601919481bed2a3800984e204d49263c0c75397a25d47a cleanup: failed to delete container from containerd: no such container"
Dec 21 14:00:08 ip-10-15-125-14.ec2.internal dockerd[3989]: time="2020-12-21T14:00:08.109509720Z" level=error msg="Handler for POST /containers/a7e3f489bb900db0b6601919481bed2a3800984e204d49263c0c75397a25d47a/start returned error: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:338: getting the final child's pid from pipe caused: EOF: unknown"

Both of those errors resulted from running a K8s Job. The workload container finishes, but the pause container is stuck in the CREATED state.
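
When dockerd and containerd disagree like this ("no such container" on cleanup, a pause container stuck in CREATED), comparing the two views can help show which side has gone stale; a hedged sketch using ctr against the "moby" namespace that dockerd uses:

# what dockerd thinks exists
docker ps -a --filter status=created

# what containerd actually has in the namespace dockerd uses
sudo ctr --namespace moby containers list
sudo ctr --namespace moby tasks list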

chazwarp923 commented 1 year ago

Also experiencing this issue. I've had it happen intermittently in the past, but it is extremely pervasive this time.

Client: Docker Engine - Community
 Version:           24.0.4
 API version:       1.43
 Go version:        go1.20.5
 Git commit:        3713ee1
 Built:             Fri Jul 7 14:50:55 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.4
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.5
  Git commit:       4ffc614
  Built:            Fri Jul 7 14:50:55 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0