Routhinator opened 6 years ago
Can you grab a stack trace from the daemon?
curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2
I have the same problem, here is the stack trace https://pastebin.com/raw/pBNxVRaR
It's getting quite painful and it seems to happen more and more often (which is why I'm starting to look into this).
This is also for a container created by docker-compose; in my case the container did not get started:
docker ps -a -f name=phpunit
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
ced5781314bc        v1_phpunit          "sh"                About an hour ago   Created                                 v1_phpunit_run_1
and since docker-compose tries to reuse the existing container this service is locked until docker is restarted (which often requires a computer reboot)
@mathroc Thanks!
Looks like this is stuck waiting on containerd for some reason. Can you pull the stack trace from containerd as well?
Should be able to get it with (assuming you are you using docker to manage containerd):
docker-containerd-ctr pprof --debug-socket /run/docker/containerd/docker-containerd-debug.sock goroutines
@cpuguy83 I have one right now but I don't have the docker-containerd-ctr binary; is there an alternative to that?
(I installed docker-ce and containerd via portage on gentoo)
Similar problem here, but using swarm and not compose.
I have a swarm service that is not listed by docker service ls, but the container is listed by docker ps. Trying to inspect it hangs indefinitely.
4144e40b25f5 docker.example.com:5000/admin:2219 "/bin/bash /admin-do…" 6 days ago Up 6 days (unhealthy) admin-service.osn1h8ctn3xeoacs9k42dxzi7.jmdgzie4mnvy475sxytmj5fse
$ docker info
Containers: 11
Running: 11
Paused: 0
Stopped: 0
Images: 13
Server Version: 18.06.0-ce
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: osn1h8ctn3xeoacs9k42dxzi7
Is Manager: true
ClusterID: nzrxgungmmv3qr2ftg24htxg4
Managers: 1
Nodes: 1
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 10.64.225.104
Manager Addresses:
10.64.225.104:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d64c661f1d51c48782c9cec8fda7604785f93587
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-862.3.3.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.65GiB
Name: [hostname]
ID: WPK6:BMJC:ADNC:NUNT:VBYK:2VFV:BGLR:F4QV:5RFR:BNJ6:5KLY:N2A7
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
$ curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2
https://gist.githubusercontent.com/halcyon22/0470948eccb9dd285098e41d1ced4e1a/raw/315dc7863f20ac04ccdce26dc41a29f93cf5a392/docker_pprof.out
$ docker-containerd-ctr pprof --debug-socket /run/docker/containerd/docker-containerd-debug.sock goroutines
https://gist.githubusercontent.com/halcyon22/b265cc89ed3fdabb11b1951169c728df/raw/90c00bc52124b7128963270f07dafd8b00a40c22/containerd_pprof.out
Hello, we have the same problem here; this is a year-old version of docker but it might give some extra clues.
One interesting thing in my setup: dockerd has been up for 5 weeks and the containers up for 3-5 weeks. We have 8 running containers, 3 of which are currently hanging on docker inspect; the others are fine. The 3 hanging containers are running Python behind WSGI, and some of the fine containers have the same setup.
Here's our info: Docker version
Client:
Version: 17.12.1-ce
API version: 1.35
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:17:40 2018
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.1-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:16:13 2018
OS/Arch: linux/amd64
Experimental: false
docker info
Containers: 9
Running: 9
Paused: 0
Stopped: 0
Images: 8
Server Version: 17.12.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-127-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.906GiB
Name: node02
ID: MLF2:BNI4:PH4C:U5K5:F3JD:SY26:QQ75:ULCL:LIVN:4WA5:ZU3U:CVZI
Docker Root Dir: /data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
containerd pprof: https://gist.githubusercontent.com/lindycoder/9dd963bcaee8682d13e43544fcaaee25/raw/4799f77f203dfb13916218e049dc903c66616b75/containerd_pprof.out
Since the containers hang when inspecting but are still running fine, I might leave them like this for a while, so if there is any extra information I could provide to help find a solution, please tell me!
In case it helps, here is the output of curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2 > docker_pprof.out: https://pastebin.com/raw/sMN1fxZe
@halcyon22
This looks like it's stuck waiting on a kill from the containerd shim (the process that manages your container process).
ping @crosbymichael @dmcgowan on this one.
@mathroc Any chance you can get the containerd output as well? I'm not sure how docker is packaged on Gentoo... maybe just a ctr binary is available, in which case sockets and such may be in different locations.
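For reference, a hedged sketch of what that can look like with a standalone ctr binary; the debug-socket path is a guess and depends on how containerd was configured and started on your system:

```sh
# assumes containerd's debug socket is enabled; the path below is a common
# default for a standalone containerd install, adjust to your setup
ctr pprof --debug-socket /run/containerd/debug.sock goroutines > containerd_pprof.out
```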
@lindycoder There are some known deadlocks in containerd and docker in 17.12, can you upgrade to a newer version? 18.06 is the most recent release.
@cpuguy83 you're right, there's a ctr binary; here is the output: https://pastebin.com/raw/FpGwmfq7
Thanks @cpuguy83 - with that information I stopped the Docker service, killed containerd, brought the service back up again, and the problem didn't reappear. I thought I had rebooted the server prior to adding my comment, but maybe that hadn't worked.
I'm seeing the same issue here. Right now I'm trying to update all systems to the most recent version and check if the issue is resolved...
Same problem on an up-to-date docker version. As a possible workaround you can locate the processes related to the problem container by its id and kill them.
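A hedged sketch of that workaround (the container name is a placeholder, and the exact processes you find will vary):

```sh
# find anything still referencing the stuck container's full ID
CID=$(docker ps --no-trunc -qf name=<stuck-container-name>)
ps -ef | grep "$CID"              # typically shows a containerd-shim holding the ID
sudo pkill -9 -f "$CID"           # force-kill those processes
sudo systemctl restart docker     # often still required before inspect responds again
```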
Client:
Version: 18.06.1-ce
API version: 1.38
Go version: go1.10.3
Git commit: e68fc7a
Built: Tue Aug 21 17:24:58 2018
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 18.06.1-ce
API version: 1.38 (minimum version 1.12)
Go version: go1.10.3
Git commit: e68fc7a
Built: Tue Aug 21 17:23:24 2018
OS/Arch: linux/amd64
Experimental: false
Containers: 10
Running: 7
Paused: 0
Stopped: 3
Images: 82
Server Version: 18.06.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: nvidia runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
apparmor
Kernel Version: 4.4.0-141-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 62.78GiB
Name: *
ID: I6UV:RJTS:L2XD:3QRG:OW7C:HVBG:34GL:IBFP:LXE7:XX4Q:3W6A:JZAT
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
We experienced the exact same issue with certain containers of ours, with various docker versions from 18.03 to 18.09.2 and various kernels from CentOS 7.2 to 7.6. We ended up finding out that for us the issue only occurred with containers that used NFS-type docker volumes, and our NFS daemon had somehow gotten into a state where mounts would hang indefinitely. Restarting the NFS daemon fixed our issue and mounts succeeded again. Docker should probably have a timeout on NFS mount attempts and allow users to do things like delete/inspect containers and remove docker volumes for misbehaving NFS mounts.
Would be curious to know if other people on this thread experiencing this issue are using NFS volumes as well, or if there are separate root causes for this issue.
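For context, the kind of NFS-backed volume described above can be created like this (a sketch; the server address and export path are placeholders). If the NFS server stops responding, operations that touch such a mount can block indefinitely:

```sh
docker volume create \
  --driver local \
  --opt type=nfs \
  --opt o=addr=nfs.example.com,rw \
  --opt device=:/exported/path \
  my-nfs-volume
```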
We got this problem on 18.06.1-ce, no NFS used, using docker-compose
We're also having this problem on 18.06.3-ce, using Nomad 0.8.7 on Ubuntu 14.04. Two instances exhibited this problem within minutes of one another a week ago, and another today.
dockerd-pprof.log docker-containerd-ctr.log
docker ps shows:
root@nomad-client-blue-10-181-2-40:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
59ef6eb37a06 8df826da855b "/usr/bin/init.sh bu…" 29 minutes ago Up 29 minutes 10.181.2.40:22386->9292/tcp, 10.181.2.40:22386->9292/udp frontend-9fd92404-4aaf-0dd1-26e9-84c74e5e3ddd
b97162768a9f 1b6c25ab9e92 "/usr/bin/init.sh bu…" 11 hours ago Up 11 hours 10.181.2.40:25345->9292/tcp, 10.181.2.40:25345->9292/udp frontend-b094a318-4ce3-4f37-bb63-72be91cedb0f
and the process tree shows:
1055 ? Ssl 116:04 /usr/bin/dockerd --raw-logs
1978 ? Ssl 196:19 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml
12136 ? Sl 0:01 | \_ docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/b97162768a9f54d0a289cd50
11069 ? Sl 0:00 | \_ docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/59ef6eb37a0661c4386f66a0
11087 ? Ssl 0:00 | \_ puma 3.12.0 (tcp://0.0.0.0:9292) [app]
11153 ? Sl 3:41 | \_ …
12117 ? Sl 0:00 \_ /usr/bin/docker-proxy -proto tcp -host-ip 10.181.2.40 -host-port 25345 -container-ip 172.17.0.2 -container-port 9292
12130 ? Sl 0:00 \_ /usr/bin/docker-proxy -proto udp -host-ip 10.181.2.40 -host-port 25345 -container-ip 172.17.0.2 -container-port 9292
11047 ? Sl 0:00 \_ /usr/bin/docker-proxy -proto tcp -host-ip 10.181.2.40 -host-port 22386 -container-ip 172.17.0.5 -container-port 9292
11061 ? Sl 0:00 \_ /usr/bin/docker-proxy -proto udp -host-ip 10.181.2.40 -host-port 22386 -container-ip 172.17.0.5 -container-port 9292
Note that docker-containerd-shim for b97162768a9f has no child processes. I can docker inspect 59ef6eb37a06, but docker inspect b97162768a9f hangs.
A simple kill 12136 (of the docker-containerd-shim process) doesn't work; I have to kill -9. Even afterwards, docker inspect b97162768a9f still hangs. Only after I stop and start the docker service am I able to inspect both containers.
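For anyone hitting the same state, a condensed sketch of the recovery sequence described above (the PID and container ID are the ones from this report; substitute your own):

```sh
kill -9 12136                 # force-kill the orphaned docker-containerd-shim (a plain kill is ignored)
sudo service docker restart   # inspect still hangs until the daemon is restarted
docker inspect b97162768a9f   # responds again after the restart
```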
The output from docker-containerd-ctr shows a couple of goroutines that have been alive from the time the "stuck" container was asked to stop:
goroutine 36642793 [select, 26 minutes]:
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).run(0xc4205569c0)
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:171 +0x309
created by github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.NewClient
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:46 +0x147
goroutine 36642794 [IO wait, 26 minutes]:
internal/poll.runtime_pollWait(0x7fb457e2c950, 0x72, 0xc42064cb80)
/usr/local/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc4200eab18, 0x72, 0xffffffffffffff00, 0x558dcff4ef20, 0x558dd0615a98)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9d
internal/poll.(*pollDesc).waitRead(0xc4200eab18, 0xc420df3000, 0x1000, 0x1000)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3f
internal/poll.(*FD).Read(0xc4200eab00, 0xc420df3000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:157 +0x17f
net.(*netFD).Read(0xc4200eab00, 0xc420df3000, 0x1000, 0x1000, 0xc42064cc68, 0x558dcec21010, 0xc420502a80)
/usr/local/go/src/net/fd_unix.go:202 +0x51
net.(*conn).Read(0xc4209141b0, 0xc420df3000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:176 +0x6c
bufio.(*Reader).Read(0xc420556720, 0xc4209129e0, 0xa, 0xa, 0x0, 0x1, 0x0)
/usr/local/go/src/bufio/bufio.go:216 +0x23a
io.ReadAtLeast(0x558dcff4ba60, 0xc420556720, 0xc4209129e0, 0xa, 0xa, 0xa, 0xc420425a00, 0x0, 0xc420870b40)
/usr/local/go/src/io/io.go:309 +0x88
io.ReadFull(0x558dcff4ba60, 0xc420556720, 0xc4209129e0, 0xa, 0xa, 0xc42064cf48, 0x2, 0x2)
/usr/local/go/src/io/io.go:327 +0x5a
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.readMessageHeader(0xc4209129e0, 0xa, 0xa, 0x558dcff4ba60, 0xc420556720, 0xc42064cf48, 0x2, 0x2, 0xc42064cfa8)
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/channel.go:38 +0x62
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*channel).recv(0xc4209129c0, 0x558dcff65b40, 0xc420040080, 0x0, 0xc420590002, 0xc4206ea1c0, 0x1b8, 0x1b8, 0x0, 0x0)
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/channel.go:86 +0x6f
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).run.func1(0xc420870ba0, 0xc4205569c0, 0xc4208da170, 0xc420870b40)
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:144 +0x90
created by github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).run
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:139 +0x13f
goroutine 36642792 [syscall, 26 minutes]:
syscall.Syscall6(0xf7, 0x1, 0x2b3d, 0xc4202afde0, 0x1000004, 0x0, 0x0, 0x558dcf06504a, 0xc4200e22c0, 0x558dcff6cea0)
/usr/local/go/src/syscall/asm_linux_amd64.s:44 +0x5
os.(*Process).blockUntilWaitable(0xc420b08510, 0x558dcfd29be0, 0xc4208da030, 0x0)
/usr/local/go/src/os/wait_waitid.go:31 +0x9a
os.(*Process).wait(0xc420b08510, 0x558dcff65bc0, 0xc420c7b350, 0x0)
/usr/local/go/src/os/exec_unix.go:22 +0x3e
os.(*Process).Wait(0xc420b08510, 0x48, 0xc4202aff70, 0x558dcebf37fb)
/usr/local/go/src/os/exec.go:123 +0x2d
os/exec.(*Cmd).Wait(0xc4205f8840, 0x558dcec25191, 0xc4202d4210)
/usr/local/go/src/os/exec/exec.go:461 +0x5e
github.com/containerd/containerd/linux/shim/client.WithStart.func1.2(0xc4205f8840, 0xc420e2f220)
/go/src/github.com/containerd/containerd/linux/shim/client/client.go:78 +0x2d
created by github.com/containerd/containerd/linux/shim/client.WithStart.func1
/go/src/github.com/containerd/containerd/linux/shim/client/client.go:77 +0x307
goroutine 34853989 [select, 26 minutes]:
github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).run(0xc4204e7320)
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:171 +0x309
created by github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.NewClient
/go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:46 +0x147
I believe we are seeing this as well. It seems to be tied to situations where containers are getting OOM-killed or otherwise are in restart loops.
``` goroutine 1374264 [select, 689 minutes]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).waitOnHeader(0xc42a1f42d0, 0x10, 0xc423b152f0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:222 +0x101 github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).RecvCompress(0xc42a1f42d0, 0x555558c3fbb0, 0xc423b153b0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:233 +0x2d github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg(0xc42a1f8340, 0x555558aa9b20, 0xc4247840c8, 0x0, 0x0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:515 +0x63b github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg(0xc42a1f6280, 0x555558aa9b20, 0xc4247840c8, 0x0, 0x0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:395 +0x45 github.com/docker/docker/vendor/google.golang.org/grpc.invoke(0x555558c74680, 0xc42a5c70e0, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, 0xc420648300, 0xc4208f4900, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:83 +0x185 github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary(0x55555802962e, 0x4, 0x555558c74600, 0xc42003e018, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:35 +0xf6 github.com/docker/docker/vendor/github.com/containerd/containerd.(namespaceInterceptor).(github.com/docker/docker/vendor/github.com/containerd/containerd.unary)-fm(0x555558c74600, 0xc42003e018, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, 0xc420648300, 0x555558c3fc20, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:51 +0xf6 github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke(0xc420648300, 0x555558c74600, 0xc42003e018, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, 0x0, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:35 +0x10b github.com/docker/docker/vendor/google.golang.org/grpc.Invoke(0x555558c74600, 0xc42003e018, 0x555558074062, 0x27, 0x555558ab4da0, 0xc42c811220, 0x555558aa9b20, 0xc4247840c8, 0xc420648300, 0x0, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:60 +0xc3 github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Get(0xc4247840c0, 0x555558c74600, 0xc42003e018, 0xc42c811220, 0x0, 0x0, 0x0, 0xc426ebf7b8, 0x555556dde1ff, 0x555558c74680) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:448 +0xd4 github.com/docker/docker/vendor/github.com/containerd/containerd.(*process).Status(0xc42453a510, 0x555558c74600, 0xc42003e018, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x40, ...) 
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/process.go:220 +0x119 github.com/docker/docker/vendor/github.com/containerd/containerd.(*process).Delete(0xc42453a510, 0x555558c74600, 0xc42003e018, 0x0, 0x0, 0x0, 0x40, 0xc42c097680, 0x40) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/process.go:196 +0x119 github.com/docker/docker/libcontainerd.(*client).processEvent.func1() /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/client_daemon.go:713 +0x212 github.com/docker/docker/libcontainerd.(*queue).append.func1(0xc42c097601, 0xc42a7b0420, 0xc421f6ba00, 0xc42a6483c0, 0xc4202f46e8, 0xc42c097640, 0x40) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/queue.go:26 +0x3a created by github.com/docker/docker/libcontainerd.(*queue).append /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/queue.go:22 +0x196 ... goroutine 1377768 [select, 688 minutes]: github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).waitOnHeader(0xc426a2ec30, 0x10, 0xc42a3b2bd0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:222 +0x101 github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).RecvCompress(0xc426a2ec30, 0x555558c3fbb0, 0xc42a3b2c90) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.go:233 +0x2d github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg(0xc42247f380, 0x555558b22200, 0x55555986b588, 0x0, 0x0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:515 +0x63b github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg(0xc420408300, 0x555558b22200, 0x55555986b588, 0x0, 0x0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:395 +0x45 github.com/docker/docker/vendor/google.golang.org/grpc.invoke(0x555558c74680, 0xc426217a40, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, 0xc420648300, 0xc4208f4900, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:83 +0x185 github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary(0x55555802962e, 0x4, 0x555558c74600, 0xc42003e018, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:35 +0xf6 github.com/docker/docker/vendor/github.com/containerd/containerd.(namespaceInterceptor).(github.com/docker/docker/vendor/github.com/containerd/containerd.unary)-fm(0x555558c74600, 0xc42003e018, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, 0xc420648300, 0x555558c3fc20, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:51 +0xf6 github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke(0xc420648300, 0x555558c74600, 0xc42003e018, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, 0x0, ...) 
/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:35 +0x10b github.com/docker/docker/vendor/google.golang.org/grpc.Invoke(0x555558c74600, 0xc42003e018, 0x555558076915, 0x28, 0x555558ab5f20, 0xc428567140, 0x555558b22200, 0x55555986b588, 0xc420648300, 0x0, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:60 +0xc3 github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Exec(0xc4285e0ae8, 0x555558c74600, 0xc42003e018, 0xc428567140, 0x0, 0x0, 0x0, 0xa3, 0xc428566f60, 0x0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:475 +0xd4 github.com/docker/docker/vendor/github.com/containerd/containerd.(*task).Exec(0xc423993e90, 0x555558c74600, 0xc42003e018, 0xc428dc2bc0, 0x40, 0xc42247e8f0, 0xc428566f60, 0x0, 0x0, 0x0, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/task.go:333 +0x2ba github.com/docker/docker/libcontainerd.(*client).Exec(0xc4202f4690, 0x555558c74600, 0xc42003e018, 0xc421cdef40, 0x40, 0xc428dc2bc0, 0x40, 0xc42247e8f0, 0x0, 0xc4234531f0, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/client_daemon.go:367 +0x3f1 github.com/docker/docker/daemon.(*Daemon).ContainerExecStart(0xc4204828c0, 0x555558c74600, 0xc42003e018, 0xc4228250f1, 0x40, 0x7ffff7f0f370, 0xc42093f0c0, 0x555558c510a0, 0xc42093f280, 0x555558c510a0, ...) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/exec.go:251 +0xb4f github.com/docker/docker/api/server/router/container.(*containerRouter).postContainerExecStart(0xc420d71fc0, 0x555558c74680, 0xc425c77080, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x0, 0x0) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/exec.go:125 +0x34b github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.postContainerExecStart)-fm(0x555558c74680, 0xc425c77080, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x555556bcee3c, 0x555558aba440) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.go:59 +0x6b github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1(0x555558c74680, 0xc425c77080, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x555558c74680, 0xc425c77080) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.go:26 +0xda github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1(0x555558c74680, 0xc425c77050, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x0, 0xc42b14ba58) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.go:62 +0x401 github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1(0x555558c74680, 0xc425c77050, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00, 0xc425c76fc0, 0x555558c74680, 0xc425c77050) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.go:59 +0x7ab github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1(0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.go:141 +0x19a 
net/http.HandlerFunc.ServeHTTP(0xc4212fed40, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00) /usr/local/go/src/net/http/server.go:1947 +0x46 github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP(0xc4217185f0, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.go:103 +0x228 github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP(0xc420edc910, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00) /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.go:29 +0x72 net/http.serverHandler.ServeHTTP(0xc4204365b0, 0x555558c72b40, 0xc4272fc0e0, 0xc423bdad00) /usr/local/go/src/net/http/server.go:2694 +0xbe net/http.(*conn).serve(0xc420b67ea0, 0x555558c745c0, 0xc426d72a40) /usr/local/go/src/net/http/server.go:1830 +0x653 created by net/http.(*Server).Serve /usr/local/go/src/net/http/server.go:2795 +0x27d ```
```
Containers: 206
Running: 165
Paused: 0
Stopped: 41
Images: 112
Server Version: 18.06.2-ce
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: systemd
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-862.6.3.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 56
Total Memory: 503.6GiB
Docker Root Dir: /mongodb/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
```
Seems like the same issue, @cpuguy83? Any further guidance?
This seems closely related to https://github.com/moby/moby/issues/38064
docker 18.09-ce also has this problem
Seems like the bug is also present in 19.03.1
~# docker version
Client: Docker Engine - Community
Version: 19.03.1
API version: 1.40
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:22:03 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.1
API version: 1.40 (minimum version 1.12)
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:20:35 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.6
GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc:
Version: 1.0.0-rc8
GitCommit: 425e105d5a03fabd737a126ad93d62a9eeede87f
docker-init:
Version: 0.18.0
GitCommit: fec3683
Stack trace: https://plik.himbeere.pw/file/yw4XSErMAEMxG9wD/wUQJrxyVYWsRGzhp/stacktrace
Example:
# time timeout 10 docker rm -f dev_admin_php-fpm_1
real 0m10.004s
user 0m0.011s
sys 0m0.024s
# time timeout 10 docker inspect dev_admin_php-fpm_1
real 0m10.008s
user 0m0.013s
sys 0m0.024s
Restarting the docker daemon fixed it.
I can confirm that this issue is present on 19.03.1 on Ubuntu 18.04.2 LTS.
Does anyone have a workaround that only impacts the offending container/service? I have only been able to fix this issue by rebooting the server, as the system hangs when trying to restart the docker daemon (sudo systemctl restart docker, sudo service docker start).
This is naturally not very good for the uptime of my services :(
The machine should have plenty of resources; memory usage is around 5-10% and system load is usually low as well.
Happening with extremely high frequency to me on multiple Ubuntu machines running 19.03.1. docker inspect is an essential part of the scripts that manage my short-lived containers, so this is a breaking issue for me. My last heavy use of docker was on 18.09.x, and I never saw this issue across tens of thousands of containers and many machines.
No NFS, not using docker compose, and the containers that inspect hangs on were not otherwise misbehaving.
edit to add: In an attempt to work around the issue, I made a shell script with permissions to read the docker/containers folder and manually pull fields that I need from the config.v2.json file. Somewhat worried about reading it while it's in the process of being rewritten or edited, but that's the only way I could think to eliminate calls to inspect. But...it turns out that I have other breaking issues with 19.03.1 and have reverted to 18.09
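That kind of workaround might look roughly like this (a sketch only: config.v2.json is dockerd's internal state file, not a stable interface, and the fields shown are just examples):

```sh
# read a few fields directly from dockerd's on-disk container state,
# bypassing the hung `docker inspect`
CID=$(docker ps --no-trunc -qf name=<container-name>)   # docker ps still works when inspect hangs
sudo jq '{running: .State.Running, pid: .State.Pid, labels: .Config.Labels}' \
  "/var/lib/docker/containers/${CID}/config.v2.json"
```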
@luminouspanda if you can put your stack trace in a site not flagged as a security issue, such as gist, then we can review it.
Everyone else experiencing problems on 19.03: there seems to be a regression (at least empirically, based on the number of reports I've seen). Can you please post the stack trace from affected systems?
curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2
@cpuguy83: not sure if this will be useful since, like I said, I have other issues with 19.03 (will create a new bug report... but, in short, docker never registers that a container has exited after its referenced PID quits). But I'm including this anyway in the hope that it will be useful.
docker version (experimental is enabled because I use checkpoints):
Client: Docker Engine - Community
Version: 19.03.1
API version: 1.40
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:21:05 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.1
API version: 1.40 (minimum version 1.12)
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:19:41 2019
OS/Arch: linux/amd64
Experimental: true
containerd:
Version: 1.2.6
GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc:
Version: 1.0.0-rc8
GitCommit: 425e105d5a03fabd737a126ad93d62a9eeede87f
docker-init:
Version: 0.18.0
GitCommit: fec3683
docker info:
Client:
Debug Mode: false
Server:
Containers: 7
Running: 7
Paused: 0
Stopped: 0
Images: 10
Server Version: 19.03.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.0.0-25-generic
Operating System: Linux Mint 19
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.37GiB
Name: thinkpad
ID: W7ON:HPJN:2SGV:LS7L:VBOZ:RTGK:JN46:HTF7:N4KW:H4PU:24AH:E3OK
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: true
WARNING: No swap limit support
Output of curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2 with a hung docker container inspect:
https://gist.github.com/buck2202/8058fda098f992c4a35bc181f27bcbf3
The issue seems more correlated with the time of day than anything else; the last three days I've seen it around 3:30 pm, but that must surely be a coincidence?
Anyway, here is the output of curl --unix-socket /var/run/docker.sock http://./debug/pprof/goroutine?debug=2:
https://gist.github.com/glaux/6562f669885929dcb26ef668d2fcb71c
This time I had run composer require in a php container when I got an error about memory exhaustion (I have only a few containers running on a small dev server):
Fatal error: Allowed memory size of 1610612736 bytes exhausted (tried to allocate 4096 bytes) in phar:///usr/local/bin/composer/src/Composer/DependencyResolver/Solver.php on line 223
Check https://getcomposer.org/doc/articles/troubleshooting.md#memory-limit-errors for more info on how to handle out of memory errors.
When running docker-compose up I get the message:
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
That particular error is discussed here: https://github.com/docker/compose/issues/5620 which links to this thread.
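For completeness, the knob that error message refers to is just an environment variable for the compose client; raising it only hides the hang rather than fixing it:

```sh
export COMPOSE_HTTP_TIMEOUT=300   # seconds; default is 60
docker-compose up
```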
Docker version:
$ docker version
Client: Docker Engine - Community
Version: 19.03.1
API version: 1.40
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:21:05 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.1
API version: 1.40 (minimum version 1.12)
Go version: go1.12.5
Git commit: 74b1e89
Built: Thu Jul 25 21:19:41 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.6
GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc:
Version: 1.0.0-rc8
GitCommit: 425e105d5a03fabd737a126ad93d62a9eeede87f
docker-init:
Version: 0.18.0
GitCommit: fec3683
This looks like a deadlock waiting on data back from containerd. I know I've seen some things come across regarding this.
Unfortunately the debug socket for containerd is no longer available from a default docker install so tracking down the cause is a bit harder :(
goroutine 562400 [select, 3 minutes]:
github.com/docker/docker/vendor/google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader(0xc001a08000, 0x8, 0xc00134b880)
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/internal/transport/transport.go:267 +0xce
github.com/docker/docker/vendor/google.golang.org/grpc/internal/transport.(*Stream).RecvCompress(...)
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/internal/transport/transport.go:278
github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg(0xc00134b880, 0x55b7e354e780, 0xc0013adf40, 0x0, 0xc00258cae0, 0x0)
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:856 +0x70e
github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg.func1(0xc00134b880, 0x0, 0x0)
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:721 +0x48
github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).withRetry(0xc000e73320, 0xc00258cae8, 0xc00258cad8, 0xc001a87810, 0x0)
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:577 +0x7e
github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg(0xc000e73320, 0x55b7e354e780, 0xc0013adf40, 0x0, 0x0)
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.go:720 +0x102
github.com/docker/docker/vendor/google.golang.org/grpc.invoke(0x55b7e3705060, 0xc002cc0030, 0x55b7e2651589, 0x2a, 0x55b7e354b960, 0xc00269df80, 0x55b7e354e780, 0xc0013adf40, 0xc000176dc0, 0xc000891520, ...)
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:73 +0x143
github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary(0x55b7e25f7f49, 0x4, 0x55b7e3704fe0, 0xc000052038, 0x55b7e2651589, 0x2a, 0x55b7e354b960, 0xc00269df80, 0x55b7e354e780, 0xc0013adf40, ...)
/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.go:35 +0xf4
github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke(0xc000176dc0, 0x55b7e3704fe0, 0xc000052038, 0x55b7e2651589, 0x2a, 0x55b7e354b960, 0xc00269df80, 0x55b7e354e780, 0xc0013adf40, 0x0, ...)
/go/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.go:35 +0x10c
github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Delete(0xc000db26a8, 0x55b7e3704fe0, 0xc000052038, 0xc00269df80, 0x0, 0x0, 0x0, 0xed4e7589b, 0x0, 0x0)
/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.go:1315 +0xd4
github.com/docker/docker/vendor/github.com/containerd/containerd.(*task).Delete(0xc000a12570, 0x55b7e3704fe0, 0xc000052038, 0x0, 0x0, 0x0, 0x4, 0x55b7e373b2e0, 0xc000a12570)
/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/task.go:312 +0x221
github.com/docker/docker/libcontainerd/remote.(*client).DeleteTask(0xc0008888a0, 0x55b7e3704fe0, 0xc000052038, 0xc000d236c0, 0x40, 0x0, 0x55b7e32203e0, 0x55b7e0ce4275, 0x196, 0xc00048200b, ...)
/go/src/github.com/docker/docker/libcontainerd/remote/client.go:438 +0xd6
github.com/docker/docker/daemon.(*Daemon).ProcessEvent(0xc00000c5a0, 0xc001a40b80, 0x40, 0x55b7e25f7d1d, 0x4, 0xc001a40b80, 0x40, 0xc001a40bc0, 0x40, 0x3b6b, ...)
/go/src/github.com/docker/docker/daemon/monitor.go:54 +0x253
github.com/docker/docker/libcontainerd/remote.(*client).processEvent.func1()
/go/src/github.com/docker/docker/libcontainerd/remote/client.go:634 +0x109
github.com/docker/docker/libcontainerd/queue.(*Queue).Append.func1(0xc001a40b00, 0x0, 0xc001514480, 0xc0016fbc80, 0xc0008888e0, 0xc001a40b80, 0x40)
/go/src/github.com/docker/docker/libcontainerd/queue/queue.go:28 +0x3a
created by github.com/docker/docker/libcontainerd/queue.(*Queue).Append
/go/src/github.com/docker/docker/libcontainerd/queue/queue.go:24 +0x186
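Regarding the note above that the containerd debug socket is no longer exposed by a default docker install: on systems using the separate containerd.io package it can be re-enabled by hand. A sketch, assuming the stock /etc/containerd/config.toml location:

```sh
# add (or uncomment) the debug socket in /etc/containerd/config.toml:
#   [debug]
#     address = "/run/containerd/debug.sock"
sudo systemctl restart containerd
sudo ctr pprof --debug-socket /run/containerd/debug.sock goroutines > containerd_pprof.out
```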
https://github.com/containerd/containerd/pull/3540 seems promising.
Still testing, but it seems like the combination of docker 19.03.3 and containerd.io 1.2.10-2 may have resolved this issue for me.
The PR @cpuguy83 referenced above (containerd/containerd#3540) was backported to containerd 1.2.8 by containerd/containerd#3561, but as far as I can tell, the download.docker.com repository for bionic wasn't updated to include this patch until 1.2.10-2 within the past week or so.
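If you are on the download.docker.com apt repository, checking for and pinning that build might look like this (a sketch; the exact version string depends on your distribution):

```sh
apt-cache madison containerd.io             # list the containerd.io builds the repo offers
sudo apt-get install containerd.io=1.2.10-2
sudo systemctl restart containerd docker
```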
We also hit a similar issue on 18.09.9 docker-ce with containerd 1.2.10 and kernel 4.19.0-041900-generic. Not all containers, but some of them, hit the hang. The container process and shim process are alive, as pstree shows, but docker inspect/exec/stop/kill hang while docker ps can still find the container.
docker ps can still find the container
cloud-user@hm6-opshub-aio:~$ docker info | grep -i "Version|runc|containerd|shim"
WARNING: No swap limit support
Server Version: 18.09.9
Runtimes: runc
Default Runtime: runc
containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
init version: fec3683
Kernel Version: 4.19.0-041900-generic
cloud-user@hm6-opshub-aio:~$
cloud-user@hm6-opshub-aio:$ docker ps | grep zookeeper-0
7d1dbb28dac3 d43be510ec44 "/custom-entrypoint.…" 7 hours ago Up 7 hours k8s_zookeeper_zookeeper-0_opshub-data_ca906dcf-c623-460a-9271-1389b09912b0_1
fb0855d4e2e6 k8s.gcr.io/pause:3.1 "/pause" 46 hours ago Up 46 hours k8s_POD_zookeeper-0_opshub-data_ca906dcf-c623-460a-9271-1389b09912b0_0
docker inspect hangs:
cloud-user@hm6-opshub-aio:$ docker inspect 7d1dbb28dac3
^C
cloud-user@hm6-opshub-aio:$
shim and container process still alive:
cloud-user@hm6-opshub-aio:$ ps -ef | grep 7d1dbb28dac3
cloud-u+ 5797 2133 0 00:31 pts/0 00:00:00 grep --color=auto 7d1dbb28dac3
root 20154 1640 0 Nov11 ? 00:00:03 containerd-shim -namespace moby -workdir /var/lib/containerd/io.containerd.runtime.v1.linux/moby/7d1dbb28dac3216b2b5cbafa5c6b9dbb4a6565e56d9bd8c6f060ba7487b55501 -address /run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc
cloud-user@hm6-opshub-aio:~$ pstree -Asp 20154
systemd(1)---containerd(1640)---containerd-shim(20154)-+-custom-entrypoi(25587)---java(17248)-+-{java}(17702)
| |-{java}(18065)
| |-{java}(6038)
| |-{java}(14587)
| |-{java}(14793)
| |-{java}(15040)
| |-{java}(19281)
| |-{java}(19301)
| |-{java}(11909)
| |-{java}(12716)
| |-{java}(13990)
| |-{java}(14318)
| |-{java}(15122)
| |-{java}(15182)
| |-{java}(22772)
| |-{java}(26839)
| |-{java}(27524)
| |-{java}(18492)
| |-{java}(18010)
| |-{java}(18022)
| |-{java}(18490)
| |-{java}(20045)
| |-{java}(20113)
| |-{java}(9862)
| |-{java}(7946)
| |-{java}(18670)
| |-{java}(19502)
| |-{java}(19504)
| |-{java}(22857)
| |-{java}(22858)
| |-{java}(23127)
| |-{java}(23178)
| |-{java}(1632)
| |-{java}(1602)
| |-{java}(15553)
| |-{java}(19236)
| |-{java}(22269)
| |-{java}(503)
| |-{java}(18600)
| |-{java}(22767)
| |-{java}(20046)
| |-{java}(20061)
| |-{java}(20129)
| |-{java}(26671)
| |-{java}(17418)
| |-{java}(18604)
| |-{java}(19293)
| |-{java}(19302)
| |-{java}(19487)
| |-{java}(22024)
| |-{java}(7675)
| |-{java}(17292)
| |-{java}(32502)
| |-{java}(18581)
| |-{java}(29362)
| |-{java}(27718)
| |-{java}(29804)
| |-{java}(4248)
| |-{java}(17326)
| |-{java}(17339)
| |-{java}(17350)
| |-{java}(17449)
| |-{java}(1625)
| |-{java}(1638)
| |-{java}(22305)
| |-{java}(19299)
| |-{java}(19314)
| |-{java}(19363)
| |-{java}(13804)
| |-{java}(13848)
| |-{java}(13964)
| |-{java}(14855)
| |-{java}(7284)
| |-{java}(29541)
| `-{java}(30103)
|-{containerd-shim}(20261)
|-{containerd-shim}(20385)
|-{containerd-shim}(20438)
|-{containerd-shim}(20769)
|-{containerd-shim}(21254)
|-{containerd-shim}(21839)
|-{containerd-shim}(22173)
|-{containerd-shim}(22238)
|-{containerd-shim}(30808)
`-{containerd-shim}(3440)
cloud-user@hm6-opshub-aio:~$
@cpuguy83, I am using docker 18.09.8 with containerd 1.2.6. If I want to resolve this issue by upgrading containerd to 1.2.8, can I just replace the containerd binary on my machine?
@zionwu 18.09.8 comes with a separate containerd package. You can just download and install the latest 1.2.10 rpm/deb from https://download.docker.com/linux/.
You can also just replace the containerd binaries from the latest static package. You will need to replace all binaries except the Docker ones. https://download.docker.com/linux/static/stable/x86_64/docker-19.03.5.tgz
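A sketch of that static-binary route (stop the daemons first, replace only the non-Docker binaries, then start them again; the binary names in the archive may differ between releases):

```sh
curl -fsSLO https://download.docker.com/linux/static/stable/x86_64/docker-19.03.5.tgz
tar xzf docker-19.03.5.tgz
sudo systemctl stop docker containerd
sudo cp docker/containerd docker/containerd-shim docker/ctr docker/runc /usr/bin/
sudo systemctl start containerd docker
```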
@hakman, thank you for the help. I upgraded containerd to 1.2.10 but the issue still exists. Do you have any suggestions?
@zionwu My only other suggestion is to also try the latest Docker 19.03.5.
Any news here? Same problem with 19.03.7 and Debian 9.
It is happening to me with redis:5.0-alpine. docker-compose down times out, inspect hangs, and I have to restart the daemon and kill the container immediately. ps shows the container there, even after the container itself has received SIGTERM.
I thought it was a zombie reaping issue, but I'm now unsure. Entrypoint should be PID 1, right?
The best I can think of is that the container is handling SIGSTOP/SIGTERM on its own (redis has its own signal handler), and Docker thinks the container has been killed, but the process itself has not been killed.
date -u; cat docker-compose.yml | grep redis | grep sha; docker exec -it api-redis ps
Wed Feb 26 16:32:13 UTC 2020
image: "redis@sha256:ee13953704783b284c080b5b0abe4620730728054f5c19e9488d7a97ecd312c5"
PID USER TIME COMMAND
1 redis 5:47 redis-server --appendonly yes --loglevel verbose
32 root 0:00 ps
Docker version:
Client: Docker Engine - Community
Version: 19.03.5
API version: 1.40
Go version: go1.12.12
Git commit: 633a0ea838
Built: Wed Nov 13 07:29:52 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.5
API version: 1.40 (minimum version 1.12)
Go version: go1.12.12
Git commit: 633a0ea838
Built: Wed Nov 13 07:28:22 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.10
GitCommit: b34a5c8af56e510852c35414db4c1f4fa6172339
runc:
Version: 1.0.0-rc8+dev
GitCommit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
docker-init:
Version: 0.18.0
GitCommit: fec3683
Docker-compose:
docker-compose -v
docker-compose version 1.23.2, build 1110ad01
Environment (AWS EC2, AMI ami-06d51e91cea0dac8d):
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic
cephfs being down leads to docker inspect hanging forever; how do I solve this problem?
docker-compose hanging just happened for me. If I include a Dockerfile, it will hang. The following docker-compose.yml works (note the Dockerfile build is commented out):
version: '3.5'
services:
  tensorflow-notebook:
    container_name: tensorflow-notebook-container
    image: jupyter/tensorflow-notebook:latest
    # build:
    #   context: .
    #   dockerfile: Dockerfile
    volumes:
      - ./data:/home/jovyan
    ports:
      - 8888:8888
    network_mode: host
    restart: unless-stopped
However, once I change it to use Dockerfile, it hangs:
version: '3.5'
services:
  tensorflow-notebook:
    container_name: tensorflow-notebook-container
    # image: jupyter/tensorflow-notebook:latest
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - ./data:/home/jovyan
    ports:
      - 8888:8888
    network_mode: host
    restart: unless-stopped
Doesn't matter what's in the Dockerfile, as long as there's a reference to Dockerfile, it seems to hang
docker-compose version 1.26.0, build d4451659
Docker version 19.03.11, build 42e35e61f3
Ubuntu 18.04
Already tried restarting the machine and restarting dockerd. Maybe the problem has been around for a while, I don't know. This is the only container I needed a Dockerfile for; the others have been built directly from images. I had to rebuild (haven't done that for at least a couple of months) after changing the Dockerfile: docker-compose down --rmi all -v followed by docker-compose up -d
Doesn't matter what's in the Dockerfile, as long as there's a reference to Dockerfile, it seems to hang
I see you also have a data directory in your current location. When doing a docker build, docker will send the build-context to the daemon, which means that (unless excluded by a .dockerignore file) all files and directories in the build-context will have to be transferred (even if not actually needed by your Dockerfile); if you have lots of files in that path, this could cause a delay (and lots of data to be sent).
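In this case, excluding the local data directory from the build context is a one-liner (a sketch, assuming nothing in ./data is needed during the image build):

```sh
echo "data" >> .dockerignore   # keep ./data out of the context sent to the daemon
```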
I have the same issue on docker 19.03.13, Docker for Mac, an almost fresh install of the system. It was working normally for a month, and suddenly started to hang on every docker-compose command. After investigation it seems like it hangs on docker inspect while doing compose.
What I've tried to fix this:
None of the above helped in any way; everything is still hanging. Not sure if I should post this in docker for mac, as I have exactly the same issues as here, so it does not look system dependent. Does anyone have any working solution for this? Should I uninstall and downgrade? I need working docker for work.
I'm running into the same set of symptoms as described by various individuals in this thread. This is my setup.
Client:
Version: 19.03.6-ce
API version: 1.40
Go version: go1.13.4
Git commit: 369ce74
Built: Fri May 29 04:01:26 2020
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 19.03.6-ce
API version: 1.40 (minimum version 1.12)
Go version: go1.13.4
Git commit: 369ce74
Built: Fri May 29 04:01:57 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.1
GitCommit: c623d1b36f09f8ef6536a057bd658b3aa8632828
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.18.0
GitCommit: fec3683
The errors I see:
Dec 18 18:30:08 ip-10-15-125-14.ec2.internal dockerd[3989]: time="2020-12-18T18:30:08.048544559Z" level=error msg="ebf3404977b8235dd09ff9ace3dacdd34c1471fa957907ce49047cb7ee6c01a8 cleanup: failed to delete container from containerd: no such container"
Dec 18 18:30:08 ip-10-15-125-14.ec2.internal dockerd[3989]: time="2020-12-18T18:30:08.048583023Z" level=error msg="Handler for POST /containers/ebf3404977b8235dd09ff9ace3dacdd34c1471fa957907ce49047cb7ee6c01a8/start returned error: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: read init-p: connection reset by peer: unknown"
and
Dec 21 14:00:08 ip-10-15-125-14.ec2.internal dockerd[3989]: time="2020-12-21T14:00:08.109472004Z" level=error msg="a7e3f489bb900db0b6601919481bed2a3800984e204d49263c0c75397a25d47a cleanup: failed to delete container from containerd: no such container"
Dec 21 14:00:08 ip-10-15-125-14.ec2.internal dockerd[3989]: time="2020-12-21T14:00:08.109509720Z" level=error msg="Handler for POST /containers/a7e3f489bb900db0b6601919481bed2a3800984e204d49263c0c75397a25d47a/start returned error: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:338: getting the final child's pid from pipe caused: EOF: unknown"
Both of those errors resulted from running a K8s Job. The workload container finishes, but the pause container is stuck in the CREATED state.
Also experiencing this issue. I've had it happen intermittently in the past, but it is extremely pervasive this time.
Client: Docker Engine - Community
Version: 24.0.4
API version: 1.43
Go version: go1.20.5
Git commit: 3713ee1
Built: Fri Jul 7 14:50:55 2023
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 24.0.4
API version: 1.43 (minimum version 1.12)
Go version: go1.20.5
Git commit: 4ffc614
Built: Fri Jul 7 14:50:55 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.21
GitCommit: 3dce8eb055cbb6872793272b4f20ed16117344f8
runc:
Version: 1.1.7
GitCommit: v1.1.7-0-g860f061
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Expected behavior
Docker inspect returns information, allowing docker-compose to work
Actual behavior
Docker inspect hangs forever and causes an HTTP timeout
Steps to reproduce the behavior
I've been getting this off and on since January of this year, but not on Mac: on Ubuntu. First on 16.04, then on 17.04, 17.10 and now on 18.04, and my verbose output hangs at the exact same spot as the reports on the docker for mac issue, during container inspection. This is limited to desktop Linux, and some indicators seem to imply this comes from the machine going to sleep, which would line up with my experience: when Ubuntu goes to sleep and comes back, the issue appears. Rebooting the OS seems to be the only workaround, as restarting the docker daemon does not resolve the issue.
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.):
Tested and reproduced on Ubuntu Desktop 16.04, 17.04, 17.10 and 18.04
This is the exact same issue as seen on Mac desktops: https://github.com/docker/for-mac/issues/2378