ankkha opened 4 years ago
Hi @thaJeztah
Any help would be greatly appreciated.
Hard to tell what's causing it; how did you end up in this state? Was this an existing host that ended up in this state after some time, or did anything specific cause it? Did the machine run out of disk space at some point, perhaps?
I do see that the versions of Azure's moby engine, containerd, and runc are quite old, so it's possible this was caused by, or is related to, a bug in older versions.
@thaJeztah I am from the same team as @ankkha. This is happening on the existing hosts (spun up a year ago) and also on the new hosts that were spun up 2 weeks ago. It happens across nodes in our cluster, mostly daily. Only restarting docker or restarting the VM restores the node. We did not see any disk-space issue at any point.
Below are the docker info and docker version outputs from the newer VMs spun up 2 weeks ago.
Client:
 Debug Mode: false
 Plugins:
  buildx: Build with BuildKit (Docker Inc., 0.4.1+azure)

Server:
 Containers: 74
  Running: 56
  Paused: 0
  Stopped: 18
 Images: 45
 Server Version: 19.03.12+azure
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-1092-azure
 Operating System: Ubuntu 16.04.7 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 62.91GiB
 Name: k8s-xxxxxxx-xxxxxxx
 ID: LRO2:T2XN:CSF2:PFTB:ONH5:YW5Y:Y2K6:OJYS:TESW:GYYK:7MMO:ZLCO
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support
docker version
Client:
 Version:           19.03.12+azure
 API version:       1.40
 Go version:        go1.13.11
 Git commit:        0ed913b885c8919944a2e4c8d0b80a318a8dd48b
 Built:             Wed Jun 17 17:27:03 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.12+azure
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.11
  Git commit:       9dc6525e6118a25fab2be322d1914740ea842495
  Built:            Mon Mar 12 00:00:00 2018
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.7+azure
  GitCommit:        8fba4e9a7d01810a393d5d25a3621dc101981175
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:
/cc @cpuguy83
@azharudheen Can you paste the exact error message?
@cpuguy83 Here is the error we see in the docker service status, due to which the nodes go into NotReady state:
Aug 23 14:53:42 dockerd[31669]: time="2020-08-23T14:53:42.770389936Z" level=error msg="Error running exec abea0fe4c7a000b8bd686a484d78dfa27ab30af5cdc4d48d0cd0d91e8f8b7819 in container: failed to open stdout fifo: error creating fifo /var/run/docker/containerd/2917d9adef4ffe81177a8d39733e4f570b7a96071fb3780f9cce8c77f66b3716/abea0fe4c7a000b8bd686a484d78dfa27ab30af5cdc4d48d0cd0d91e8f8b7819-stdout: no such file or directory"
Aug 23 14:54:12 dockerd[31669]: time="2020-08-23T14:54:12.769172205Z" level=error msg="Error running exec 84634fbd282677a209c7570202eb3deff0d49520e6280de5dccd91ec9d03fbf3 in container: failed to open stdout fifo: error creating fifo /var/run/docker/containerd/2917d9adef4ffe81177a8d39733e4f570b7a96071fb3780f9cce8c77f66b3716/84634fbd282677a209c7570202eb3deff0d49520e6280de5dccd91ec9d03fbf3-stdout: no such file or directory"
Aug 23 14:54:42 dockerd[31669]: time="2020-08-23T14:54:42.769662027Z" level=error msg="Error running exec 78a0078f2fef632051f680bba3451698b30a343b66215965728f893685f2e337 in container: failed to open stdout fifo: error creating fifo /var/run/docker/containerd/2917d9adef4ffe81177a8d39733e4f570b7a96071fb3780f9cce8c77f66b3716/78a0078f2fef632051f680bba3451698b30a343b66215965728f893685f2e337-stdout: no such file or directory"
@azharudheen What's in your systemd unit config as well as /etc/docker/daemon.json?
@cpuguy83 Below is daemon.json. For the systemd config, are you looking for a specific file? I can see system.conf under /etc/systemd, but no values are defined there; everything is commented out.
{
  "icc": false,
  "userland-proxy": false,
  "log-level": "info",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}
Should be in /etc/systemd/system/docker.service
I can see 2 files under /etc/systemd/system/docker.service.d. Here are their contents.
[Service]
MountFlags=shared
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --storage-driver=overlay2 --bip=192.168.21.5/24
ExecStartPost=/sbin/iptables -P FORWARD ACCEPT
@azharudheen You should probably add --containerd=/run/containerd/containerd.sock to your ExecStart (or set your options in daemon.json).
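For reference, a minimal sketch of the daemon.json alternative; this assumes the containerd address can be set with a "containerd" key in this engine version (it maps to the --containerd flag), so verify against the dockerd documentation for your release:

{
  "containerd": "/run/containerd/containerd.sock"
}

After changing daemon.json, the docker service still needs to be restarted for the setting to take effect.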
@cpuguy83 Sure, I can give it a try. But can you please tell me your take on this issue: what is the purpose of this change and how will it help? Since this is happening only in our production environment, we want to make sure it does not impact anything. Thanks for your inputs!
The change tells docker to connect to the already running containerd instance, which should be started as part of the dependency chain (unless you didn't allow the installation to write the systemd unit for whatever reason, like if there were modifications to the file).
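As a sanity check (generic systemd/shell commands, not something specific to this setup), you can confirm that containerd is actually running as its own unit and that the socket dockerd would connect to exists:

# check that containerd runs as a separate systemd unit
systemctl status containerd

# confirm the socket exists
ls -l /run/containerd/containerd.sock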
Ok got it, Thanks. Let me try and update you.
@azharudheen @cpuguy83 Updated exec_start.conf in one of the VMs, appending the given change at the end. The new file now looks like this:
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --storage-driver=overlay2 --bip=192.168.21.5/24
ExecStartPost=/sbin/iptables -P FORWARD ACCEPT --containerd=/run/containerd/containerd.sock
Sorry, you've got that in the wrong spot. It's a flag to dockerd.
@cpuguy83 Ohh, you mean it should be like this:
ExecStart=/usr/bin/dockerd -H fd:// --storage-driver=overlay2 --bip=192.168.21.5/24 --containerd=/run/containerd/containerd.sock
Yep
But you only want to do that if you've updated to the 19.03 version. 3.0.x does not run containerd as a separate systemd unit.
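For anyone applying this, a rough sketch of the checks and steps involved (standard docker/systemctl commands; paths and drop-in names are your own):

# confirm the engine on the node is 19.03.x before adding the flag
docker version --format '{{.Server.Version}}'

# after editing the drop-in, reload systemd and restart docker
systemctl daemon-reload
systemctl restart docker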
@cpuguy83 We have Docker 3.x.x on around 80% of the VMs. We will check if we can upgrade the version on all the nodes, since this is production. Surprisingly, this is not happening on the lower environments' VMs.
Ubuntu 16.04.6 LTS  4.15.0-1092-azure  docker://3.0.6
Ubuntu 16.04.6 LTS  4.15.0-1089-azure  docker://3.0.8
Ubuntu 16.04.7 LTS  4.15.0-1092-azure  docker://19.3.12
Ubuntu 16.04.7 LTS  4.15.0-1092-azure  docker://19.3.12
Ubuntu 16.04.6 LTS  4.15.0-1092-azure  docker://3.0.6
Ubuntu 16.04.6 LTS  4.15.0-1089-azure  docker://3.0.6
Ubuntu 16.04.6 LTS  4.15.0-1092-azure  docker://3.0.6
Ubuntu 16.04.6 LTS  4.15.0-1089-azure  docker://3.0.6
Ubuntu 16.04.7 LTS  4.15.0-1092-azure  docker://19.3.12
Ubuntu 16.04.6 LTS  4.15.0-1092-azure  docker://3.0.8
Ubuntu 16.04.6 LTS  4.15.0-1092-azure  docker://3.0.8
Ubuntu 16.04.6 LTS  4.15.0-1092-azure  docker://3.0.6
Ubuntu 16.04.6 LTS  4.15.0-1092-azure  docker://3.0.6
Ubuntu 16.04.6 LTS  4.15.0-1092-azure  docker://3.0.7
I have the same issue; the problem happens every one or two days. Along with those log messages, I see some other log entries:

Sep 01 05:08:24 worker14 dockerd[56425]: time="2020-09-01T05:08:24.612839093+08:00" level=error msg="stream copy error: reading from a closed fifo"
Sep 01 05:08:24 worker14 dockerd[56425]: time="2020-09-01T05:08:24.651070368+08:00" level=error msg="Error running exec 31984fcb31191daae387d6f360c60ca0f0e8a15597843ee7d0f7a8ef1582eda3 in container: OCI runtime state failed: exec failed: container_linux.go:346: starting container process caused \"process_linux.go:101: executing setns process caused \\"signal: segmentation fault (core dumped)\\"\": unknown"
Here is my Docker version: Docker version 19.03.4, build 9013bf583a. And the dockerd systemd start command:

/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Hi @cpuguy83, Even after upgrading the Docker version to 19.3.x and applying the ExecStart parameter as you suggested, the issue still occurs. Do you have any other suggestion/recommendation for us? Thanks for your help in advance!
After a lot of analysis, we observed that the microservices where the team had implemented multithreading were causing the issues. Once we rolled back the code to the non-multithreading version, things settled down.
We are also hitting a similar issue after upgrading to docker 19.03. Is there any workaround for this issue? @cpuguy83
kubectl -n default exec -it my-pod-1 -- bash
failed to open stdin fifo 8aaae5f3511c7b388412e2082d60a04afa21267dfed5788dcb5b7c8d3b9c9b5e-stdin: stat 8aaae5f3511c7b388412e2082d60a04afa21267dfed5788dcb5b7c8d3b9c9b5e-stdin: no such file or directory: unknown
command terminated with exit code 126
Expected behavior

Actual behavior
Output of sudo service docker status

Steps to reproduce the behavior

Output of docker version:

Output of docker info:

Additional environment details (AWS, VirtualBox, physical, etc.)
It's a Kubernetes VM, and because of this issue the node shows NotReady state.