ankkha opened 4 years ago
Hi @thaJeztah
Any help would be greatly appreciated.
Hard to tell what's causing it; how did you end up in this state? Was this an existing host that ended up in this state after some time, or did anything specific cause it? Did the machine run out of disk space at some point, perhaps?
I do see that the versions of Azure's moby engine, containerd, and runc are quite old, so it's possible this was caused by, or is related to, a bug in older versions.
@thaJeztah I am from the same team as @ankkha. This is happening on the existing hosts (spun up a year ago) and also on the new hosts that were spun up 2 weeks ago. It happens across nodes in our cluster, mostly daily. Only restarting docker or restarting the VM restores the node. We did not see any disk-space issue at any point.
Below are the docker info and docker version outputs from the newer VMs spun up 2 weeks ago.
Client:
 Debug Mode: false
 Plugins:
  buildx: Build with BuildKit (Docker Inc., 0.4.1+azure)

Server:
 Containers: 74
  Running: 56
  Paused: 0
  Stopped: 18
 Images: 45
 Server Version: 19.03.12+azure
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-1092-azure
 Operating System: Ubuntu 16.04.7 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 62.91GiB
 Name: k8s-xxxxxxx-xxxxxxx
 ID: LRO2:T2XN:CSF2:PFTB:ONH5:YW5Y:Y2K6:OJYS:TESW:GYYK:7MMO:ZLCO
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support
docker version
Client:
 Version:           19.03.12+azure
 API version:       1.40
 Go version:        go1.13.11
 Git commit:        0ed913b885c8919944a2e4c8d0b80a318a8dd48b
 Built:             Wed Jun 17 17:27:03 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.12+azure
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.11
  Git commit:       9dc6525e6118a25fab2be322d1914740ea842495
  Built:            Mon Mar 12 00:00:00 2018
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.7+azure
  GitCommit:        8fba4e9a7d01810a393d5d25a3621dc101981175
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:
/cc @cpuguy83
@azharudheen Can you paste the exact error message?
@cpuguy83 Here is the error we see in the docker service status, due to which the nodes go into NotReady state:
Aug 23 14:53:42 dockerd[31669]: time="2020-08-23T14:53:42.770389936Z" level=error msg="Error running exec abea0fe4c7a000b8bd686a484d78dfa27ab30af5cdc4d48d0cd0d91e8f8b7819 in container: failed to open stdout fifo: error creating fifo /var/run/docker/containerd/2917d9adef4ffe81177a8d39733e4f570b7a96071fb3780f9cce8c77f66b3716/abea0fe4c7a000b8bd686a484d78dfa27ab30af5cdc4d48d0cd0d91e8f8b7819-stdout: no such file or directory"
Aug 23 14:54:12 dockerd[31669]: time="2020-08-23T14:54:12.769172205Z" level=error msg="Error running exec 84634fbd282677a209c7570202eb3deff0d49520e6280de5dccd91ec9d03fbf3 in container: failed to open stdout fifo: error creating fifo /var/run/docker/containerd/2917d9adef4ffe81177a8d39733e4f570b7a96071fb3780f9cce8c77f66b3716/84634fbd282677a209c7570202eb3deff0d49520e6280de5dccd91ec9d03fbf3-stdout: no such file or directory"
Aug 23 14:54:42 dockerd[31669]: time="2020-08-23T14:54:42.769662027Z" level=error msg="Error running exec 78a0078f2fef632051f680bba3451698b30a343b66215965728f893685f2e337 in container: failed to open stdout fifo: error creating fifo /var/run/docker/containerd/2917d9adef4ffe81177a8d39733e4f570b7a96071fb3780f9cce8c77f66b3716/78a0078f2fef632051f680bba3451698b30a343b66215965728f893685f2e337-stdout: no such file or directory"
@azharudheen What's in your systemd unit config as well as /etc/docker/daemon.json?
@cpuguy83 Below is daemon.json. For the systemd config, are you looking for a specific file? I can see system.conf under /etc/systemd, but no values are defined there; everything is commented out.
{
  "icc": false,
  "userland-proxy": false,
  "log-level": "info",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}
Should be in /etc/systemd/system/docker.service
I can see 2 files under /etc/systemd/system/docker.service.d. Here are their contents.
[Service]
MountFlags=shared
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --storage-driver=overlay2 --bip=192.168.21.5/24
ExecStartPost=/sbin/iptables -P FORWARD ACCEPT
@azharudheen You should probably add --containerd=/run/containerd/containerd.sock to your ExecStart (or set your options in daemon.json).
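For reference, a minimal sketch of the daemon.json alternative; this assumes the containerd address can be set with a "containerd" key in this engine version (it maps to the --containerd flag), so verify against the dockerd documentation for your release:

{
  "containerd": "/run/containerd/containerd.sock"
}

After changing daemon.json, the docker service still needs to be restarted for the setting to take effect.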
@cpuguy83 Sure, I can give it a try. But can you please tell me your take on this issue: what is the purpose of this change and how will it help? Since this is happening only in our production environment, we want to make sure it does not impact anything. Thanks for your inputs!
The change tells docker to connect to the already running containerd instance, which should be started as part of the dependency chain (unless you didn't allow the installation to write the systemd unit for whatever reason, like if there were modifications to the file).
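As a sanity check (generic systemd/shell commands, not something specific to this setup), you can confirm that containerd is actually running as its own unit and that the socket dockerd would connect to exists:

# check that containerd runs as a separate systemd unit
systemctl status containerd

# confirm the socket exists
ls -l /run/containerd/containerd.sock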
Ok got it, Thanks. Let me try and update you.
@azharudheen @cpuguy83 Updated exec_start.conf in one of the VMs, appending the given change at the end. The new file now looks like this:
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --storage-driver=overlay2 --bip=192.168.21.5/24
ExecStartPost=/sbin/iptables -P FORWARD ACCEPT --containerd=/run/containerd/containerd.sock
Sorry, you've got that in the wrong spot. It's a flag to dockerd.
@cpuguy83 Ohh, you mean it should be like this:
ExecStart=/usr/bin/dockerd -H fd:// --storage-driver=overlay2 --bip=192.168.21.5/24 --containerd=/run/containerd/containerd.sock
Yep
But you only want to do that if you've updated to the 19.03 version. 3.0.x does not run containerd as a separate systemd unit.
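For anyone applying this, a rough sketch of the checks and steps involved (standard docker/systemctl commands; paths and drop-in names are your own):

# confirm the engine on the node is 19.03.x before adding the flag
docker version --format '{{.Server.Version}}'

# after editing the drop-in, reload systemd and restart docker
systemctl daemon-reload
systemctl restart docker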
@cpuguy83 We have Docker 3.x.x on around 80% of the VMs. We will check if we can upgrade the version on all the nodes, since this is production. Surprisingly, this is not happening on the lower environments' VMs.
Ubuntu 16.04.6 LTS  4.15.0-1092-azure  docker://3.0.6
Ubuntu 16.04.6 LTS  4.15.0-1089-azure  docker://3.0.8
Ubuntu 16.04.7 LTS  4.15.0-1092-azure  docker://19.3.12
Ubuntu 16.04.7 LTS  4.15.0-1092-azure  docker://19.3.12
Ubuntu 16.04.6 LTS  4.15.0-1092-azure  docker://3.0.6
Ubuntu 16.04.6 LTS  4.15.0-1089-azure  docker://3.0.6
Ubuntu 16.04.6 LTS  4.15.0-1092-azure  docker://3.0.6
Ubuntu 16.04.6 LTS  4.15.0-1089-azure  docker://3.0.6
Ubuntu 16.04.7 LTS  4.15.0-1092-azure  docker://19.3.12
Ubuntu 16.04.6 LTS  4.15.0-1092-azure  docker://3.0.8
Ubuntu 16.04.6 LTS  4.15.0-1092-azure  docker://3.0.8
Ubuntu 16.04.6 LTS  4.15.0-1092-azure  docker://3.0.6
Ubuntu 16.04.6 LTS  4.15.0-1092-azure  docker://3.0.6
Ubuntu 16.04.6 LTS  4.15.0-1092-azure  docker://3.0.7
I have the same issue; the problem happens every one or two days. Along with those log messages, I see some other log entries:

Sep 01 05:08:24 worker14 dockerd[56425]: time="2020-09-01T05:08:24.612839093+08:00" level=error msg="stream copy error: reading from a closed fifo"
Sep 01 05:08:24 worker14 dockerd[56425]: time="2020-09-01T05:08:24.651070368+08:00" level=error msg="Error running exec 31984fcb31191daae387d6f360c60ca0f0e8a15597843ee7d0f7a8ef1582eda3 in container: OCI runtime state failed: exec failed: container_linux.go:346: starting container process caused \"process_linux.go:101: executing setns process caused \\"signal: segmentation fault (core dumped)\\"\": unknown"
Here is my Docker version: Docker version 19.03.4, build 9013bf583a. And the dockerd systemd start command:

/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Hi @cpuguy83, Even after upgrading the Docker version to 19.3.x and applying the ExecStart parameter as you suggested, the issue still occurs. Do you have any other suggestion/recommendation for us? Thanks for your help in advance!
After a lot of analysis, we observed that the microservices where the team had implemented multithreading were causing the issues. Once we rolled back the code to the non-multithreading version, things settled down.
We are also hitting a similar issue after upgrading to docker 19.03. Is there any workaround for this issue? @cpuguy83
kubectl -n default exec -it my-pod-1 -- bash
failed to open stdin fifo 8aaae5f3511c7b388412e2082d60a04afa21267dfed5788dcb5b7c8d3b9c9b5e-stdin: stat 8aaae5f3511c7b388412e2082d60a04afa21267dfed5788dcb5b7c8d3b9c9b5e-stdin: no such file or directory: unknown
command terminated with exit code 126
Expected behavior

Actual behavior
Output of sudo service docker status

Steps to reproduce the behavior

Output of docker version:

Output of docker info:

Additional environment details (AWS, VirtualBox, physical, etc.)
It's a Kubernetes VM, and because of this issue the node shows NotReady state.