docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/

Can't start Docker on RHEL8-x86 #1402

Closed suyuyi closed 2 years ago

suyuyi commented 2 years ago

Expected behavior

docker run hello-world should start a container and print "Hello from Docker! ...".

Actual behavior

The result of docker run hello-world is inconsistent. Most of the time it fails with one of the errors below, but occasionally it succeeds and prints "Hello from Docker! ...":

[root@...]# docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
2db29710123e: Pull complete
Digest: sha256:80f31da1ac7b312ba29d65080fddf797dd76acfb870e677f390d5acba9741b17
Status: Downloaded newer image for hello-world:latest
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: waiting for init preliminary setup: read init-p: connection reset by peer: unknown.
ERRO[0002] error waiting for container: context canceled
[root@...]# docker run -p 8000:8000 hello-world
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: can't copy bootstrap data to pipe: write init-p: broken pipe: unknown.
ERRO[0000] error waiting for container: context canceled
... ERRO[0000] * 10 ...
[root@...]# docker run -p 9001:9001 hello-world
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: waiting for init preliminary setup: read init-p: connection reset by peer: unknown.
ERRO[0000] error waiting for container: context canceled
[root@...]# docker run -p 9001:9001 hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.
...

This is confusing, and I don't know what is wrong.
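
Because the failure is intermittent, it helps to measure how often it happens before and after each change. Below is a minimal sketch, assuming a bash-compatible shell; count_failures is a hypothetical helper, not part of Docker.

```shell
# count_failures N CMD...: run CMD N times and report how many runs failed.
# Hypothetical helper for quantifying an intermittent failure.
count_failures() {
  local runs=$1 failed=0 i=0
  shift
  while [ "$i" -lt "$runs" ]; do
    # Discard output; only the exit status matters here.
    if ! "$@" >/dev/null 2>&1; then
      failed=$((failed + 1))
    fi
    i=$((i + 1))
  done
  echo "$failed/$runs runs failed"
}

# Example (not executed here):
#   count_failures 20 docker run --rm hello-world
```

Comparing the failure count across configurations gives a clearer signal than a single run, which can succeed by chance.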

Steps to reproduce the behavior

I installed Docker with the following script:

#!/bin/bash

# 1.connect to outside
default_route_addr=$(ip route | grep default | awk  '{print $3}')
if [[ ! $default_route_addr =~ .4$ ]]; then
int_route_address=${default_route_addr%.*}".4"
ip route replace default via "$int_route_address"
cat >>/etc/sysconfig/network-scripts/ifcfg-eth0<<EOF
gateway=$int_route_address
EOF
fi
# 2.install Docker
sudo dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install docker-ce -y
# create docker config file
mkdir -p /etc/docker
touch /etc/docker/daemon.json
cat>>"/etc/docker/daemon.json"<<EOF
{
  "graph": "/vdb/lib/docker"
}
EOF
# append "-H tcp://0.0.0.0:2375"
sed -i '13c ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock -H tcp://0.0.0.0:2375' /usr/lib/systemd/system/docker.service
# restart docker engine
sudo systemctl enable docker
sudo systemctl daemon-reload
sudo systemctl restart docker
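
The sed command in the script replaces line 13 of the packaged unit file, so it depends on the file's exact layout, and a package update can overwrite the change. A systemd drop-in is a less fragile way to get the same ExecStart. The following is a sketch, not the reporter's method: write_docker_override is a hypothetical helper, and the directory passed to it is assumed to be /etc/systemd/system/docker.service.d on a real host.

```shell
# write_docker_override DIR: write a drop-in that replaces ExecStart.
# Hypothetical helper; DIR is assumed to be
# /etc/systemd/system/docker.service.d on a real host.
write_docker_override() {
  local dir=$1
  mkdir -p "$dir"
  # '>' overwrites the file, so re-running does not duplicate its content.
  # The empty ExecStart= clears the packaged value before setting the new one.
  cat >"$dir/override.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock -H tcp://0.0.0.0:2375
EOF
}

# Example (as root, not executed here):
#   write_docker_override /etc/systemd/system/docker.service.d
#   systemctl daemon-reload && systemctl restart docker
```

The ExecStart line is copied from the script unchanged, so it keeps the unencrypted listener on tcp://0.0.0.0:2375 that docker info warns about.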

Output of docker version:

Client: Docker Engine - Community
 Version:           20.10.16
 API version:       1.41
 Go version:        go1.17.10
 Git commit:        aa7e414
 Built:             Thu May 12 09:17:20 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.16
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.10
  Git commit:       f756502
  Built:            Thu May 12 09:15:41 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.4
  GitCommit:        212e8b6fa2f44b9c21b2798135fc6fb7c53efc16
 runc:
  Version:          1.1.1
  GitCommit:        v1.1.1-0-g52de29d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.8.2-docker)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
 Containers: 18
  Running: 0
  Paused: 0
  Stopped: 18
 Images: 1
 Server Version: 20.10.16
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 212e8b6fa2f44b9c21b2798135fc6fb7c53efc16
 runc version: v1.1.1-0-g52de29d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.18.0-348.20.1.el8_5.x86_64
 Operating System: Red Hat Enterprise Linux 8.4 (Ootpa)
 OSType: linux
 Architecture: x86_64
 CPUs: 28
 Total Memory: 125.6GiB
 Name: wpsj1pnt222.webex.com
 ID: VDUA:HFCP:3S5U:LJN6:GD5C:NOTM:HJ42:U2XE:S2BX:6XWQ:FPGD:LGEA
 Docker Root Dir: /vdb/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: API is accessible on http://0.0.0.0:2375 without encryption.
         Access to the remote API is equivalent to root access on the host. Refer
         to the 'Docker daemon attack surface' section in the documentation for
         more information: https://docs.docker.com/go/attack-surface/

Additional environment details (AWS, VirtualBox, physical, etc.)

uname -srm: Linux 4.18.0-348.20.1.el8_5.x86_64 x86_64

cat /proc/version: Linux version 4.18.0-348.20.1.el8_5.x86_64 (mockbuild@x86-vm-07.build.eng.bos.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-4) (GCC)) #1 SMP Tue Mar 8 12:56:54 EST 2022

I used this check-config.sh:

[root@...]# bash check-config.sh
warning: /proc/config.gz does not exist, searching other paths for kernel config ...
info: reading kernel config from /boot/config-4.18.0-348.20.1.el8_5.x86_64 ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_NETFILTER_XT_MARK: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_POSIX_MQUEUE: enabled
- CONFIG_NF_NAT_IPV4: missing
- CONFIG_NF_NAT_NEEDED: enabled
- CONFIG_CGROUP_BPF: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_SECCOMP_FILTER: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_MEMCG_SWAP: enabled
- CONFIG_MEMCG_SWAP_ENABLED: missing
    (cgroup swap accounting is currently enabled)
- CONFIG_LEGACY_VSYSCALL_EMULATE: enabled
- CONFIG_IOSCHED_CFQ: missing
- CONFIG_CFQ_GROUP_IOSCHED: missing
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: enabled
- CONFIG_IP_NF_TARGET_REDIRECT: enabled (as module)
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_SECURITY_SELINUX: enabled
- CONFIG_SECURITY_APPARMOR: missing
- CONFIG_EXT4_FS: enabled (as module)
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
    - CONFIG_BRIDGE_VLAN_FILTERING: enabled
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled
      - CONFIG_CRYPTO_SEQIV: enabled (as module)
      - CONFIG_CRYPTO_GHASH: enabled
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled
      - CONFIG_XFRM_ALGO: enabled
      - CONFIG_INET_ESP: enabled (as module)
      - CONFIG_INET_XFRM_MODE_TRANSPORT: missing
  - "ipvlan":
    - CONFIG_IPVLAN: enabled (as module)
  - "macvlan":
    - CONFIG_MACVLAN: enabled (as module)
    - CONFIG_DUMMY: enabled (as module)
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_FTP: enabled (as module)
    - CONFIG_NF_NAT_TFTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_TFTP: enabled (as module)
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: missing
  - "btrfs":
    - CONFIG_BTRFS_FS: missing
    - CONFIG_BTRFS_FS_POSIX_ACL: missing
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled (as module)
    - CONFIG_DM_THIN_PROVISIONING: enabled (as module)
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

I also checked docker.service; it does not set MountFlags:

[root@...]# cat /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target docker.socket firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket containerd.service

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock -H tcp://0.0.0.0:2375
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

# kill only the docker process, not all processes in the cgroup
KillMode=process
OOMScoreAdjust=-500

[Install]
WantedBy=multi-user.target

Here is the syslog, in case it is useful:

May 30 08:08:13 ... containerd[7086]: time="2022-05-30T08:08:13.995118813Z" level=warning msg="cleanup warnings time=\"2022-05-30T08:08:13Z\" level=info msg=\"starting signal loop\" namespace=moby pid=18048 runtime=io.containerd.runc.v2\ntime=\"2022-05-30T08:08:13Z\" level=warning msg=\"failed to read init pid file\" error=\"open /run/containerd/io.containerd.runtime.v2.task/moby/ae4296c7c794b7c99e4a032ab24a45203c7dd0ba9ae905fd04d937f3963ae1f9/init.pid: no such file or directory\" runtime=io.containerd.runc.v2\n"
May 30 08:08:13 ... containerd[7086]: time="2022-05-30T08:08:13.995515114Z" level=error msg="copy shim log" error="read /proc/self/fd/12: file already closed"
May 30 08:08:13 ... dockerd[7238]: time="2022-05-30T08:08:13.996065605Z" level=error msg="stream copy error: reading from a closed fifo"
May 30 08:08:13 ... dockerd[7238]: time="2022-05-30T08:08:13.996090028Z" level=error msg="stream copy error: reading from a closed fifo"
May 30 08:08:14 ... kernel: docker0: port 1(vethbe07b2d) entered disabled state
May 30 08:08:14 ... kernel: device vethbe07b2d left promiscuous mode
May 30 08:08:14 ... kernel: docker0: port 1(vethbe07b2d) entered disabled state
May 30 08:08:14 ... NetworkManager[1477]: <info>  [1653898094.0646] device (vethbe07b2d): released from master device docker0
May 30 08:08:14 ... dockerd[7238]: time="2022-05-30T08:08:14.091855761Z" level=error msg="ae4296c7c794b7c99e4a032ab24a45203c7dd0ba9ae905fd04d937f3963ae1f9 cleanup: failed to delete container from containerd: no such container"
May 30 08:08:14 ... dockerd[7238]: time="2022-05-30T08:08:14.091897028Z" level=error msg="Handler for POST /v1.41/containers/ae4296c7c794b7c99e4a032ab24a45203c7dd0ba9ae905fd04d937f3963ae1f9/start returned error: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: waiting for init preliminary setup: read init-p: connection reset by peer: unknown"
May 30 08:08:14 ... systemd[2122]: vdb-lib-docker-overlay2-6e697eb148a3dd5555f7651df4c55a1e6715fb3a6d4f9e3fb5c54a7f8b235f11-merged.mount: Succeeded.
May 30 08:08:26 ... systemd[2122]: vdb-lib-docker-overlay2-9dc3231215d95afd5b5d39f3621264c0a113e6c8d00198b8eb21ad5014a0b53d\x2dinit-merged.mount: Succeeded.
May 30 08:08:26 ... systemd[2122]: vdb-lib-docker-overlay2-9dc3231215d95afd5b5d39f3621264c0a113e6c8d00198b8eb21ad5014a0b53d-merged.mount: Succeeded.
May 30 08:08:26 ... systemd-udevd[18125]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
May 30 08:08:26 ... kernel: docker0: port 1(veth7a9e231) entered blocking state
May 30 08:08:26 ... kernel: docker0: port 1(veth7a9e231) entered disabled state
May 30 08:08:26 ... kernel: device veth7a9e231 entered promiscuous mode
May 30 08:08:26 ... systemd-udevd[18126]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
May 30 08:08:26 ... systemd-udevd[18126]: Could not generate persistent MAC address for veth7a9e231: No such file or directory
May 30 08:08:26 ... systemd-udevd[18125]: Could not generate persistent MAC address for veth5b49c1e: No such file or directory
May 30 08:08:26 ... NetworkManager[1477]: <info>  [1653898106.6475] manager: (veth5b49c1e): new Veth device (/org/freedesktop/NetworkManager/Devices/49)
May 30 08:08:26 ... NetworkManager[1477]: <info>  [1653898106.6484] manager: (veth7a9e231): new Veth device (/org/freedesktop/NetworkManager/Devices/50)
May 30 08:08:26 ... containerd[7086]: time="2022-05-30T08:08:26.705213872Z" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
May 30 08:08:26 ... containerd[7086]: time="2022-05-30T08:08:26.705356803Z" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
May 30 08:08:26 ... containerd[7086]: time="2022-05-30T08:08:26.705372603Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
May 30 08:08:26 ... containerd[7086]: time="2022-05-30T08:08:26.705672982Z" level=info msg="starting signal loop" namespace=moby path=/run/containerd/io.containerd.runtime.v2.task/moby/73bde052a7e98e21bdbfdc26a2c546ed794f3d47392465e24cd9d99cabd2682b pid=18150 runtime=io.containerd.runc.v2
May 30 08:08:26 ... containerd[7086]: time="2022-05-30T08:08:26.749174378Z" level=info msg="shim disconnected" id=73bde052a7e98e21bdbfdc26a2c546ed794f3d47392465e24cd9d99cabd2682b
May 30 08:08:26 ... containerd[7086]: time="2022-05-30T08:08:26.749261601Z" level=warning msg="cleaning up after shim disconnected" id=73bde052a7e98e21bdbfdc26a2c546ed794f3d47392465e24cd9d99cabd2682b namespace=moby
May 30 08:08:26 ... containerd[7086]: time="2022-05-30T08:08:26.749283954Z" level=info msg="cleaning up dead shim"
May 30 08:08:26 ... containerd[7086]: time="2022-05-30T08:08:26.764100034Z" level=warning msg="cleanup warnings time=\"2022-05-30T08:08:26Z\" level=info msg=\"starting signal loop\" namespace=moby pid=18205 runtime=io.containerd.runc.v2\ntime=\"2022-05-30T08:08:26Z\" level=warning msg=\"failed to read init pid file\" error=\"open /run/containerd/io.containerd.runtime.v2.task/moby/73bde052a7e98e21bdbfdc26a2c546ed794f3d47392465e24cd9d99cabd2682b/init.pid: no such file or directory\" runtime=io.containerd.runc.v2\n"
May 30 08:08:26 ... containerd[7086]: time="2022-05-30T08:08:26.764472536Z" level=error msg="copy shim log" error="read /proc/self/fd/12: file already closed"
May 30 08:08:26 ... dockerd[7238]: time="2022-05-30T08:08:26.764888968Z" level=error msg="stream copy error: reading from a closed fifo"
May 30 08:08:26 ... dockerd[7238]: time="2022-05-30T08:08:26.764934519Z" level=error msg="stream copy error: reading from a closed fifo"
May 30 08:08:26 ... kernel: docker0: port 1(veth7a9e231) entered disabled state
May 30 08:08:26 ... kernel: device veth7a9e231 left promiscuous mode
May 30 08:08:26 ... kernel: docker0: port 1(veth7a9e231) entered disabled state
May 30 08:08:26 ... NetworkManager[1477]: <info>  [1653898106.8335] device (veth7a9e231): released from master device docker0
May 30 08:08:26 ... dockerd[7238]: time="2022-05-30T08:08:26.858083615Z" level=error msg="73bde052a7e98e21bdbfdc26a2c546ed794f3d47392465e24cd9d99cabd2682b cleanup: failed to delete container from containerd: no such container"
May 30 08:08:26 ... dockerd[7238]: time="2022-05-30T08:08:26.858120136Z" level=error msg="Handler for POST /v1.41/containers/73bde052a7e98e21bdbfdc26a2c546ed794f3d47392465e24cd9d99cabd2682b/start returned error: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: can't copy bootstrap data to pipe: write init-p: broken pipe: unknown"
May 30 08:08:27 ... systemd[2122]: vdb-lib-docker-overlay2-9dc3231215d95afd5b5d39f3621264c0a113e6c8d00198b8eb21ad5014a0b53d-merged.mount: Succeeded.
May 30 08:10:02 ... systemd[1]: Starting system activity accounting tool...
May 30 08:10:02 ... systemd[1]: sysstat-collect.service: Succeeded.
May 30 08:10:02 ... systemd[1]: Started system activity accounting tool.
May 30 08:10:37 ... systemd[1]: Starting Cleanup of Temporary Directories...
May 30 08:10:37 ... systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
May 30 08:10:37 ... systemd[1]: Started Cleanup of Temporary Directories.

suyuyi commented 2 years ago

I found that CONFIG_NF_NAT_IPV4 is missing, but I don't know whether that is the cause or how to resolve it.
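
The missing flag is probably not the cause. Upstream Linux folded the IPv4 NAT code into the generic NF_NAT module in 5.1, and the RHEL 8 kernel appears to carry that change as a backport (an assumption, based on the check-config.sh output above reporting CONFIG_NF_NAT and CONFIG_IP_NF_NAT as enabled). Individual flags can be checked directly against the kernel config file. check_flag below is a hypothetical helper that mimics the wording of check-config.sh; it is a sketch, not that script's code.

```shell
# check_flag FILE NAME: report how CONFIG_NAME is set in a kernel config file.
# Hypothetical helper; output wording imitates check-config.sh.
check_flag() {
  local file=$1 name=$2
  if grep -q "^CONFIG_${name}=y\$" "$file"; then
    echo "CONFIG_${name}: enabled"
  elif grep -q "^CONFIG_${name}=m\$" "$file"; then
    echo "CONFIG_${name}: enabled (as module)"
  else
    echo "CONFIG_${name}: missing"
  fi
}

# Example (not executed here):
#   check_flag "/boot/config-$(uname -r)" NF_NAT
#   check_flag "/boot/config-$(uname -r)" NF_NAT_IPV4
```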

suyuyi commented 2 years ago

I ran ip r and found this entry suspicious, but I don't know how it affects Docker: 172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
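
The linkdown flag means the interface has no carrier. For the docker0 bridge that is the normal state while no container is attached to it, so the entry by itself does not indicate a fault. The sketch below lists which devices in ip route output carry the flag; linkdown_devices is a hypothetical helper that reads the output on stdin.

```shell
# linkdown_devices: read `ip route` output on stdin and print each device
# whose route is flagged linkdown. Hypothetical helper.
linkdown_devices() {
  awk '/ linkdown/ {
         for (i = 1; i < NF; i++)
           if ($i == "dev") print $(i + 1)
       }' | sort -u
}

# Example (not executed here):
#   ip route | linkdown_devices
```

If docker0 still shows linkdown while a container is running, that would be worth investigating further.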

suyuyi commented 2 years ago

I think fapolicyd caused this error. After sudo systemctl stop fapolicyd, I can run the hello-world image normally. But it still confuses me why docker run hello-world occasionally succeeded while fapolicyd was running.
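
One way to confirm this kind of diagnosis without leaving the service off is to stop it only for the duration of a test command. This is a minimal sketch under stated assumptions: run_with_unit_stopped and the SYSTEMCTL variable are hypothetical names introduced here, and the helper must run as root on a real host.

```shell
# SYSTEMCTL can be overridden with a stub for dry runs; defaults to systemctl.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# run_with_unit_stopped UNIT CMD...: stop UNIT, run CMD, then start UNIT again.
# Hypothetical helper; returns CMD's exit status.
run_with_unit_stopped() {
  local unit=$1 status
  shift
  "$SYSTEMCTL" stop "$unit" || return 1
  "$@"
  status=$?
  # Always start the unit again so the host is not left unprotected.
  "$SYSTEMCTL" start "$unit"
  return "$status"
}

# Example (as root, not executed here):
#   run_with_unit_stopped fapolicyd docker run --rm hello-world
```

Since the failure is intermittent, a single successful run is weak evidence; repeating the command with the service running and with it stopped gives a fairer comparison. The thread does not explain why some runs succeeded while fapolicyd was active.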

rootwarrior commented 2 months ago

I was having this same problem. I was trying to run a docker compose operation with 5 containers, and it would intermittently fail to run certain operations. I'd have to re-run the docker compose multiple times until all portions succeeded. And when I tried running hello-world alone, it'd intermittently fail as well.

Seeing your post, I tried disabling fapolicyd. The problem went away. Thanks @suyuyi!!