Enderer opened this issue 6 years ago
I'm seeing the same issue, also with a Plex container, and I'm also bind-mounting a network share.
There are a few differences in my situation - I'm using the official Plex docker image, I'm using macvlan network, and I'm running it with docker-compose.
I'm seeing exactly the same symptoms though.
There are no application logs at all inside the container and no entries in the container logs either (docker-compose logs).
The container starts normally if I do docker-compose up.
The container also starts normally if I restart the docker daemon.
The issue only occurs at boot.
If I remove the bind-mounted network share, the container starts normally at boot, so it seems that the issue is the container tries to start before the network share has been mounted.
Therefore I'm not sure whether this constitutes a Docker bug to be honest.
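One way to check that theory is to compare when the mount unit and docker.service became active on a given boot. A sketch of that check is below; the mount unit name is derived from the mount path /mnt/qnap2/multimedia and will differ on other systems:
journalctl -b -u mnt-qnap2-multimedia.mount -u docker.service --no-pager | head -n 40
If docker.service appears in the journal before the mount unit does, that matches the ordering problem described above.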
Excerpt from my docker-compose.yml
version: '3.1'

services:
  plex:
    image: plexinc/pms-docker:plexpass
    restart: unless-stopped
    networks:
      physical:
        ipv4_address: 192.168.20.208
    hostname: pms-docker
    volumes:
      - plex-config:/config
      - plex-temp:/transcode
      - /mnt/qnap2/multimedia:/media
    devices:
      - /dev/dri:/dev/dri

networks:
  physical:
    external: true

volumes:
  plex-config:
  plex-temp:
$ docker-compose ps
Name Command State Ports
---------------------------------
plex /init Exit 128
Error from docker inspect is the same as above:
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 0,
"ExitCode": 128,
"Error": "OCI runtime create failed: container_linux.go:348: starting container process caused \"process_linux.go:402: container init caused \\\"rootfs_linux.go:58: mounting \\\\\\\"/mnt/qnap2/multimedia\\\\\\\" to rootfs \\\\\\\"/var/lib/docker/overlay2/2f7c5ceb2dd5ddb0788aa9272b600edef6a4a0edbf154f8963b7075552e7bd16/merged\\\\\\\" at \\\\\\\"/var/lib/docker/overlay2/2f7c5ceb2dd5ddb0788aa9272b600edef6a4a0edbf154f8963b7075552e7bd16/merged/mnt/qnap2/multimedia\\\\\\\" caused \\\\\\\"no such device\\\\\\\"\\\"\": unknown",
"StartedAt": "2018-06-14T15:43:24.199564037Z",
"FinishedAt": "2018-06-14T15:49:17.387003284Z",
"Health": {
"Status": "unhealthy",
"FailingStreak": 0,
}
}
I'm on a slightly later Docker version, and I'm on Ubuntu 18.04 LTS.
$ docker version
Client:
Version: 18.05.0-ce
API version: 1.37
Go version: go1.9.5
Git commit: f150324
Built: Wed May 9 22:16:13 2018
OS/Arch: linux/amd64
Experimental: false
Orchestrator: swarm
Server:
Engine:
Version: 18.05.0-ce
API version: 1.37 (minimum version 1.12)
Go version: go1.9.5
Git commit: f150324
Built: Wed May 9 22:14:23 2018
OS/Arch: linux/amd64
Experimental: false
This issue can be reproduced with this basic container which bind-mounts a network share:
docker container run -d \
--restart=always \
--name testmount \
-v /mnt/qnap2/multimedia:/media \
busybox ping 8.8.8.8
It gives the same behaviour and same error after reboot.
$ docker inspect testmount -f '{{ .State.Error }}'
OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"rootfs_linux.go:58: mounting \\\"/mnt/qnap2/multimedia\\\" to rootfs \\\"/var/lib/docker/overlay2/4be5925b0d17e6c9c03ddf70ad7108ca184f3f7456599cdb6cfa08713a2af0f2/merged\\\" at \\\"/var/lib/docker/overlay2/4be5925b0d17e6c9c03ddf70ad7108ca184f3f7456599cdb6cfa08713a2af0f2/merged/media\\\" caused \\\"no such device\\\"\"": unknown
If I remove the network bind-mount, then it works and starts correctly after reboot:
docker container run -d \
--restart=always \
--name testnomount \
-v /tmp:/media \
busybox ping 8.8.8.8
Therefore the issue is simply that Docker is attempting to start the container before the mount has completed.
I don't think this can be considered a Docker bug - how is Docker daemon supposed to know to wait for the network mount?
I suspect the fix, on a case-by-case basis, is to add an After= rule to the systemd docker.service file.
A fix for this is to add x-systemd.after=docker.service to the fstab entry. This tells systemd that docker.service shouldn't be started until after the mount has been done.
If the mount fails, the docker server will start as normal.
~~Just for info, my full working entry from `/etc/fstab` is:~~
//qnap2/multimedia /mnt/qnap2/multimedia cifs uid=andym,x-systemd.automount,x-systemd.after=docker.service,credentials=/home/username/.smbcredentials,iocharset=utf8 0 0
I spoke too soon. The above does allow the container to start, but the share isn't actually mounted. The above should not be used.
A working fix is to modify the docker /lib/systemd/system/docker.service file: add RequiresMountsFor=/mnt/qnap2/multimedia to the [Unit] section.
See https://www.freedesktop.org/software/systemd/man/systemd.unit.html#RequiresMountsFor=
This is not ideal since it requires modifying the Docker service each time a container needs a mount, but it does the job.
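To avoid editing the packaged unit file (which a package upgrade may overwrite), the same directive can live in a drop-in. A minimal sketch, reusing the path from the example above; the file name is arbitrary:
# /etc/systemd/system/docker.service.d/wait-for-mounts.conf (file name is arbitrary)
[Unit]
RequiresMountsFor=/mnt/qnap2/multimedia
After adding it, run sudo systemctl daemon-reload so systemd picks up the drop-in before the next reboot.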
It seems this is actually a recurrence of a previous issue https://github.com/moby/moby/issues/17485
Repro steps are nearly identical, apart from different mount type.
I encounter the same issue. Even though the restart policy of my containers is set to unless-stopped, they don't come up if one of the prerequisite mount points is not available at the time Docker attempts to start them. The retry logic (which otherwise works fine) is not executed. The status is:
"ExitCode": 255,
"Error": "OCI runtime create failed: container_linux.go:348: starting container process caused \"process_linux.go:402: container ini
t caused \\\"rootfs_linux.go:58: mounting \\\\\\\"...\\\\\\\" to rootfs \\\\\\\"/var/
lib/docker/overlay2/.../merged\\\\\\\" at \\\\\\\"...\\\\\\\" ca
used \\\\\\\"stat ...: no such file or directory\\\\\\\"\\\"\": unknown",
Yep, struggling with this as well at the moment. NFS mount is not setup before Docker starts, so the container doesn't work as expected.
Hello, same here, but I only use Docker volumes on the same server with docker-compose, and I need to restart every project each time.
Why is Docker not trying to restart this container?
Same issue here. If the CIFS share is not mounted, container exits and does not attempt to restart. Container will start fine when started manually once the network share is available.
Something similar happens in my case. I've got an encrypted folder on a Synology, with automount enabled. Since it's not mounted yet when the Docker service starts, the container doesn't start until I manually do it with docker-compose up or via the Synology UI. It doesn't retry even with restart: always set.
Result from docker-compose ps:
Name Command State Ports
------------------------------------------------------------
test_bck /entry.sh supervisord -n - ... Exit 128
This is really annoying, since I only use my Synology NAS several hours a day... and I need to start some docker services automatically.
I see the same issue. My Docker paths are directly mapped to the filesystem of locally attached SSDs.
And in some cases after reboot the containers show Exit 128 and Docker does not try to restart them, although restart: always is used.
When I check systemctl status docker, I can see that the Docker service is running, but it reports "id already in use".
Docker version 18.09.1, build 4c52b90
docker-compose version 1.23.2, build 1110ad01
Is there a way to force docker to restart the services in this case?
How is this not fixed??? This is extremely annoying, isn't it?
I gathered extra information for my case: My docker-compose file:
plex:
  image: linuxserver/plex
  container_name: plex
  runtime: nvidia
  environment:
    ...
The output of docker inspect:
[
{
"Id": "c12c4d426f8f36848fbe1e4807a46cbd570be56b2534768cfc75e76e03b0e083",
"Created": "2019-11-24T19:53:46.006747643Z",
"Path": "/init",
"Args": [],
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 0,
"ExitCode": 128,
"Error": "error gathering device information while adding custom device \"/dev/nvidia-modeset\": no such file or directory",
"StartedAt": "2019-11-25T08:48:31.115776398Z",
"FinishedAt": "2019-11-25T08:55:31.358738772Z"
},
...
And my /lib/systemd/system/docker.service :
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket
RequiresMountsFor=/zdata/media /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools /dev/nvidia-modeset
Is there a way to wait for the nvidia driver to be properly loaded other than with "RequiresMountsFor"??
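As far as I know, RequiresMountsFor only covers mount points, not device nodes, so it doesn't help here. One crude, untested sketch is an ExecStartPre drop-in that waits for the device node from the error above to appear, with a timeout so a missing driver doesn't block boot forever:
# /etc/systemd/system/docker.service.d/wait-for-nvidia.conf (untested sketch; file name is arbitrary)
[Service]
# "-" prefix: if the wait times out, dockerd still starts instead of failing the unit.
ExecStartPre=-/bin/sh -c 'timeout 60 sh -c "until [ -e /dev/nvidia-modeset ]; do sleep 1; done"'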
The same issue occurred today after running yum update on an AWS server, but unfortunately I already started the container, so I cannot inspect it anymore to see more details.
In my case the container is from the official Traefik image, with restart set to always, and also some volumes, one of them being /var/run/docker.sock:
version: '2'

services:
  traefik:
    image: traefik:1.7
    restart: always
    ports:
      - 80:80
      - 443:443
    networks:
      - traefik
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/traefik/traefik.toml:/traefik.toml
      - /opt/traefik/acme.json:/acme.json
    container_name: traefik

networks:
  traefik:
    external: true
Can anyone from Docker comment on this issue?
Maybe @andrewhsu, @tiborvass, @thaJeztah or @duglin can help in pointing this issue to anyone that can give a hand here.
I had this exact situation. I start my containers using --restart unless-stopped. At some point I updated/upgraded the server (Ubuntu) and then rebooted it. A couple of hours after the reboot, most containers had stopped, with Exited (128).
$ docker container list --all
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9f843f571a17 jrcs/letsencrypt-nginx-proxy-companion "/bin/bash /app/entr…" 5 months ago Up 4 hours letsencrypt
2e2daceaa70b proxy "/app/docker-entrypo…" 5 months ago Up 4 hours 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp proxy
5882d5240bbe foo3 "nginx -g 'daemon of…" 12 months ago Exited (128) 4 hours ago 80/tcp foo5
ace272f67536 foo3 "nginx -g 'daemon of…" 12 months ago Exited (128) 4 hours ago 80/tcp foo4
f89af68a44d6 foo3 "nginx -g 'daemon of…" 12 months ago Exited (128) 4 hours ago 80/tcp foo3
42be6050e8f2 foo2 "nginx -g 'daemon of…" 12 months ago Exited (128) 4 hours ago 80/tcp foo2
5043b220370f foo1 "nginx -g 'daemon of…" 12 months ago Exited (128) 4 hours ago 80/tcp foo1
After another reboot everything was fixed. Any ideas on why this happened, or where I should look to debug the situation?
From a quick glance at the errors mentioned, it looks like all cases are trying to bind-mount an extra disk that is not yet available at the moment Docker starts, as commented above as well: https://github.com/docker/for-linux/issues/293#issuecomment-397398781
runtime create failed: container_linux.go:348:
starting container process caused process_linux.go:402:
container init caused "rootfs_linux.go:58:
mounting "/mnt/tanagra/public"
to rootfs "/var/lib/docker/overlay2/6a990b540b574977de4d0b6197b3b033e4ab6890813eb592058d005db70337be/merged"
at "/var/lib/docker/overlay2/6a990b540b574977de4d0b6197b3b033e4ab6890813eb592058d005db70337be/merged/tanagra/public"
caused "no such device" "": unknown"
OCI runtime create failed: container_linux.go:348:
starting container process caused process_linux.go:402:
container init caused "rootfs_linux.go:58:
mounting "/mnt/qnap2/multimedia"
to rootfs "/var/lib/docker/overlay2/2f7c5ceb2dd5ddb0788aa9272b600edef6a4a0edbf154f8963b7075552e7bd16/merged"
at "/var/lib/docker/overlay2/2f7c5ceb2dd5ddb0788aa9272b600edef6a4a0edbf154f8963b7075552e7bd16/merged/mnt/qnap2/multimedia"
caused "no such device" "": unknown"
I think the reason the daemon might not continue trying is that it requires the container to start successfully "once" before it will start monitoring it (to handle restarting the container once it exits). I seem to recall this was done to prevent situations where (e.g., similar to what's discussed here) a "broken" container configuration causes a DoS of the whole daemon.
Perhaps the best solution is to create a systemd drop-in file to delay starting the docker service until after the required mounts are present, similar to https://github.com/containerd/containerd/pull/3741
I see this thread on reddit (https://www.reddit.com/r/linuxadmin/comments/5z819x/how_to_have_a_systemd_service_wait_for_a_network/) also mentions global.mount and remote-fs.target, which may be relevant for the NFS shares.
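A drop-in along those lines could be as simple as the following sketch; the file name is arbitrary, and remote-fs.target is the target that remote mounts such as NFS and CIFS are ordered into, so this only helps for network filesystems managed by systemd:
# /etc/systemd/system/docker.service.d/wait-for-remote-fs.conf (file name is arbitrary)
[Unit]
Wants=remote-fs.target
After=remote-fs.target
Ordering after the target rather than listing specific paths avoids maintaining a per-container list of mounts, at the cost of delaying dockerd behind all remote mounts.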
My "solution" so far is to create a cron
job and let that restart the container until the mounted drive is available:
SHELL=/snap/bin/pwsh
@reboot root <path>/autorestart.ps1
Copy that cron file to /etc/cron.d.
autorestart.ps1 is a PowerShell script, but it can easily be replaced by another script. The content is:
$isRunning = (docker inspect -f '{{.State.Running}}' <mycontainer>) | Out-String
while ($isRunning.TrimEnd() -ne "true")
{
"Container is not running. Starting container ..."
docker container start <mycontainer>
Start-Sleep -Seconds 10
$isRunning = (docker inspect -f '{{.State.Running}}' <mycontainer>) | Out-String
}
"Done."
I am experiencing this same issue on Ubuntu 20.04 (and just upgraded to 21, same issue) using systemd. The shares in question are from VirtualBox. My containers start up fine, as they have access to their application configuration on /home, but they cannot access the shares for the data they need to function. The containers actually bind to the directory under the mount point and use up ghost space on root (which was very tricky to track down).
I have tried the RequiresMountsFor directive but it does not resolve the issue.
I had the same trouble with a simple docker compose file for Loki, without any remote folders. It seemed to fail just mounting a local file, quoting something about mounting through proc.
I therefore created my own systemd startup file for docker, which seems to work now even after I've rebooted.
I changed/added these two lines:
Requires=docker.socket containerd.service local-fs.target
RequiresMountsFor=/proc
Full file for reference is here:
root@logger:/etc/systemd/system# cat docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket containerd.service local-fs.target
RequiresMountsFor=/proc
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3
# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
OOMScoreAdjust=-500
[Install]
WantedBy=multi-user.target
systemctl status docker
Did you resolve this issue? Please comment.
I don't remember the details as I no longer use VirtualBox, but I solved this by changing the systemd priorities. I think I held Docker back until the automount was complete, or I put a sleep in a startup script. I'm sorry I can't remember the details, but the solution lies in systemd.
I have this issue on a local bind mount, not a network share, so it's definitely not just that situation. Only one container does this; I'm not sure why. I have restart=always on it, and it still doesn't retry.
Experiencing the same problem with linuxserver/tvheadend, just a local bind volume for recordings.
Ubuntu 22.04.3 LTS
Having the same issue on Debian 12 and Vaultwarden - local binds only. Unfortunately, the fix suggested by @kkretsch did not work.
Oddly, I have both Vaultwarden and vaultwarden-backup in the same compose file, binding the same local directory (vaultwarden-backup has two additional unrelated binds), yet only Vaultwarden exits with 128 every reboot; the other container starts up just fine.
On a separate host (Debian 11), I'm having the same issue with Traefik (sporadically, by contrast). In this case as well, multiple additional containers share a common local bind. However, testing without multiple containers binding a common directory yields inconsistent results for me.
Ubuntu 22.04, Docker 25.0.0, build e758fe5: this is still an issue. For me it happens with any container that has restart=always.
I have the same issue with Ubuntu 22.04.1, Docker version 24.0.5. Any solution?
Just going to throw my "I have the same issue" out there. This is incredibly frustrating...
I've also tried mounting the drive via /etc/fstab, but if a Docker container references it, even with RequiresMountsFor=/some/path in the systemd config, it causes the drive mount to fail. I've confirmed this by removing the container and rebooting (the drive then mounts fine), then adding the container back and rebooting (it fails to mount again). I'm at a complete loss...
The only work-around I have found is to delay docker from starting.
sudo systemctl edit docker.service
Add
### Editing /etc/systemd/system/docker.service.d/override.conf
### Anything between here and the comment below will become the new contents of the file
[Service]
ExecStartPre=/bin/sleep 30
....
This isn't a foolproof fix though, there is definitely still a chance things will fail to load properly.
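A slightly more targeted variant of the same override (still just a sketch, with a placeholder path) waits for the mount to actually appear instead of sleeping a fixed time, and gives up after two minutes so a missing share doesn't block dockerd forever:
[Service]
# "-" prefix: if the wait times out, dockerd still starts instead of failing the unit.
ExecStartPre=-/bin/sh -c 'timeout 120 sh -c "until mountpoint -q /some/path; do sleep 2; done"'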
That's how I solved it: https://gitlab.com/-/snippets/3715249
> I've also tried mounting the drive via /etc/fstab, but if a Docker container references it, even with RequiresMountsFor=/some/path in the systemd config, it causes the drive mount to fail. [...] The only work-around I have found is to delay docker from starting: sudo systemctl edit docker.service, then add ExecStartPre=/bin/sleep 30 under [Service].
Truly wonderful fix. I was distro-hopping and ended up with OpenSUSE. It uses NetworkManager by default, and I assume it has a delay with DHCP or something else, causing Pi-hole to exit with code 128, because I bind port 53 to the host IP. The error message I got was
"Error": "driver failed programming external connectivity on endpoint pihole
(5ae32e1e4aeee78efab94c2d638e29d918eeef7c355b95c907f7121f293c080a):
Error starting userland proxy: listen tcp4 172.20.0.20:53: bind: cannot assign requested address",
"StartedAt": "2024-08-02T04:42:00.906011788Z",
"FinishedAt": "2024-08-02T04:54:18.75070572Z",
I used your code, with a delay of 10 sec only, and it worked flawlessly.
[Service]
ExecStartPre=/bin/sleep 10
Thank you
Encountered the issue recently for an nginx container that has no volumes (named or anonymous).
Given containers may use /var/lib/docker/containers as well, and that directory structure might remain empty if storage is not populated, adding the below on top of ConditionDirectoryNotEmpty might resolve this issue.
[Unit]
RequiresMountsFor=/var/lib/docker
ConditionDirectoryNotEmpty=/var/lib/docker/containers /var/lib/docker/volumes/
Actual behavior
After rebooting the server the container does not start back up. The container tries to start but exits with code 128. This looks like it's due to the network volume not being available at the time of startup. It takes a few seconds before the volume is ready. The message "no such device" appears in the error log. Manually starting the container works because the network volume is then available.
The container is set to restart=always but Docker does not attempt to restart the container. RestartCount is 0.
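For reference, the configured restart policy and the number of restart attempts Docker has made can be read straight from docker inspect (replace <container> with the container name):
docker inspect -f 'policy={{ .HostConfig.RestartPolicy.Name }} retries={{ .RestartCount }}' <container>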
Here is the docker command:
Here is the error message from docker inspect:
Output of docker version:
Output of docker info: