Closed mxork closed 3 years ago
This is complicated to do right in the face of the new netlink configuration interface.
Why do you need to track this, though? I might be missing something, but I don't see the need...?
Fair question. This is an XY problem, and I should explain the root. Sorry this is a bit long. To the best of my knowledge, this is a minimal example of a genuine issue.
I have the following situation: a service depends on having a partition mounted, and the partition is located on a network block device. I hope you would agree that this is a reasonable use case for NBD, and that it would be nice for systemd to handle the dependencies.
Here are the files:
# /usr/lib/systemd/system/echo.service
[Unit]
Requires=mnt.mount
After=mnt.mount
[Service]
Type=oneshot
ExecStart=/bin/echo "Very serious system service!"
# /usr/lib/systemd/system/mnt.mount
[Mount]
What=/dev/nbd0p1
Where=/mnt
Type=ext4
# /etc/nbdtab
nbd0 storage.server default
Having run systemctl enable nbd@nbd0; systemctl start echo, I expect:
storage.server:default is bound to /dev/nbd0
/dev/nbd0p1 is mounted to /mnt
echo.service is started
However, this is not the observed behavior. Instead:
$ systemctl start echo
A dependency job for echo.service failed. See 'journalctl -xe' for details.
$ journalctl -xe
-- Subject: Unit echo.service has failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit echo.service has failed.
--
-- The result is dependency.
Nov 22 19:58:04 peacenik systemd[1]: echo.service: Job echo.service/start failed with result 'dependency'.
Nov 22 19:58:04 peacenik systemd[1]: mnt.mount: Job mnt.mount/start failed with result 'dependency'.
Nov 22 19:58:04 peacenik systemd[1]: dev-nbd0p1.device: Job dev-nbd0p1.device/start failed with result 'dependency'.
Nov 22 19:58:15 peacenik systemd[1]: Starting NBD client connection for nbd0...
-- Subject: Unit nbd@nbd0.service has begun start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nbd@nbd0.service has begun starting up.
Nov 22 19:58:16 peacenik nbd-client[832]: Negotiation: ..size = 32768MB
Nov 22 19:58:16 peacenik nbd-client[832]: Error: Failed to setup device, check dmesg
Nov 22 19:58:16 peacenik nbd-client[832]: Exiting.
Nov 22 19:58:16 peacenik nbd_client[832]: Failed to setup device, check dmesg
Nov 22 19:58:16 peacenik nbd_client[832]: Exiting.
Nov 22 19:58:16 peacenik systemd[1]: nbd@nbd0.service: Control process exited, code=exited status=1
Nov 22 19:58:16 peacenik systemd[1]: nbd@nbd0.service: Failed with result 'exit-code'.
Nov 22 19:58:16 peacenik kernel: nbd: nbd0 already in use
As you can see, the service fails to start due to a failed dependency (the mount), which fails to start due to a failed dependency (the partition device).
Checking dependencies of dev-nbd0p1.device:
$ systemctl list-dependencies dev-nbd0p1.device
dev-nbd0p1.device
● └─nbd@nbd0.service
So, from systemd's perspective, nbd@nbd0.service failed to run. From the output above (of journalctl) you can see that the error is kernel: nbd: nbd0 already in use. Huh? But I thought it failed to connect nbd0?!
The cause is that nbd@nbd0.service successfully starts, binds /dev/nbd0 and forks, but systemd cannot tell that it has succeeded or is running, and this prevents units that depend on nbd@nbd0 from starting (the service state is inactive (dead)).
Eventually, it seems, systemd attempts to start nbd@nbd0 again, but this time it fails loudly, since /dev/nbd0 is already bound. This, in turn, prevents anything that depends on nbd@nbd0 from starting. If we just go ahead and run:
$ systemctl start echo.service
$ journalctl -eu echo
Nov 22 20:11:36 peacenik systemd[1]: Starting echo.service...
Nov 22 20:11:36 peacenik echo[1156]: Very serious system service!
Nov 22 20:11:36 peacenik systemd[1]: Started echo.service.
Everything runs just fine. There is no issue of functionality, only communicating the state of the device properly to systemd.
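The mismatch described here (device bound, unit reported as inactive) can be checked directly. The following is only a rough sketch: the function name nbd_compare_state and the SYSTEMCTL/NBD_CLIENT override variables are made up for illustration, and it assumes that nbd-client -check exits successfully while the device is connected.

```shell
#!/bin/sh
# Hypothetical helper: compare systemd's view of nbd@<dev>.service with
# what nbd-client reports for /dev/<dev>.
# SYSTEMCTL and NBD_CLIENT are illustrative overrides, not standard variables.
SYSTEMCTL=${SYSTEMCTL:-systemctl}
NBD_CLIENT=${NBD_CLIENT:-nbd-client}

nbd_compare_state() {
    dev=$1
    # is-active prints the state (active, inactive, failed, ...) and
    # exits non-zero when the unit is not active; only the text is used here.
    unit_state=$("$SYSTEMCTL" is-active "nbd@$dev.service")
    # Assumption: -check succeeds only while the device is connected.
    if "$NBD_CLIENT" -check "/dev/$dev" >/dev/null 2>&1; then
        dev_state=connected
    else
        dev_state=disconnected
    fi
    echo "unit=$unit_state device=$dev_state"
}
```

Calling nbd_compare_state nbd0 in the situation above would be expected to show an inactive unit next to a connected device.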
My idea of the root issue is that Type=forking is used inappropriately for nbd@.service. From man systemd.service:
If set to forking, it is expected that the process configured with ExecStart= will call fork() as part of its start-up. The parent process is expected to exit when start-up is complete and all communication channels are set up. The child continues to run as the main daemon process. This is the behavior of traditional UNIX daemons. If this setting is used, it is recommended to also use the PIDFile= option, so that systemd can identify the main process of the daemon. systemd will proceed with starting follow-up units as soon as the parent process exits.
(emphasis mine)
Clearly, systemd expects to be responsible for managing the main process of the service (or at least to be able to keep track of it), and since it cannot, that causes merry hell going down the dependency chain.
My initial thought (and the issue I opened) was to just add a -pidfile flag so systemd can track the process. After consideration, that is not the only or simplest way to fix this. Converting nbd@.service to Type=oneshot with RemainAfterExit=yes is probably the closest match to the existing behavior: the service is fire-and-forget on success, with no process tracking. Unfortunately, if the device is disconnected for any reason, then nbd@.service will remain active, and cannot be re-run by a dependent unit.
Another option is to use the sd_notify/systemd-notify mechanism to monitor the status of the device. This is a little more complicated, but keeps the state of the service in sync with the state of the device. I am currently using a shell wrapper of nbd-client which invokes systemd-notify as a workaround to this issue.
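For illustration, a wrapper along these lines might look roughly like the sketch below. This is not the actual wrapper mentioned above. It assumes the unit is changed to Type=notify with NotifyAccess=all (so that systemd-notify called from a shell script is accepted), that nbd-client -check exits successfully only while the device is connected, and that polling is an acceptable way to notice a disconnect. The names nbd_supervise, nbd_connected, NOTIFY and POLL_SECONDS are invented for the example.

```shell
#!/bin/sh
# Hypothetical nbd-client wrapper: connect, report readiness to systemd,
# then stay in the foreground until the device goes away.
# Assumes Type=notify and NotifyAccess=all in the unit (an assumption,
# not something the packaged nbd@.service provides).
NBD_CLIENT=${NBD_CLIENT:-nbd-client}
NOTIFY=${NOTIFY:-systemd-notify}
POLL_SECONDS=${POLL_SECONDS:-5}

# Succeeds while nbd-client reports the device as in use.
nbd_connected() {
    "$NBD_CLIENT" -check "/dev/$1" >/dev/null 2>&1
}

nbd_supervise() {
    dev=$1
    # Same invocation as the packaged unit: look the name up in /etc/nbdtab.
    "$NBD_CLIENT" "$dev" || return 1
    "$NOTIFY" --ready --status="$dev connected"
    # Poll until the device is no longer connected.
    while nbd_connected "$dev"; do
        sleep "$POLL_SECONDS"
    done
    "$NOTIFY" --status="$dev disconnected"
    # Non-zero exit so the unit leaves the active state.
    return 1
}
```

The script would end with a call such as nbd_supervise "$1", and the unit's ExecStart= would point at the script with %i as its argument. Because the wrapper stays running, the unit remains active only while the device is connected, which is the property the oneshot approach lacks.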
$ uname -r
4.19.2-arch1-1-ARCH
$ nbd-client --version
This is nbd-client, from nbd 3.18
$ systemctl --version
systemd 239
+PAM +AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid
Ah, yes.
nbd-client originally used the ioctl() interface, where it would hang on the NBD_DO_IT ioctl(). At the time, having nbd-client be Type=forking was the correct thing to do.
However, recent versions of nbd-client use the netlink interface to configure the NBD device, which means that nbd-client exits after configuring the device. I forgot that this will obviously have side effects in the systemd unit.
You can work around the issue by adding -nonetlink to the command line in the systemd unit, while I try to come up with a proper solution :-)
This seemed to work for me:
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/nbd-client %i
ExecStop=/usr/sbin/nbd-client -d /dev/%i
For some reason I also had to make my mount unit depend explicitly on nbd@nbd0:
[Unit]
Requires=nbd@nbd0.service
[Mount]
What=/dev/nbd0
Where=/mnt
Type=ext4
Hey guys,
I'm trying to get nbd-client to work at system boot, but it doesn't. I'm seeing a problem very similar to the one described in this issue.
My system is Ubuntu 20.04.1 with Linux 5.4.
I tried both suggestions, -nonetlink and Type=oneshot. Neither worked at boot, but both are okay after boot, with systemctl restart nbd@nbd0 for example.
As a workaround, I changed /lib/systemd/system/nbd@.service to include -L / -nonetlink, to satisfy the original systemd unit, AND I also added an /etc/rc.local script with:
#! /bin/bash
systemctl restart nbd@nbd0
systemctl restart nbd@nbd1
So, now, it works on boot!
Cheers! Thiago
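If more devices are added later, the rc.local workaround above could be generalised so that the device names are not hard-coded. The sketch below is only an assumption-laden illustration: it takes the first field of each non-comment line in /etc/nbdtab to be the device name (as in the nbd0 storage.server default example earlier in the thread), and the function names and the NBDTAB/SYSTEMCTL variables are made up.

```shell
#!/bin/sh
# Hypothetical generalisation of the rc.local workaround: restart
# nbd@<dev> for every device listed in /etc/nbdtab.
NBDTAB=${NBDTAB:-/etc/nbdtab}
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Print the first field of every non-blank, non-comment line.
nbdtab_devices() {
    awk '!/^[[:space:]]*(#|$)/ { print $1 }' "$1"
}

restart_all_nbd() {
    for dev in $(nbdtab_devices "$NBDTAB"); do
        "$SYSTEMCTL" restart "nbd@$dev"
    done
}
```

Calling restart_all_nbd from /etc/rc.local would then do the same as the two explicit systemctl restart lines.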
It turns out that the oneshot idea works better!
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/nbd-client %i
ExecStop=/usr/sbin/nbd-client -d /dev/%i
The -L / -nonetlink is kinda creepy... lol
I wrote up my working instructions here: https://libguestfs.org/nbdkit-client.1.html#Easy-mounting-at-boot-time
Tracking the status of an nbd-client process from systemd is annoying.
nbd-client -check allows interactive determination of the PID, but life would be easier with that information in a pidfile.