NetworkBlockDevice / nbd

Network Block Device
GNU General Public License v2.0

nbd root on ubuntu 16.04 fails due to systemd shutdown sending a kill(-1,SIGSTOP) #51

Closed MartinezTorres closed 6 years ago

MartinezTorres commented 7 years ago

Hi,

I'm trying to set up an nbd root on Ubuntu 16.04, which uses systemd for system management. Everything works as expected, except that the partition is never cleanly closed on shutdown. I've tracked the problem to the kill(-1, SIGSTOP) that systemd-shutdown sends before the root is remounted read-only.

The problem is related to the one discussed in this thread 6 years ago: https://sourceforge.net/p/nbd/mailman/message/27368126/

I'd like to avoid hacking systemd to solve this, or else the maintenance of the images will become very messy...

yoe commented 7 years ago

The nbd-client which backs your root filesystem should be started from the initramfs with the -m option. Most initramfs tools that I'm aware of should do this automatically these days.

Which initramfs implementation are you using?

MartinezTorres commented 7 years ago

This is not the problem. I start nbd with -m, so argv[0][0] == '@'. Sadly, systemd-shutdown never reaches the check.

Check the code at: https://github.com/systemd/systemd/blob/master/src/core/killall.c

systemd-shutdown calls broadcast_signal (killall.c:224). The nbd socket dies inside that function, specifically at line 235, the kill(-1, SIGSTOP). The check for argv[0][0] == '@' happens in ignore_proc, which is called from killall (killall.c:238), so it is never reached before the socket has already died.
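For readers following along, here is a minimal sketch of the kind of per-process check described above, as opposed to a blanket kill(-1, ...), which reaches every process before any check can run. The helper names cmdline_is_marked and collect_unmarked_pids are hypothetical and this is not systemd's code; as far as I know, the real ignore_proc applies further conditions (for example around kernel threads and process ownership) that are left out here. The loop is a dry run: it only collects PIDs and sends no signals.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: true if the file at `path` (for example
 * /proc/<pid>/cmdline) starts with '@', the mark that nbd-client -m
 * places in argv[0][0]. */
bool cmdline_is_marked(const char *path) {
    FILE *f = fopen(path, "r");
    if (!f)
        return false;   /* process gone or unreadable: treat as unmarked */
    int c = fgetc(f);
    fclose(f);
    return c == '@';
}

/* Hypothetical dry run of a per-process signal loop: collect up to `max`
 * PIDs found under `proc_root` whose cmdline is not marked with '@'.
 * Returns the number of PIDs stored in `out`, or -1 if `proc_root`
 * cannot be opened. Assumption: processes with an empty cmdline
 * (such as kernel threads) are simply treated as unmarked here. */
int collect_unmarked_pids(const char *proc_root, pid_t *out, int max) {
    DIR *d = opendir(proc_root);
    if (!d)
        return -1;

    int n = 0;
    struct dirent *e;
    while (n < max && (e = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)e->d_name[0]))
            continue;   /* not a PID directory */

        char path[512];
        snprintf(path, sizeof path, "%s/%s/cmdline", proc_root, e->d_name);
        if (cmdline_is_marked(path))
            continue;   /* marked with '@': leave it alone */

        out[n++] = (pid_t)atol(e->d_name);
    }
    closedir(d);
    return n;
}
```

Calling collect_unmarked_pids("/proc", pids, max) on a machine with an nbd root started with -m should leave the nbd-client PID out of the list. A kill(-1, SIGSTOP) has no equivalent filtering step.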

yoe commented 7 years ago

That sounds like a bug in systemd then, not nbd-client...

MartinezTorres commented 7 years ago

Mmm, I think that nbd-client should support receiving a SIGSTOP followed by a SIGCONT, shouldn't it?

This sample code kills the nbd socket:

#include <sys/types.h>
#include <signal.h>

int main() {
    /* Stop, then resume, every process this program may signal. */
    kill(-1, SIGSTOP);
    kill(-1, SIGCONT);
    return 0;
}
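A less disruptive way to experiment with the same signal pair might be to target a single PID instead of broadcasting. The sketch below assumes that the disconnect is triggered by the signals reaching the nbd-client process itself, which this thread suggests but does not confirm. The function name stop_and_continue is hypothetical.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: send SIGSTOP, then SIGCONT, to one process.
 * Refuses pid <= 0 so it can never turn into a broadcast such as
 * kill(-1, ...) or a process-group signal. Returns 0 on success,
 * -1 on failure. */
int stop_and_continue(pid_t pid) {
    if (pid <= 0) {
        fprintf(stderr, "refusing to signal pid %ld\n", (long)pid);
        return -1;
    }
    if (kill(pid, SIGSTOP) < 0) {
        perror("kill(SIGSTOP)");
        return -1;
    }
    if (kill(pid, SIGCONT) < 0) {
        perror("kill(SIGCONT)");
        return -1;
    }
    return 0;
}
```

Passing the PID of the nbd-client that backs a test device (not the root device) keeps the rest of the system untouched while you watch the kernel log.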

yoe commented 7 years ago

So, I'm not sure that's true. We told systemd not to send a signal to us. To me, that means not any signal. I'll take it up with them.

If I can't convince them, though: this is handled in the kernel, so it would be more something for @josefbacik to look at. I suspect he'll say this is fixed in the netlink interface ;-)

tbillon commented 7 years ago

@MartinezTorres did you find a workaround? We face the exact same problem.

MartinezTorres commented 7 years ago

Hi, we ended up with an ugly solution: we patched systemd in Ubuntu 16.04. We then locked out automated systemd updates, which is a safety concern.

Here is our patch:

--- killall.c   2016-02-11 17:28:00.000000000 +0100
+++ ../../../systemd-229.good/src/core/killall.c    2017-06-10 19:48:38.881633073 +0200
@@ -234,13 +234,15 @@
         assert_se(sigaddset(&mask, SIGCHLD) == 0);
         assert_se(sigprocmask(SIG_BLOCK, &mask, &oldmask) == 0);

-        if (kill(-1, SIGSTOP) < 0 && errno != ESRCH)
-                log_warning_errno(errno, "kill(-1, SIGSTOP) failed: %m");
+//        if (kill(-1, SIGSTOP) < 0 && errno != ESRCH)
+//                log_warning_errno(errno, "kill(-1, SIGSTOP) failed: %m");

+        killall(SIGSTOP, pids, false);
         killall(sig, pids, send_sighup);
+        killall(SIGCONT, pids, false);

-        if (kill(-1, SIGCONT) < 0 && errno != ESRCH)
-                log_warning_errno(errno, "kill(-1, SIGCONT) failed: %m");
+//        if (kill(-1, SIGCONT) < 0 && errno != ESRCH)
+//                log_warning_errno(errno, "kill(-1, SIGCONT) failed: %m");

         if (wait_for_exit)
                 wait_for_children(pids, &mask);

alkisg commented 6 years ago

Noob idea: from what I hear, nbd-client is actually a kernel thread. Can it be launched with PPID=0 so that it's automatically excluded from a kill(-1) call?

yoe commented 6 years ago

It actually isn't, and it is.

The nbd-client runs ioctl(NBD_DO_IT), which is a system call that remains in kernel space until the device disconnects. It's an ugly hack which sort of works, but has issues (as this bug report shows).

The proper fix is to not do that anymore. The code for that exists, the next version of nbd-client will have it. There is just a bit of cleanup necessary before I can release that, but I haven't had the time for that yet. Hopefully soon.
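To illustrate the ioctl(NBD_DO_IT) behaviour described above, here is a minimal sketch of the ioctl-based attach path, assuming Linux with the <linux/nbd.h> header available. The function name run_ioctl_client is hypothetical. The socket is assumed to be already connected and past the NBD handshake, and device size setup is omitted. This is not nbd-client's actual code.

```c
#include <fcntl.h>
#include <linux/nbd.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical sketch: hand an already-negotiated socket to the kernel
 * NBD driver and block until the device disconnects.
 * Returns 0 on a clean return from NBD_DO_IT, -1 on any error. */
int run_ioctl_client(const char *dev_path, int sock_fd) {
    int dev = open(dev_path, O_RDWR);
    if (dev < 0) {
        perror("open");
        return -1;
    }

    /* Tell the driver which socket carries the NBD traffic. */
    if (ioctl(dev, NBD_SET_SOCK, sock_fd) < 0) {
        perror("ioctl(NBD_SET_SOCK)");
        close(dev);
        return -1;
    }

    /* This call stays in kernel space for the lifetime of the
     * connection. The calling process is what a broadcast signal
     * ends up hitting. */
    int rc = ioctl(dev, NBD_DO_IT);
    if (rc < 0)
        perror("ioctl(NBD_DO_IT)");

    /* Detach the socket from the device before closing it. */
    ioctl(dev, NBD_CLEAR_SOCK);
    close(dev);
    return rc < 0 ? -1 : 0;
}
```

Because the process sits inside NBD_DO_IT the whole time, the connection's lifetime is tied to that one blocked system call, which is the design the netlink interface moves away from.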

alkisg commented 6 years ago

Thank you Wouter! The workaround I ended up using is to overwrite /lib/systemd/systemd-shutdown with this simple shell script:

#!/bin/sh
# Work around https://github.com/NetworkBlockDevice/nbd/issues/51
case "$1" in
    reboot) reboot -f ;;
    poweroff) poweroff -f ;;
    halt) halt -f ;;
esac

avollmerhaus commented 6 years ago

Thanks for the workaround @alkisg, works fine here. Can't wait for the fix to be released.

josefbacik commented 6 years ago

This is hard from a kernel perspective: we have always responded to the signals and thus must continue to do so. The best way to deal with this is, as Wouter says, to use the new netlink interface; that way there's no disconnect unless you specifically tell it to disconnect.

yoe commented 6 years ago

So, with 3.17 (and the fixes in 3.18), nbd-client now defaults to using the netlink interface to configure this, and then it shouldn't be a problem anymore.

Since fixing the ioctl() interface is complicated at best, there isn't much more we can do, so I'm going to close the bug report now.

davidlt commented 2 years ago

I am having this same issue. I am using OE/Yocto with BusyBox rootfs (no systemd there). I launch nbd-client with -m to set the systemd mark. I checked that and it's all fine, i.e. the process name starts with @. During shutdown, systemd sends SIGSTOP to all processes and that terminates the NBD connection, which results in a few I/O errors.

[..]
[ 2202.642650] systemd-shutdown[1]: Syncing filesystems and block devices.
[ 2202.652773] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[ 2202.659477] block nbd0: shutting down sockets
[ 2202.677792] systemd-journald[323]: Received SIGTERM from PID 1 (systemd-shutdow).
[ 2202.701255] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[ 2202.721018] systemd-shutdown[1]: Unmounting file systems.
[ 2202.727787] [1516]: Remounting '/' read-only in with options '(null)'.
[ 2202.756039] blk_update_request: I/O error, dev nbd0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[ 2202.765567] blk_update_request: I/O error, dev nbd0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[ 2202.775991] blk_update_request: I/O error, dev nbd0, sector 4456448 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0
[ 2202.786510] Buffer I/O error on dev nbd0, logical block 557056, lost sync page write
[ 2202.794257] JBD2: Error -5 detected when updating journal superblock for nbd0-8.
[ 2202.801643] Aborting journal on device nbd0-8.
[ 2202.806188] blk_update_request: I/O error, dev nbd0, sector 4456448 op 0x1:(WRITE) flags 0x20800 phys_seg 1 prio class 0
[ 2202.816928] Buffer I/O error on dev nbd0, logical block 557056, lost sync page write
[ 2202.824725] JBD2: Error -5 detected when updating journal superblock for nbd0-8.
[ 2202.832039] EXT4-fs (nbd0): re-mounted. Opts: (null). Quota mode: none.
[ 2202.850466] systemd-shutdown[1]: All filesystems unmounted.
[ 2202.855337] systemd-shutdown[1]: Deactivating swaps.
[ 2202.860402] systemd-shutdown[1]: All swaps deactivated.
[ 2202.865486] systemd-shutdown[1]: Detaching loop devices.
[ 2202.874626] systemd-shutdown[1]: All loop devices detached.
[ 2202.879481] systemd-shutdown[1]: Stopping MD devices.
[ 2202.884724] systemd-shutdown[1]: All MD devices stopped.
[ 2202.889805] systemd-shutdown[1]: Detaching DM devices.
[ 2202.895230] systemd-shutdown[1]: All DM devices detached.
[ 2202.900303] systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
[ 2202.911221] blk_update_request: I/O error, dev nbd0, sector 1868984 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
[ 2202.921317] EXT4-fs error (device nbd0): __ext4_find_entry:1611: inode #4466: comm (sd-executor): reading directory lblock 0
[ 2202.932519] blk_update_request: I/O error, dev nbd0, sector 0 op 0x1:(WRITE) flags 0x23800 phys_seg 1 prio class 0
[ 2202.942752] Buffer I/O error on dev nbd0, logical block 0, lost sync page write
[ 2202.943349] systemd-shutdown[1]: Syncing filesystems and block devices.
[ 2202.950052] EXT4-fs (nbd0): I/O error while writing superblock
[ 2202.956680] systemd-shutdown[1]: Powering off.
[ 2203.272856] reboot: Power down

The rootfs does have systemd. NBD is at version 2.23 (latest).

yoe commented 2 years ago

That means you're not using the netlink interface, but the ioctl one. You shouldn't do that, but it will happen if the netlink library was not installed at compile time.

Please double-check that your build is compiled against it. If not, try that first.

davidlt commented 2 years ago

Ah, looking at the Fedora package recipe, that's probably libnl3-devel. That's definitely not part of the OpenEmbedded nbd recipe. Is there a list of all nbd dependencies somewhere, or is looking at configure.ac the best option?

rwmjones commented 2 years ago

https://src.fedoraproject.org/rpms/nbd/blob/rawhide/f/nbd.spec#_11

yoe commented 2 years ago

Probably the OpenEmbedded recipe hasn't been updated in a while; the libnl dependency is somewhat recent.

There isn't much else new beyond that.