archiecobbs / s3backer

FUSE/NBD single file backing store via Amazon S3

Timeout on systemctl stop s3backer-nbd.service #220

Closed Lord-Dimwit-Flathead-the-Excessive closed 9 months ago

Lord-Dimwit-Flathead-the-Excessive commented 9 months ago

This one, I am not at all sure what is going on ... I have written a "wrapper" to manually start and stop the s3 device and it works fine. But if I do systemctl stop s3backer-nbd.service or reboot, I get a 90 second timeout and the mount object is left on the drive.

#! /usr/bin/bash
# /usr/local/bin/s3b-wrapper.sh
case "$1" in
   "--start")
      echo "Starting s3backer..."
      # "Run" directory (for the unix socket) is not created
      if [ ! -d /run/s3backer-nbd ]
      then
         mkdir /run/s3backer-nbd;
         chmod 700 /run/s3backer-nbd;
      fi
      /usr/bin/s3backer --nbd --configFile=/etc/s3backer.conf wasabi.bucket.test2 /dev/nbd0
      ;;
   "--stop")
      echo "Stopping s3backer..."
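      # Mount point of the nbd0 entry in /etc/fstab (second field)
      vol=$(awk '/nbd0/ {print $2}' /etc/fstab)
      # Unmount only if a mount point was found and it is currently mounted
      if [ -n "$vol" ] && mount | grep -q -- "$vol" ; then sync ; umount "$vol" ; fi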
      for i in $(ps -aux | awk '$11~/s3backer/{print $2}')
      do
         kill $i
      done
      [ -d /run/s3backer-nbd ] && rm -rf /run/s3backer-nbd
      ;;
   "--force")
      [ -d /run/s3backer-nbd ] || mkdir /run/s3backer-nbd
      /usr/bin/s3backer --nbd --force --configFile=/etc/s3backer.conf wasabi.bucket.test2 /dev/nbd0
      btrfs check /dev/nbd0 && mount /dev/nbd0
      ;;
   "--erase")
      /usr/bin/s3backer --erase --configFile=/etc/s3backer.conf wasabi.bucket.test2
      ;;
   *)
      echo "S3B: Invalid function!"
      ;;
esac

# /etc/s3backer.conf
--baseURL=https://s3.us-west-1.wasabisys.com/
--accessFile=/etc/s3backer.creds
--size=10T
--blockSize=1M
--listBlocks
--ssl
--encrypt
--passwordFile=/etc/s3backer.pswd
--listBlocksThreads=50
--timeout=90

# /usr/lib/systemd/system/s3backer-nbd.service
# systemd service file for running s3backer in NBD mode

[Unit]
Description=s3backer running in NBD mode
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/archiecobbs/s3backer
Requires=systemd-modules-load.service

[Install]
WantedBy=multi-user.target

[Service]
Type=forking
ExecStart=-/usr/local/bin/s3b-wrapper.sh --start
ExecStop=-/usr/local/bin/s3b-wrapper.sh --stop

# Security hardening
#ProtectSystem=full
#ProtectHome=read-only
#ProtectHostname=true
#ProtectClock=true
#ProtectKernelTunables=true
##ProtectKernelModules=true
#ProtectKernelLogs=true
#ProtectControlGroups=true
#RestrictRealtime=true

Here's journalctl:

Feb 14 11:57:48 nfs systemd[1]: Stopped target remote-fs.target - Remote File Systems.
Feb 14 11:57:48 nfs kernel: BTRFS info (device nbd0): last unmount of filesystem a401e8ad-a92e-4db8-a538-2b6c650da26a
Feb 14 11:57:48 nfs systemd[1]: Unmounting mnt-s3b.mount - /mnt/s3b...
Feb 14 11:57:49 nfs systemd[1]: mnt-s3b.mount: Deactivated successfully.
Feb 14 11:57:49 nfs systemd[1]: Unmounted mnt-s3b.mount - /mnt/s3b.
Feb 14 11:57:49 nfs systemd[1]: Stopping s3backer-nbd.service - s3backer running in NBD mode...
Feb 14 11:57:49 nfs s3b-wrapper.sh[2183]: Stopping s3backer...
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: State 'stop-sigterm' timed out. Killing.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 598 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 599 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 600 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 601 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 602 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 603 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 604 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 605 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 606 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 607 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 608 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 609 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 610 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 611 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 612 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 613 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 614 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 615 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 616 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 617 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 618 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 726 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 764 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 780 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 781 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 782 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 785 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 786 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 787 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 788 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 789 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 790 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 791 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 792 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 793 (nbdkit) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 794 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 795 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Killing process 796 (n/a) with signal SIGKILL.
Feb 14 11:59:19 nfs kernel: block nbd0: Receive control failed (result -32)
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Failed with result 'timeout'.
Feb 14 11:59:19 nfs systemd[1]: Stopped s3backer-nbd.service - s3backer running in NBD mode.
Feb 14 11:59:19 nfs systemd[1]: s3backer-nbd.service: Consumed 4.739s CPU time.
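
A log like the one above is easier to read once the SIGKILL lines are grouped by process name. The following is only a sketch: the function name summarize_sigkills is made up for illustration, and it assumes the journal lines keep the "Killing process PID (NAME) with signal SIGKILL." wording shown above.

```shell
# Hypothetical helper: reads journal lines on stdin and prints
# "<count> <name>" for each process name that systemd had to SIGKILL.
summarize_sigkills() {
  sed -n 's/.*Killing process [0-9][0-9]* (\([^)]*\)) with signal SIGKILL\..*/\1/p' \
    | sort | uniq -c | awk '{print $1, $2}'
}
```

It could be fed with something like `journalctl -u s3backer-nbd.service | summarize_sigkills`. In the log above every killed process is reported as nbdkit or n/a, which is the kind of detail this summary makes visible at a glance.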

archiecobbs commented 9 months ago

OK first dumb question, what's wrong with the recipe given in the Network Block Device (NBD) Mode wiki page? Why is the wrapper script needed?

Even with systemd in control, you should still be able to manually stop and start using systemctl...

Lord-Dimwit-Flathead-the-Excessive commented 9 months ago

Merely a convenience until I get the service working. Remember, I don't have code that will create the /run/s3backer-nbd directory for the nbdkit. So without the wrapper on boot, s3backer fails with:

Feb 18 19:04:40 nfs s3backer[624]: /run/s3backer-nbd/0000000000000005_0000000000000236: No such file or directory

With it, startups are working fine.
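
For the missing /run/s3backer-nbd directory specifically, one possible alternative to a wrapper is to let systemd create it through the RuntimeDirectory= setting. This is a sketch, not the project's documented recipe: the function name write_rundir_dropin, the drop-in file name rundir.conf, and the default drop-in path are assumptions chosen for illustration.

```shell
# Hypothetical helper: writes a drop-in asking systemd to manage
# /run/s3backer-nbd for the service.
# $1: drop-in directory (the default below is an assumption; adjust as needed)
write_rundir_dropin() {
  dropin_dir="${1:-/etc/systemd/system/s3backer-nbd.service.d}"
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/rundir.conf" <<'EOF'
[Service]
# systemd creates /run/s3backer-nbd before ExecStart and removes it on stop
RuntimeDirectory=s3backer-nbd
RuntimeDirectoryMode=0700
EOF
}
```

After running it as root, `systemctl daemon-reload` is needed for the drop-in to take effect. The 0700 mode mirrors the chmod 700 used in the wrapper.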

It is only when I try to stop the service using systemctl (or shutdown/reboot) that I get the timeout and the s3 filesystem is left in a "mounted" state. This behavior is consistent whether I use

ExecStart=-/usr/local/bin/s3b-wrapper.sh --start
ExecStop=-/usr/local/bin/s3b-wrapper.sh --stop

or

ExecStart=/usr/bin/s3backer --nbd --force --configFile=/etc/s3backer.conf wasabi.bucket.test2 /dev/nbd0

in the service file.

But the filesystem cleanly shuts down if I umount /mnt/s3b;killall s3backer. It just doesn't seem to want to work from within systemd.

Tuesday I plan on rebuilding the target (Debian 12). Might be informative to try it on Ubuntu as well. This really feels like it has more to do with the OS than the code...

archiecobbs commented 9 months ago

OK thanks for clarifying. systemd is supposed to be smart enough to know that if you ask it to stop the service, it must unmount the filesystem first. This is what the x-systemd.requires=s3backer-nbd.service bit in /etc/fstab is for.
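
A quick way to confirm that the option is actually present on the right fstab entry is sketched below. The function name fstab_requires is made up for illustration, and it assumes the device appears literally in the first fstab field (for example /dev/nbd0 rather than a UUID= or LABEL= reference).

```shell
# Hypothetical helper: succeeds if the fstab entry for device $1 lists
# x-systemd.requires=$2 among its mount options (fourth field).
# $3: fstab path, defaults to /etc/fstab
fstab_requires() {
  awk -v dev="$1" -v opt="x-systemd.requires=$2" '
    $1 == dev {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
    }
    END { exit !found }
  ' "${3:-/etc/fstab}"
}
```

For example, `fstab_requires /dev/nbd0 s3backer-nbd.service && echo present` prints "present" only when the dependency is declared on that entry.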

Lord-Dimwit-Flathead-the-Excessive commented 9 months ago

Yes. And I think that is actually working. At least journalctl indicates the volume is being dismounted well prior to the timeout.

It appears to me that killing s3backer behaves differently from within systemd than it does from the user space. But I want to "double check" this behavior on virgin builds of Debian and Ubuntu before you invest more time in it. It is entirely possible my current environment is corrupted. I have "played" a lot with it in my troubleshooting efforts to this point. I'll update Wednesday.

Lord-Dimwit-Flathead-the-Excessive commented 9 months ago

OK, on fresh rebuilds of Debian 12 (6.1.0-18-amd64) and Ubuntu 22.04.4 (5.15.0-94-generic) I am not getting any timeouts - everything is working great. Please disregard/close. Thanks, Archie.

archiecobbs commented 9 months ago

OK great, thanks for the update.