archiecobbs / s3backer

FUSE/NBD single file backing store via Amazon S3

Handling erroneous disconnect before upper ZFS pool destruction/export? #227

Closed stevekerrison closed 2 months ago

stevekerrison commented 2 months ago

I'm testing an s3backer --nbd implementation of a ZFS pool. During my experiments I killed s3backer, invalidating the NBD device before I had destroyed the zpool sitting on top of it. The end result is blocked I/O to that zpool and an inability to recover the system without a reboot. Note that this is a single-disk pool, with no special device yet.
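For context, the setup is roughly the following (the config file and bucket are placeholders standing in for my redacted values; the invocation style matches what appears in the logs later in this thread):

# s3backer exposes the bucket as /dev/nbd0; ZFS then uses that device as the pool's only vdev
s3backer --nbd --configFile=<config> <bucket> /dev/nbd0
zpool create testpool /dev/nbd0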

This seems mostly down to ZFS, as evidenced by this pull request and many associated issues, but isn't trivial to fix. It may also be mitigated by setting failmode=continue on the pool, which I'll test and report back on later.
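For reference, setting and verifying that property is just:

zpool set failmode=continue testpool
zpool get failmode testpool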

However, I did wonder if there's anything in the interaction with NBD that can help. I cannot re-connect s3backer because /dev/nbd0 was not released properly. I'm not sure if there's any way to fix that after the fact, hence my asking here.

I also see that s3backer doesn't detect this error condition, instead simply seeing it as the underlying client exiting.

Here's the dmesg output of the initial "failure" (user error):

# Killed s3backer
[323882.298043] block nbd0: NBD_DISCONNECT
[323882.298070] block nbd0: Disconnected due to user request.
[323882.298074] block nbd0: shutting down sockets
# Tried to start new s3backer
[323909.503608] nbd: nbd0 already in use
# s3backer detects client exit, so tears itself down
[323909.510249] block nbd0: NBD_DISCONNECT
[323909.510262] block nbd0: Send disconnect failed -32
# My attempts to export/destroy the zpool triggered I/O errors
[323922.391985] I/O error, dev nbd0, sector 58728656 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 2
[323922.391999] zio pool=testpool vdev=/dev/nbd0 error=5 type=2 offset=30069071872 size=4096 flags=1589376
[323922.392004] zio pool=testpool vdev=/dev/nbd0 error=5 type=2 offset=30069075968 size=4096 flags=1589376
[323922.392052] I/O error, dev nbd0, sector 528 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[323922.392059] zio pool=testpool vdev=/dev/nbd0 error=5 type=1 offset=270336 size=8192 flags=721089
[323922.392066] I/O error, dev nbd0, sector 1048574992 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[323922.392070] zio pool=testpool vdev=/dev/nbd0 error=5 type=1 offset=536870395904 size=8192 flags=721089
[323922.392073] I/O error, dev nbd0, sector 1048575504 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[323922.392076] zio pool=testpool vdev=/dev/nbd0 error=5 type=1 offset=536870658048 size=8192 flags=721089
[323922.392078] I/O error, dev nbd0, sector 67117264 op 0x1:(WRITE) flags 0x0 phys_seg 2 prio class 2
[323922.392082] zio pool=testpool vdev=/dev/nbd0 error=5 type=2 offset=34364039168 size=8192 flags=1074267264
[323922.392086] I/O error, dev nbd0, sector 92282920 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
[323922.392090] zio pool=testpool vdev=/dev/nbd0 error=5 type=2 offset=47248855040 size=4096 flags=1589376
[323922.392108] I/O error, dev nbd0, sector 8396928 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
[323922.392112] zio pool=testpool vdev=/dev/nbd0 error=5 type=2 offset=4299227136 size=4096 flags=1589376
[323922.392116] I/O error, dev nbd0, sector 41951440 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
[323922.392119] zio pool=testpool vdev=/dev/nbd0 error=5 type=2 offset=21479137280 size=4096 flags=1589376
[323922.392136] I/O error, dev nbd0, sector 50340048 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 2
[323922.392140] zio pool=testpool vdev=/dev/nbd0 error=5 type=2 offset=25774104576 size=4096 flags=1589376
[323922.392143] I/O error, dev nbd0, sector 58728672 op 0x1:(WRITE) flags 0x0 phys_seg 3 prio class 2
[323922.392147] zio pool=testpool vdev=/dev/nbd0 error=5 type=2 offset=30069080064 size=12288 flags=1074267264
[323922.392174] zio pool=testpool vdev=/dev/nbd0 error=5 type=2 offset=34364047360 size=12288 flags=1074267264
[323922.392178] zio pool=testpool vdev=/dev/nbd0 error=5 type=2 offset=51543887872 size=36864 flags=1074267264
[323922.392189] zio pool=testpool vdev=/dev/nbd0 error=5 type=2 offset=12889235456 size=4096 flags=1589376
[323922.392193] zio pool=testpool vdev=/dev/nbd0 error=5 type=2 offset=17184202752 size=4096 flags=1589376
[323922.392863] WARNING: Pool 'testpool' has encountered an uncorrectable I/O failure and has been suspended.
# Game over for testpool without a reboot, probably

All zpool commands that need to interact with the pool end up in a hung-task state and are reported by the kernel as such after some time.

Manual attempts to disconnect the NBD with nbd-client -d /dev/nbd0 give:

block nbd0: NBD_DISCONNECT
block nbd0: Send disconnect failed -32

Because this zpool is single-disk, the failure is catastrophic. Recovery would require the drive to magically come back online, but since I can no longer manipulate /dev/nbd0, that isn't possible, and a reboot appears to be the only solution.
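For completeness, my understanding is that if the device did come back, the usual way to resume a suspended pool would be:

# resume I/O on the suspended pool once the underlying device is back
zpool clear testpool

But that's moot while /dev/nbd0 can't be manipulated.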

Are there any safeguards that can be built in at the s3backer level? I would imagine:

Mirror or RAID-z devices may behave in a less catastrophic way as I/O can continue on functioning block devices, although it would be interesting to see if /dev/nbdX nodes still get tied up. However, it doesn't make a lot of sense to use these modes over s3backer, as the object store is already handling durability/availability/etc and there's zero performance benefit.

Happy to do more controlled tests if needed. I haven't been in a position to reboot the affected test system yet.

stevekerrison commented 2 months ago

I've realised that syslog is more useful than dmesg, and I had a bit of time at lunch for a couple of tests. From my journalctl, here's a "normal" dismount:

Aug 20 22:04:47 <REDACTED> systemd[1]: testpool.mount: Deactivated successfully.
Aug 20 22:04:50 <REDACTED> zed[1213339]: eid=1414 class=pool_destroy pool='testpool' pool_state=DESTROYED
Aug 20 22:04:51 <REDACTED> zed[1213350]: eid=1415 class=config_sync pool='testpool' pool_state=UNINITIALIZED
Aug 20 22:04:54 <REDACTED> nbdkit[1197411]: s3backer: unmount <REDACTED>: initiated
Aug 20 22:04:54 <REDACTED> nbdkit[1197411]: s3backer: unmount <REDACTED>: shutting down filesystem
Aug 20 22:04:54 <REDACTED> kernel: block nbd0: NBD_DISCONNECT
Aug 20 22:04:54 <REDACTED> kernel: block nbd0: Disconnected due to user request.
Aug 20 22:04:54 <REDACTED> kernel: block nbd0: shutting down sockets
Aug 20 22:04:54 <REDACTED> nbdkit[1197411]: s3backer: unmount <REDACTED>: clearing mount token
Aug 20 22:04:54 <REDACTED> nbdkit[1197411]: s3backer: unmount <REDACTED>: completed

Here's another dismount of an unused volume but with --debug enabled:

Aug 21 12:39:03 <REDACTED> s3backer[1720550]: s3backer: got signal 15
Aug 21 12:39:03 <REDACTED> s3backer[1720550]: s3backer: rec'd signal during sleep
Aug 21 12:39:03 <REDACTED> s3backer[1720550]: s3backer: shutting down s3backer NDB server
Aug 21 12:39:03 <REDACTED> s3backer[1720550]: s3backer: executing /usr//sbin/nbd-client with these parameters:
Aug 21 12:39:03 <REDACTED> s3backer[1720550]: s3backer:   [00] "/usr//sbin/nbd-client"
Aug 21 12:39:03 <REDACTED> s3backer[1720550]: s3backer:   [01] "-d"
Aug 21 12:39:03 <REDACTED> s3backer[1720550]: s3backer:   [02] "/dev/nbd1"
Aug 21 12:39:03 <REDACTED> s3backer[1720550]: s3backer: started /usr//sbin/nbd-client as process 1720619
Aug 21 12:39:03 <REDACTED> nbdkit[1720529]: s3backer: unmount <REDACTED>: initiated
Aug 21 12:39:03 <REDACTED> nbdkit[1720529]: s3backer: unmount <REDACTED>: shutting down filesystem
Aug 21 12:39:03 <REDACTED> s3backer[1720550]: s3backer: child process /usr//sbin/nbd-client (1720619) terminated normally
Aug 21 12:39:03 <REDACTED> nbdkit[1720529]: s3backer: unmount <REDACTED>: clearing mount token
Aug 21 12:39:03 <REDACTED> nbdkit[1720529]: s3backer: DELETE https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev2s3backer-mounted
Aug 21 12:39:03 <REDACTED> kernel: block nbd1: NBD_DISCONNECT
Aug 21 12:39:03 <REDACTED> kernel: block nbd1: Disconnected due to user request.
Aug 21 12:39:03 <REDACTED> kernel: block nbd1: shutting down sockets
Aug 21 12:39:03 <REDACTED> nbdkit[1720529]: s3backer: success: DELETE https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev2s3backer-mounted
Aug 21 12:39:03 <REDACTED> nbdkit[1720529]: s3backer: unmount <REDACTED>: completed

And here's one when the NBD is stuck and s3backer starts then stops itself:

Aug 21 12:39:53 <REDACTED> nbdkit[1721112]: s3backer: mounting <REDACTED>
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer: successfully daemonized as process 1721133
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer: executing /usr//sbin/nbd-client with these parameters:
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [00] "/usr//sbin/nbd-client"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [01] "-unix"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [02] "/run/s3backer-nbd/0000000000000005_00000000000003e8"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [03] "-block-size"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [04] "4096"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [05] "-nofork"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [06] "/dev/nbd0"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer: started /usr//sbin/nbd-client as process 1721134
Aug 21 12:39:53 <REDACTED> nbd_client[1721134]: Failed to setup device, check dmesg
Aug 21 12:39:53 <REDACTED> nbd_client[1721134]: Exiting.
Aug 21 12:39:53 <REDACTED> kernel: nbd: nbd0 already in use
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer: child process /usr//sbin/nbd-client (1721134) terminated with exit value 1
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer: shutting down s3backer NDB server
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer: executing /usr//sbin/nbd-client with these parameters:
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [00] "/usr//sbin/nbd-client"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [01] "-d"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [02] "/dev/nbd0"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer: started /usr//sbin/nbd-client as process 1721146
Aug 21 12:39:53 <REDACTED> nbdkit[1721112]: s3backer: unmount <REDACTED>: initiated
Aug 21 12:39:53 <REDACTED> nbdkit[1721112]: s3backer: unmount <REDACTED>: shutting down filesystem
Aug 21 12:39:53 <REDACTED> nbdkit[1721112]: s3backer: unmount <REDACTED>: clearing mount token
Aug 21 12:39:53 <REDACTED> nbdkit[1721112]: s3backer: DELETE https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev2s3backer-mounted
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer: child process /usr//sbin/nbd-client (1721146) terminated normally
Aug 21 12:39:53 <REDACTED> kernel: block nbd0: NBD_DISCONNECT
Aug 21 12:39:53 <REDACTED> kernel: block nbd0: Send disconnect failed -32
Aug 21 12:39:53 <REDACTED> nbdkit[1721112]: s3backer: success: DELETE https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev2s3backer-mounted
Aug 21 12:39:53 <REDACTED> nbdkit[1721112]: s3backer: unmount <REDACTED>: completed

Note that the mount token is deleted even though the NBD disconnect failed.

And here's a new NBD device, nbd1: I create it, then kill the underlying nbdkit process at 12:44:21, then s3backer at 12:45:23. There was no usage of the device, though, so there was no I/O to fail.

Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: mounting <REDACTED>
Aug 21 12:43:55 <REDACTED> s3backer[1723517]: s3backer: successfully daemonized as process 1723517
Aug 21 12:43:55 <REDACTED> s3backer[1723517]: s3backer: executing /usr//sbin/nbd-client with these parameters:
Aug 21 12:43:55 <REDACTED> s3backer[1723517]: s3backer:   [00] "/usr//sbin/nbd-client"
Aug 21 12:43:55 <REDACTED> s3backer[1723517]: s3backer:   [01] "-unix"
Aug 21 12:43:55 <REDACTED> s3backer[1723517]: s3backer:   [02] "/run/s3backer-nbd/0000000000000005_00000000000003e9"
Aug 21 12:43:55 <REDACTED> s3backer[1723517]: s3backer:   [03] "-block-size"
Aug 21 12:43:55 <REDACTED> s3backer[1723517]: s3backer:   [04] "4096"
Aug 21 12:43:55 <REDACTED> s3backer[1723517]: s3backer:   [05] "-nofork"
Aug 21 12:43:55 <REDACTED> s3backer[1723517]: s3backer:   [06] "/dev/nbd1"
Aug 21 12:43:55 <REDACTED> s3backer[1723517]: s3backer: started /usr//sbin/nbd-client as process 1723518
Aug 21 12:43:55 <REDACTED> s3backer[1723517]: s3backer: child process /usr//sbin/nbd-client (1723518) terminated normally
Aug 21 12:43:55 <REDACTED> kernel: nbd1: detected capacity change from 0 to 1048576000
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev200000000
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: rec'd 404 response: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev200000000
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev2001f3fff
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: rec'd 404 response: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev2001f3fff
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev2001f3ffe
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: rec'd 404 response: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev2001f3ffe
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev2001f3ff9
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: rec'd 404 response: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev2001f3ff9
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev200000001
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: rec'd 404 response: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev200000001
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev200000002
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: rec'd 404 response: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev200000002
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev200000004
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: rec'd 404 response: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev200000004
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev200000008
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: rec'd 404 response: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev200000008
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev200000010
Aug 21 12:43:55 <REDACTED> nbdkit[1723496]: s3backer: rec'd 404 response: GET https://s3.ap-southeast-1.wasabisys.com/<REDACTED>/vdev200000010
Aug 21 12:44:21 <REDACTED> kernel: block nbd1: Receive control failed (result -32)
Aug 21 12:45:23 <REDACTED> s3backer[1723517]: s3backer: got signal 15
Aug 21 12:45:23 <REDACTED> s3backer[1723517]: s3backer: rec'd signal during sleep
Aug 21 12:45:23 <REDACTED> s3backer[1723517]: s3backer: shutting down s3backer NDB server
Aug 21 12:45:23 <REDACTED> s3backer[1723517]: s3backer: executing /usr//sbin/nbd-client with these parameters:
Aug 21 12:45:23 <REDACTED> s3backer[1723517]: s3backer:   [00] "/usr//sbin/nbd-client"
Aug 21 12:45:23 <REDACTED> s3backer[1723517]: s3backer:   [01] "-d"
Aug 21 12:45:23 <REDACTED> s3backer[1723517]: s3backer:   [02] "/dev/nbd1"
Aug 21 12:45:23 <REDACTED> s3backer[1723517]: s3backer: started /usr//sbin/nbd-client as process 1724389
Aug 21 12:45:23 <REDACTED> s3backer[1723517]: s3backer: child process /usr//sbin/nbd-client (1724389) terminated normally
Aug 21 12:45:23 <REDACTED> kernel: block nbd1: NBD_DISCONNECT
Aug 21 12:45:23 <REDACTED> kernel: block nbd1: Send disconnect failed -32
Aug 21 12:45:23 <REDACTED> kernel: block nbd1: shutting down sockets

In this scenario, the mount token is not deleted, even though the NBD disconnect failed, and so --force must be used on the subsequent mount.
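The workaround is to force the next mount, something like the following (same invocation style as above, with the placeholder config/bucket standing in for my redacted values; --force tells s3backer to ignore the stale mount token):

s3backer --nbd --force --configFile=<config> <bucket> /dev/nbd1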

archiecobbs commented 2 months ago

With s3backer using NBD, there are several independent processes, originating from independent projects, running cooperatively. If you randomly kill one of them, then of course it's not necessarily possible for the others to automatically compensate.

The thing where a process gets stuck unkillable in disk I/O is as old as BSD Unix and is not something s3backer has any control over.

I realize you are just experimenting, which is fine, but I'm unclear on whether there is a specific question for s3backer.

Right now it's pretty much doing what it can, which is basically throw its hands up... if the NBD processes don't function as they are supposed to, there's not a lot that s3backer can do because there's no way to infer what the problem is, much less do anything to fix it.

stevekerrison commented 2 months ago

Hi @archiecobbs,

The thing where a process gets stuck unkillable in disk I/O is as old as BSD Unix and is not something s3backer has any control over.

Yeah, that's fair enough, and I've encountered it in many places before.

Just to complete what I said I'd test in my original post: I tested a pool with failmode=continue set today, and because on a single-disk pool there are no other devices that can complete a read, that also gets stuck in the same way, with processes hanging. At least that's how it looks to me.

Back to your point, though:

I'm unclear on whether there is a specific question for s3backer.

I just looked into whether s3backer could do anything about its response to being killed while an upper layer was still using the NBD device. In some cases, /sys/block/$DEV/holders/ might inform you, but for ZFS it seems this is not the case - /sys/block/nbd0/holders/ is empty for me even when I have a zpool running on it. And if none of the ZFS volumes are mounted, there'll be nothing in mounts to tell you either, even though the zpool is still using the vdevs.
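For example, even with the pool imported on nbd0, the check comes up empty:

# nothing listed here despite ZFS actively using the device
ls /sys/block/nbd0/holders/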

Implementing a check for every possible use of the NBD wouldn't make sense as one cannot anticipate all of them. So yeah, not s3backer's problem, so to speak.

The only remaining issue I think worth looking at is this, from my earlier comment:

Aug 21 12:39:53 <REDACTED> nbdkit[1721112]: s3backer: mounting <REDACTED>
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer: successfully daemonized as process 1721133
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer: executing /usr//sbin/nbd-client with these parameters:
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [00] "/usr//sbin/nbd-client"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [01] "-unix"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [02] "/run/s3backer-nbd/0000000000000005_00000000000003e8"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [03] "-block-size"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [04] "4096"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [05] "-nofork"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer:   [06] "/dev/nbd0"
Aug 21 12:39:53 <REDACTED> s3backer[1721133]: s3backer: started /usr//sbin/nbd-client as process 1721134
Aug 21 12:39:53 <REDACTED> nbd_client[1721134]: Failed to setup device, check dmesg
Aug 21 12:39:53 <REDACTED> nbd_client[1721134]: Exiting.

With this, things are already broken and an NBD device is stuck, which is not s3backer's problem. However, s3backer fails to identify the failure of nbd-client. Would it be possible to check, during startup, that the device was at least set up successfully, and if not, emit an error? Right now it's handled as a tear-down due to the underlying client exiting, which makes sense in the general case, but in the context of initial setup there might be something a bit more helpful to output. I suspect something might be possible around https://github.com/archiecobbs/s3backer/blob/1de6473e0305fb2321261ab706022c962d438807/main.c#L403, because although util.c logs the non-zero exit code, main.c doesn't interpret it, simply bailing out. Indeed, nbd-client is expected to exit immediately, but isn't it expected to exit with status 0? I could prepare a PR if that is something you're interested in.
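To sketch the logic I mean (in shell just to illustrate; the real change would live in main.c around the line linked above, and the socket path here is simply the one from my logs):

# treat a non-zero exit from nbd-client during setup as a fatal setup error,
# rather than as a normal "underlying client exited" tear-down
nbd-client -unix /run/s3backer-nbd/0000000000000005_00000000000003e8 \
    -block-size 4096 -nofork /dev/nbd0
if [ $? -ne 0 ]; then
    echo "s3backer: nbd-client failed to set up /dev/nbd0" >&2
    exit 1
fi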

Oh and the other thing might be documentation - it might be useful, wherever killing s3backer is mentioned, to warn about the inconsistent state any upper system using the block device may end up in. Then at least it's easy to point to the docs if people ask about this kind of thing.

stevekerrison commented 2 months ago

I had a poke around myself and came up with https://github.com/archiecobbs/s3backer/compare/master...stevekerrison:s3backer:nbd-client-return-check

I've not put a PR in for it yet as I'm not sure what your view is on the need for it. For example, a warning is already logged if any child terminates unusually: https://github.com/archiecobbs/s3backer/blob/master/util.c#L687. But it's perhaps not obvious that it's a fatal condition, or that it occurred during setup rather than later due to some other event.

I'm also not satisfied with my changes because the errors only go to the logs when not running in foreground mode, and I think it would make sense to report a client error in the foreground before forking, although that would require a few more changes to achieve.

Secondly, I noticed an interesting condition while testing my changes. Rather than locking up a zpool, I simply tried to re-use an NBD node, with the following results:

Happy to hear your views on this.

archiecobbs commented 2 months ago

I agree the nbd-client(8) cleanup step should be skipped if the process failed to start properly.

See if https://github.com/archiecobbs/s3backer/compare/master...issue227 addresses this sufficiently for you.

stevekerrison commented 2 months ago

Yup that does it. Thanks!

Something to consider, which I can rebase later once your above fix is merged, is deferring daemonizing until after nbd-client has completed. However, with the change I've made, I don't think any child processes are being watched any more, so wait_for_child_to_exit just sleeps waiting for a signal, which makes me think the loop is a bit redundant.

https://github.com/archiecobbs/s3backer/compare/issue227...stevekerrison:s3backer:issue227-daemonize-after-nbd-client-exit

At present this means nbd-client's output adds noise to the terminal, but s3backer can fail at the terminal before forking.

$ s3backer --nbd --configFile=A ... /dev/nbd1
nbdkit: auto-detecting block size and total file size...
nbdkit: auto-detected block size=256k and total size=500g
2024-08-23 11:16:02 INFO: established new mount token 0x3553825a
s3backer: connecting ... to /dev/nbd1
Warning: the oldstyle protocol is no longer supported.
This method now uses the newstyle protocol with a default export
Negotiation: ..size = 512000MB
Connected /dev/nbd1
s3backer: daemonizing
$ echo $?
0
$ s3backer --nbd --configFile=B ... /dev/nbd1
nbdkit: auto-detecting block size and total file size...
nbdkit: auto-detection failed; using configured block size 256k and file size 500g
2024-08-23 11:16:12 INFO: established new mount token 0x56e06136
s3backer: connecting ... to /dev/nbd1
Warning: the oldstyle protocol is no longer supported.
This method now uses the newstyle protocol with a default export
Negotiation: ..size = 512000MB
Error: Failed to setup device, check dmesg

Exiting.
s3backer: child process /usr/sbin/nbd-client (947165) terminated with exit value 1
s3backer: client failed to connect to /dev/nbd1
s3backer: shutting down s3backer NDB server
$ echo $?
1

This has gone a bit off-topic from my original issue, so to reflect back on it: I agree with you that there's not much to be done if somebody starts killing processes in the wrong order, especially when ZFS just doesn't handle failures in non-redundant pools very well. At least with your changes, it will now be more obvious through s3backer when an NBD node is unusable.

I also found that the only ways to see which devices are being used as vdevs are zpool status, zdb, or presumably some equivalent through libzfs, and none of those are generic. The sysfs holders directory isn't consistently used by other components either, nor should it be. Device mapper seems like it could help, but that's yet another abstraction and dependency to deal with.
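For anyone else looking, the ZFS-specific commands I mean are along these lines (the specific flags are from the zpool/zdb man pages as I understand them):

# show full vdev device paths for a pool
zpool status -P testpool
# or dump the cached pool configuration, including vdev paths
zdb -C testpool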

archiecobbs commented 2 months ago

Something to consider, which I can rebase later once your above fix is merged, is deferring daemonizing until after nbd-client has completed.

It seems like the only advantage in doing this would be that certain (probably rare) errors that occur at certain points in the setup process would be slightly easier to find/see - i.e., printed on the console instead of in syslog or whatever. So it's a nice idea, but since whoever hits this situation is going to have to do a bunch of debugging anyway, it doesn't seem to warrant making more changes to what is already a somewhat brittle process.

stevekerrison commented 2 months ago

Something to consider, which I can rebase later once your above fix is merged, is deferring daemonizing until after nbd-client has completed.

It seems like the only advantage in doing this would be that certain (probably rare) errors that occur at certain points in the setup process would be slightly easier to find/see - i.e., printed on the console instead of in syslog or whatever. So it's a nice idea, but since whoever hits this situation is going to have to do a bunch of debugging anyway, it doesn't seem to warrant making more changes to what is already a somewhat brittle process.

Yeah, I'd had that thought as well, and anybody using the tool should be competent enough to deal with it. I was likening it to other daemons, such as web servers, that check their config and the resources it references before forking or reloading. But then again, they won't care if you delete a TLS key after loading, as they'll still have it mapped into memory, and they'll suffer just as much if any underlying filesystem disappears.

Still, thanks for the tweak to the cleanup process!

archiecobbs commented 2 months ago

OK sounds good. Thanks for the report!