NetworkBlockDevice / nbd

Network Block Device
GNU General Public License v2.0
450 stars 116 forks

failure to reconnect with -persist #93

Open mxork opened 5 years ago

mxork commented 5 years ago

I still need to do a proper job getting logs together, but maybe someone can tell me I'm being dumb before I put too much time into this.

Currently, filesystems I have mounted from nbd devices panic on I/O failure if I restart the corresponding server. I would like nbd to renegotiate the connection if the server drops, and -persist seems to be the right tool for that. However, if I set up a test environment on the local machine and restart the server after connecting:

$ nbd-client -N default localhost /dev/nbd0 -persist -nonetlink -nofork             
Negotiation: ..size = 1024MB
bs=1024, sz=1073741824 bytes
timeout=5
<restart nbd-server here>
Kernel call returned.
sock, done

The call simply returns and does not attempt to reconnect. Judging from the log message, it does not take the branch at https://github.com/NetworkBlockDevice/nbd/blob/128fd556286ff5d53c5f2b16c4ae5746b5268a64/nbd-client.c#L1292, but instead seems to take the branch at https://github.com/NetworkBlockDevice/nbd/blob/128fd556286ff5d53c5f2b16c4ae5746b5268a64/nbd-client.c#L1329. I have no idea why the ioctl call would return >= 0, but it seems to.

I realize that the filesystem may also need some love to get the desired behavior, but that's moot if nbd does not renegotiate.

nand11 commented 4 years ago

Hi,

TL;DR: Here is my experience with the -persist option of nbd-client, can you help me?

I want to use NBD over an unreliable network connection. So I was trying to use nbd-client with the -persist option. A while ago someone on IRC (#nbd on oftc.net) told me that -persist is broken on recent kernels and that I should try -nonetlink (thanks!). So I did that (using Debian buster). I also started nbd-client with -nofork to get debug output.

Test 1:

I connected nbd, interrupted the network connection and nbd-client exited with

Kernel call returned.
sock, done

Okay, that's not what I wanted, so I had a look at the source code of nbd-client (Debian source package nbd-3.19). Near the end of main(), this section gets executed:

	if (ioctl(nbd, NBD_DO_IT) < 0) {
		[...]
	} else {
		/* We're on 2.4. It's not clearly defined what exactly
		 * happened at this point. Probably best to quit, now
		 */
		fprintf(stderr, "Kernel call returned.\n");
		cont=0;
	}

Okay, we are not on kernel 2.4; my kernel version is 4.19.

Test 2:

I found out that ioctl returned 0 in case of a network disconnect. So I changed

  if (ioctl(nbd, NBD_DO_IT) < 0) {

to

  if (ioctl(nbd, NBD_DO_IT) <= 0) {

and things got better. After a network disconnect, nbd-client tried to reconnect:

nbd,16756: Kernel call returned: Success
Reconnecting
Error: Socket failed: Connection refused
Reconnecting
Error: Socket failed: Connection refused
Reconnecting
Error: Socket failed: Connection refused
Reconnecting

A connection retry was done about once per second. Yay! I restored my network connection and it worked:

Negotiation: ..size = 2844688MB
bs=512, sz=2982871564288 bytes
timeout=120
nbd,31830: Kernel call returned: Success
Reconnecting
Error: Socket failed: Connection refused
Reconnecting
Error: Socket failed: Connection refused
Reconnecting
Negotiation: ..size = 2844688MB
bs=512, sz=2982871564288 bytes
timeout=120

Test 3:

The test above was done without a mounted filesystem. Now I mounted a filesystem (read-only, via LUKS crypto) and tried again. Unfortunately, the result was a little different: the kernel said the device was busy, and retries were attempted in quick succession. nbd-client printed:

Negotiation: ..size = 2844688MB
bs=512, sz=2982871564288 bytes
timeout=120
nbd,16756: Kernel call returned: Device or resource busy
Reconnecting
Negotiation: ..size = 2844688MB
bs=512, sz=2982871564288 bytes
timeout=120
nbd,16756: Kernel call returned: Device or resource busy
Reconnecting

The reconnect in nbd-client does not print an error, but the next iteration of

ioctl(nbd, NBD_DO_IT)

returns "Device or resource busy".

Now I am at a point where I would need to debug or understand the kernel nbd driver, which I have not attempted yet. (__nbd_ioctl calls nbd_start_device_ioctl for NBD_DO_IT, which calls nbd_start_device, and that returns -EBUSY if nbd->task_recv is set. https://github.com/torvalds/linux/blob/e595dd94515ed6bc5ba38fce0f9598db8c0ee9a9/drivers/block/nbd.c#L1232 Is this where it fails?)

best regards, nand11

axos88 commented 3 years ago

This is still an issue with version 3.18

0rtz commented 3 years ago

I see the same behaviour as nand11 with version 3.21

fff7d1bc commented 3 years ago

I can confirm that there's no attempt to reconnect on 3.21.

yoe commented 3 years ago

So.

-persist relies on an old quirk in the ioctl configuration interface that used to work at some point, but seems to have been lost over a few maintainer changes of the kernel's nbd module.

The netlink interface has nothing to support -persist: when using it, nbd-client has exited long before the connection drops, so there is no way for it to discover that this has happened.

In order for -persist to work again, I think we need to go back to the drawing board. Meanwhile, a possible workaround could be to use the multiple-connections feature: if the connection drops while you still have another connection open, you can keep working (though obviously that doesn't help if you only have one server and that one is being restarted).

For now, I think I'll just disable -persist (better to have no feature than one that does nothing), and talk to the kernel maintainer to see how we can fix this.

fff7d1bc commented 3 years ago

In that case, would you also consider enabling a timeout by default? From what I can see, even if I don't use -persist, as in

nbd-client  -nofork  -N test1 HOSTNAME /dev/nbd0

then if the server reboots, the block device and nbd-client are stuck in the D state forever, until I reboot. A timeout seems essential here.

yoe commented 3 years ago

It's actually in the D state until the TCP timeout expires, which is a per-system setting that defaults to 2 hours and 12 minutes (IIRC).

The timeout thing is another of those bad ideas that I should probably get rid of; it triggers if the device is perfectly happily connected but idle. This may be a good idea in some cases, but not in most.

chabad360 commented 3 years ago

Well, that would explain why my kiosks randomly fail (well, clearly not as randomly as I thought) if they sit idle for too long. How would I go about disabling the timeout?

yoe commented 3 years ago

If you don't explicitly pass the -t or -timeout parameter to nbd-client, it shouldn't be set. If you still see things going wrong there, please file a (separate) bug.

chabad360 commented 3 years ago

Wait, now I'm confused. The timeout is set by default, or by default it never times out?

Cause at this point, this has become quite a problem for me, but I've never been able (or had enough time and patience) to really track it down. I'm not sure if my kiosks are failing from this timeout (seems to happen only during idle) or if it's a lucky network failure.

I'm thinking of switching to iSCSI, but the CoW feature of NBD is very useful to me.

(Does the timeout in nbd-server also close on idle?)

yoe commented 3 years ago

Neither the client nor the server timeout should be set by default (which means neither should time out by default).

The TCP keepalive probes are set, and it's not possible to switch them off. As long as the remote end is still functioning properly, these shouldn't interrupt your connection, however.

bauen1 commented 2 years ago

Hi, is there any chance this is getting fixed in the near future, or is there a way to work around this issue? Otherwise I'll also have to investigate replacing nbd with iSCSI or something else.

I have set up backups on a remote server for my laptop: a WireGuard VPN is set up between the hosts, and the server runs nbd-server. When connecting with -persist, after a day or so any attempt to access the mounted filesystem results in I/O errors because nbd has dropped the connection, forcing me to unmount everything uncleanly, reconnect, and remount.

yoe commented 2 years ago

A simple workaround is to make sure the connection never remains idle for too long. Just touching a file in the mounted NBD file system every once in a while should do that.

wtarreau commented 1 year ago

That's a very sad situation. I was testing NBD as a really appealing candidate for remote backups, but ran into this non-working -persist situation, and the D state as well when no timeout is set. Yes, I think we should rework all of this in a few ways:

But this would only be used to make sure the timeout doesn't kill idle connections and only kills dead ones (killing idle connections didn't happen in my tests). The fact that the daemon cannot automatically reconnect by default with -p is a problem that clearly indicates a logic error in the code; but if it fails on EBUSY once the block device is in use, we have a much bigger problem: NBD is basically unusable for any real-world purpose, since TCP connections eventually fail. I can easily reproduce this here by trying to restart nbd-client after a network error:

19:02:11.161069 ioctl(4, NBD_SET_SOCK, 3) = 0
19:02:11.176188 rt_sigprocmask(SIG_SETMASK, ~[KILL PIPE TERM RTMIN RT_1], ~[KILL PIPE TERM STOP RTMIN RT_1], 8) = 0
19:02:11.176416 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 7755 attached
, child_tidptr=0xffff8e2290f0) = 7755
[pid  7447] 19:02:11.177218 ioctl(4, NBD_DO_IT <unfinished ...>
[pid  7755] 19:02:11.177284 set_robust_list(0xffff8e229100, 24 <unfinished ...>
[pid  7447] 19:02:11.177338 <... ioctl resumed>) = -1 EBUSY (Device or resource busy)
[pid  7755] 19:02:11.177398 <... set_robust_list resumed>) = 0
[pid  7447] 19:02:11.177481 write(2, "nbd,7447: Kernel call returned: Device or resource busy\n", 56nbd,7447: Kernel call returned: Device or resource busy
) = 56
[pid  7447] 19:02:11.177651 close(3 <unfinished ...>
[pid  7755] 19:02:11.177698 openat(AT_FDCWD, "/sys/block/nbd0/pid", O_RDONLY <unfinished ...>
[pid  7447] 19:02:11.177783 <... close resumed>) = 0
[pid  7447] 19:02:11.177869 close(4 <unfinished ...>

Worse, it loops like crazy, eating CPU while retrying.

I agree that it may become important to get back to the drawing board. It seems to me we're dealing with a bunch of chicken-and-egg problems here. Maybe we're just missing a "reconnect" operation to communicate with the kernel instead of the "connect" one, I don't know.

nand11 commented 1 year ago

Oh, this thread is still active? Then it may make sense to share my experience.

I tried iSCSI over an unreliable network connection and it worked. It is slow over this kind of network, as expected, but it can handle reconnects. I am using the standard Debian packages targetcli-fb (server) and open-iscsi (client).

For iSCSI, the client daemon iscsid does connection-level error processing. But I have not looked at the source code to see how things are handled differently between nbd and iSCSI.

wtarreau commented 1 year ago

Very interesting, thanks a lot for sharing your experience. That's definitely something I should have a look at!

wtarreau commented 1 year ago

Many thanks @nand11 for your insights. I've followed some howtos (there are different server implementations so it may look confusing at first but "tgtd" did work fine). It worked very well, and in addition it's particularly robust to connection outage. I've unplugged links as well as removed/restored/changed IP address on the interface. There's a 5s timeout after which the connection is declared dead and is destroyed, then a new one is attempted via the regular paths, so that should resist rebooting firewalls and triple-play boxes silently changing IPs. I'll go that way now, even if the configuration is less trivial, it looks way more robust. Thanks again!

fathyb commented 1 year ago

This issue was affecting me pretty badly so I built an alternative nbd-client: https://github.com/fathyb/node-nbd-client. It also resolves other issues I was having related to performance and Docker.

wtarreau commented 1 year ago

This issue was affecting me pretty badly so I built an alternative nbd-client: https://github.com/fathyb/node-nbd-client.

Interesting to see some work still being done around this. However, the choice of Node.js makes it a showstopper for many of us using embedded devices (typically where the full OS+config fits in a 16 MB NOR flash). But it likely has use cases in other environments. Now that I've got iSCSI working (using much more complex components and configs), I have not yet figured out whether nbd still has benefits (beyond its significant simplicity).

felixonmars commented 7 months ago

Just in case anyone's interested here, I have migrated my setup to NVMe/TCP, using nvmetcli for the server and nvme-cli for the client. Reconnects work flawlessly as long as the problem doesn't persist for too long (it stops retrying after 1h).

wtarreau commented 7 months ago

Thanks for the info. I personally migrated to iSCSI instead, which is amazingly complicated but rock solid and has never failed me once in a year despite multiple short and long network outages. Why does nvme-cli stop retrying after one hour? Is it a config setting or something else?

AndySchroder commented 4 months ago

What are the practical use cases for nbd if it can't handle a simple server restart? Seems like a lot of coordination required to use nbd with this constraint.

xujihui1985 commented 2 months ago

You can use the netlink interface to reconfigure the device: establish a new socket to the server and pass the socket fd to the device with NBD_CMD_RECONFIGURE.


But the question is how to check whether the socket the device holds is broken. I can't find a good way to do that; maybe use a thread to periodically ping the server over the socket?

michael-newsrx commented 1 month ago

What is the fix for this? I am investigating using this for XFS on top of S3. But it keeps erroring out in the nbd-client part and everything stops working. Is there a way to run iSCSI with S3 backing storage?

yoe commented 1 month ago

On Mon, Aug 05, 2024 at 12:39:43PM -0700, Michael Conrad wrote:

What is the fix for this? I am investigating using this for XFS on top of S3. But it keeps erroring out in the nbd-client part and everything stops working. Is there a way to run iSCSI with S3 backing storage?

A kernel patch is (probably...) required.

When using the ioctl API, the -persist code will immediately try to reconnect if the NBD_DO_IT ioctl exits with an error state. The kernel used to freeze all writes to the NBD device until the nbd-client process exited, but I believe this behaviour has been lost over a number of refactors (although I'm not entirely sure of this). When using the netlink API, there is no opportunity to do this, as the nbd-client process exits immediately after setting up the connection and does not wait for errors; there would need to be a monitor mode etc., which currently does not exist. So if this can still work at all, you'll need to specify the -nonetlink option to nbd-client.

I have recently started working on the nbd driver in the kernel to improve support of various things, and this is one of the things that I'm planning on working on, but it will take a while.

In the meantime, if the -nonetlink option does not work, another option could be to use multiple nbd connections to a single device (which only works with the netlink interface... I know, I know). If you do that and a single connection to the server fails, the others will still exist and the connection will not drop. See the -connections (-C) option to nbd-client for details.


corbolais commented 1 month ago

@yoe, thanks for your work on nbd.

I got excited, as everyone else in this thread did, mainly by nbd's simplicity. It was very appealing for a scenario similar to the remote backup/LUKS setup someone else tried. I'm flatlining now, as the -persist option is still not working and reconnect is also still failing. So sad.

A simple test setup with a vanishing nbd-server was enough. After nbd-server restarts, nbd-client gets EBUSY errors and a cp process blocks, with the ugly kernel hung-task message and all. FWIW, the ZFS pool seems like overkill in hindsight. "But it ought to be a realistic scenario!", someone may have heard me thinking...

[Thu Aug 22 01:25:52 2024] zio pool=nbdpool vdev=/dev/nbd0 error=5 type=1 offset=524034048 size=8192 flags=721089
[Thu Aug 22 01:25:52 2024] WARNING: Pool 'nbdpool' has encountered an uncorrectable I/O failure and has been suspended.

[Thu Aug 22 01:26:06 2024] nbd: nbd0 already in use
[Thu Aug 22 01:26:17 2024] block nbd0: NBD_DISCONNECT
[Thu Aug 22 01:26:17 2024] block nbd0: Send disconnect failed -32
[Thu Aug 22 01:26:19 2024] nbd: nbd0 already in use
[Thu Aug 22 01:27:24 2024] block nbd0: NBD_DISCONNECT
[Thu Aug 22 01:27:24 2024] block nbd0: Send disconnect failed -32
[Thu Aug 22 01:27:29 2024] INFO: task txg_sync:2020404 blocked for more than 122 seconds.
[Thu Aug 22 01:27:29 2024]       Tainted: P           OE      6.5.0-28-lowlatency #29.1-Ubuntu
[Thu Aug 22 01:27:29 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Aug 22 01:27:29 2024] task:txg_sync        state:D

So iSCSI it is. For now.

Edit: versions under test: server 1:3.23-3ubuntu1.22.04.1, client 1:3.26.1-1ubuntu0.1

myyddngyer03932 commented 1 month ago

This is still an issue with version 3.24

myyddngyer03932 commented 3 weeks ago

multipath seems like a good option; I will try to use it to work around the problem.