mxork opened this issue 5 years ago
Hi,
I want to use NBD over an unreliable network connection, so I was trying to use nbd-client with the -persist option. A while ago, someone on IRC (#nbd on oftc.net) told me that -persist is broken on recent kernels and that I should try -nonetlink (thanks!). So I did that (on Debian buster). I also started nbd-client with -nofork to get debug output.
I connected nbd, interrupted the network connection and nbd-client exited with
Kernel call returned. sock, done
Okay, that's not what I wanted, so I had a look at the source code of nbd-client (Debian source package nbd-3.19). Near the end of main(), there is this section that gets executed:
if (ioctl(nbd, NBD_DO_IT) < 0) {
	[...]
} else {
	/* We're on 2.4. It's not clearly defined what exactly
	 * happened at this point. Probably best to quit, now */
	fprintf(stderr, "Kernel call returned.\n");
	cont=0;
}
Okay, we are not on kernel 2.4; my kernel version is 4.19.
I found out that ioctl returned 0 in case of a network disconnect. So I changed
if (ioctl(nbd, NBD_DO_IT) < 0) {
to
if (ioctl(nbd, NBD_DO_IT) <= 0) {
and things got better. After a network disconnect, nbd-client tried to reconnect:
nbd,16756: Kernel call returned: Success
Reconnecting
Error: Socket failed: Connection refused
Reconnecting
Error: Socket failed: Connection refused
Reconnecting
Error: Socket failed: Connection refused
Reconnecting
A connection retry was attempted about once per second. Yay! I restored my network connection and it worked:
Negotiation: ..size = 2844688MB
bs=512, sz=2982871564288 bytes
timeout=120
nbd,31830: Kernel call returned: Success
Reconnecting
Error: Socket failed: Connection refused
Reconnecting
Error: Socket failed: Connection refused
Reconnecting
Negotiation: ..size = 2844688MB
bs=512, sz=2982871564288 bytes
timeout=120
The test above was done without a mounted filesystem. Now I mounted a filesystem (read-only, via LUKS crypto) and tried again. Unfortunately, the result was a little different: the kernel said it was busy, and retries were attempted in quick succession. nbd-client printed:
Negotiation: ..size = 2844688MB
bs=512, sz=2982871564288 bytes
timeout=120
nbd,16756: Kernel call returned: Device or resource busy
Reconnecting
Negotiation: ..size = 2844688MB
bs=512, sz=2982871564288 bytes
timeout=120
nbd,16756: Kernel call returned: Device or resource busy
Reconnecting
The reconnect in nbd-client does not print an error, but the next iteration of
ioctl(nbd, NBD_DO_IT)
returns "Device or resource busy".
Now I am at a point where I would need to debug or understand the kernel nbd driver, which I have not attempted yet. (__nbd_ioctl calls nbd_start_device_ioctl for NBD_DO_IT, which calls nbd_start_device, which returns -EBUSY if nbd->task_recv is set. https://github.com/torvalds/linux/blob/e595dd94515ed6bc5ba38fce0f9598db8c0ee9a9/drivers/block/nbd.c#L1232 Is this where it fails?)
best regards, nand11
This is still an issue with version 3.18
I see the same behaviour as nand11 with version 3.21
I can confirm that there's no attempt to reconnect on 3.21.
So.
-persist relies on an old quirk of the ioctl configuration interface that used to work at some point, but seems to have been lost over a few maintainer changes in the nbd kernel module.
The netlink interface does not have anything to support -persist; when using the netlink interface, nbd-client has exited long before the connection is dropped, and so there is no way for it to discover that this has happened.
In order for -persist to work again, I think we need to go back to the drawing board. Meanwhile a possible workaround could be to use the multiple connection feature; if the connection drops, and you still have another connection open, then that allows you to continue working (but obviously that doesn't work if you only have one server and that one is being restarted).
For now, I think I'll just disable -persist (better to not have a feature rather than one that does nothing), and talk to the kernel maintainer to see how we can fix this.
In that case, would you also consider enabling a timeout by default? From what I see, even if I don't use -persist, as in
nbd-client -nofork -N test1 HOSTNAME /dev/nbd0
and then the server reboots, I get a block device and an nbd-client stuck in the D state forever, until I reboot. It seems a timeout is essential here.
It's actually in the D state until the TCP timeout, which is a per-system setting that defaults to 2 hours and 12 minutes (IIRC).
The timeout thing is another of those bad ideas that I should probably get rid of; it triggers if the device is perfectly happily connected but idle. This may be a good idea in some cases, but not in most.
Well that would explain why my kiosks randomly fail (well, clearly not as random as I thought) if they sit idle for too long. How would I go about disabling the timeout?
If you don't explicitly pass the -t or -timeout parameter to nbd-client, it shouldn't be set. If you still see things going wrong there, please file a (separate) bug.
Wait, now I'm confused. The timeout is set by default, or by default it never times out?
Cause at this point, this has become quite a problem for me, but I've never been able (or had enough time and patience) to really track it down. I'm not sure if my kiosks are failing from this timeout (seems to happen only during idle) or if it's a lucky network failure.
I'm thinking of switching to iSCSI, but the CoW feature of NBD is very useful to me.
(Does the timeout in nbd-server also close on idle?)
Neither the client nor the server timeout should be set by default (which means neither should time out by default).
The TCP keepalive probes are set, and it's not possible to switch them off. As long as the remote end is still functioning properly, these shouldn't interrupt your connection, however.
Hi, is there any chance this is getting fixed in the near future, or a way to work around this issue? Otherwise I'll also have to investigate replacing nbd with iSCSI or something else.
I have set up backups on a remote server for my laptop; for this, a WireGuard VPN is set up between the hosts, and the server runs an nbd-server. When connecting with -persist, eventually, after a day or so, attempting to access the mounted filesystem will result in I/O errors because nbd has dropped the connection, forcing me to unmount everything uncleanly, reconnect, and remount everything again.
A simple workaround is to make sure the connection never remains idle for too long. Just touching a file in the mounted NBD file system every once in a while should do that.
That's a very sad situation. I was testing NBD as a really appealing candidate for remote backups, but ended up on this non-working -persist situation, and the D state as well when no timeout is set. Yes, I think we should rework all of this in a few ways:

- TCP_USER_TIMEOUT, on Linux >= 2.6.37. This one is very clean, as it only counts failures to ACK sent packets;
- SO_KEEPALIVE with a short TCP_KEEPIDLE value (e.g. the configured timeout divided by 3 or so retries). But this would only be used to make sure the timeout doesn't kill idle connections and actually only kills dead ones (killing idle connections didn't happen in my tests).

The fact that the daemon cannot automatically reconnect by default with -persist is a problem that clearly indicates a logic error in the code. But if it fails on EBUSY once the block device is in use, we have a much bigger problem: NBD is basically unusable for any real-world purpose, since TCP connections eventually fail. I can easily reproduce this here by trying to restart nbd-client after a network error:
19:02:11.161069 ioctl(4, NBD_SET_SOCK, 3) = 0
19:02:11.176188 rt_sigprocmask(SIG_SETMASK, ~[KILL PIPE TERM RTMIN RT_1], ~[KILL PIPE TERM STOP RTMIN RT_1], 8) = 0
19:02:11.176416 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 7755 attached
, child_tidptr=0xffff8e2290f0) = 7755
[pid 7447] 19:02:11.177218 ioctl(4, NBD_DO_IT <unfinished ...>
[pid 7755] 19:02:11.177284 set_robust_list(0xffff8e229100, 24 <unfinished ...>
[pid 7447] 19:02:11.177338 <... ioctl resumed>) = -1 EBUSY (Device or resource busy)
[pid 7755] 19:02:11.177398 <... set_robust_list resumed>) = 0
[pid 7447] 19:02:11.177481 write(2, "nbd,7447: Kernel call returned: Device or resource busy\n", 56nbd,7447: Kernel call returned: Device or resource busy
) = 56
[pid 7447] 19:02:11.177651 close(3 <unfinished ...>
[pid 7755] 19:02:11.177698 openat(AT_FDCWD, "/sys/block/nbd0/pid", O_RDONLY <unfinished ...>
[pid 7447] 19:02:11.177783 <... close resumed>) = 0
[pid 7447] 19:02:11.177869 close(4 <unfinished ...>
Worse, it loops like crazy, eating the CPU while retrying this.
I agree that it may be important to go back to the drawing board. It seems to me we're dealing with a bunch of chicken-and-egg problems here. Maybe we're just missing a "reconnect" operation to communicate with the kernel instead of the "connect" one, I don't know.
Oh, this thread is still active? Then it may make sense to share my experience.
I tried iSCSI over an unreliable network connection and it worked. It is slow over this kind of network, as expected, but it can handle reconnects. I am using the standard Debian packages targetcli-fb (server) and open-iscsi (client).
For iSCSI, the client daemon iscsid does connection-level error processing. But I have not looked at the source code to see how things are handled differently between nbd and iSCSI.
Very interesting, thanks a lot for sharing your experience. That's definitely something I should have a look at!
Many thanks @nand11 for your insights. I've followed some howtos (there are different server implementations so it may look confusing at first but "tgtd" did work fine). It worked very well, and in addition it's particularly robust to connection outage. I've unplugged links as well as removed/restored/changed IP address on the interface. There's a 5s timeout after which the connection is declared dead and is destroyed, then a new one is attempted via the regular paths, so that should resist rebooting firewalls and triple-play boxes silently changing IPs. I'll go that way now, even if the configuration is less trivial, it looks way more robust. Thanks again!
This issue was affecting me pretty badly so I built an alternative nbd-client: https://github.com/fathyb/node-nbd-client. It also resolves other issues I was having related to performance and Docker.
Interesting to see some work still being done around this. However, the choice of Node.js makes it a showstopper for many of us using embedded devices (typically where the full OS+config fits in a 16 MB NOR flash), but it likely has use cases in other environments. Now that I've got iSCSI working (using much more complex components and configs), I have not yet figured out whether nbd still has some benefits (beyond its significant simplicity).
Just in case anyone's interested here, I have migrated my setup to NVMe/TCP using nvmetcli for the server and nvme-cli for the client. Reconnects work flawlessly as long as the outage doesn't last too long (it stops retrying after 1 hour).
Thanks for the info. I personally migrated to iSCSI instead, which is amazingly complicated but rock solid and has never failed me once in one year, despite multiple short and long network outages. Why does nvme-cli stop retrying after one hour? Is it a config setting or something else?
What are the practical use cases for nbd if it can't handle a simple server restart? Seems like a lot of coordination required to use nbd with this constraint.
You can use the netlink interface to reconfigure the device: establish a new socket to the server and pass the socket fd to the device with NBD_CMD_RECONFIGURE.
But the question is how to check whether the socket the device holds is broken. I can't find a good way to do that; maybe use a thread to periodically ping the server over the socket?
What is the fix for this? I am investigating using this for XFS on top of S3. But it keeps erroring out in the nbd-client part and everything stops working. Is there a way to run iSCSI with S3 backing storage?
On Mon, Aug 05, 2024 at 12:39:43PM -0700, Michael Conrad wrote:
> What is the fix for this? I am investigating using this for XFS on top of S3. But it keeps erroring out in the nbd-client part and everything stops working. Is there a way to run iSCSI with S3 backing storage?
A kernel patch is (probably...) required.
When using the ioctl API, the -persist code will immediately try to reconnect if the NBD_DO_IT ioctl exits with an error state. Previously, the kernel would freeze all writes to the NBD device until the nbd-client process exited, but I believe this has been lost over a number of refactors (although I'm not entirely sure of this). When using the netlink API, there is no opportunity to do this, as the nbd-client process exits immediately after setting up the connection and does not wait for errors; there would need to be a monitor mode etc., which currently does not exist. So if this can still work at all, you'll need to pass the -nonetlink option to nbd-client.
I have recently started working on the nbd driver in the kernel to improve support of various things, and this is one of the things that I'm planning on working on, but it will take a while.
In the meantime, if the -nonetlink option does not work, another option could be to use multiple nbd connections to a single device (which only works with the netlink interface... I know, I know). If you do that and a single connection to the server fails, the second one will still exist and the connection will not drop. See the -connections (-C) option to nbd-client for details.
-- @.***{be,co.za} wouter@{grep.be,fosdem.org,debian.org}
@yoe, thanks for your work on nbd.
I got excited, as everyone else in this thread did, mainly because of nbd's simplicity. It was very appealing for a scenario similar to the remote backup/LUKS thing someone else tried. I'm flatlining now, as this -persist option is still not working and reconnect is also still failing. So sad.
A simple test setup with a vanishing nbd-server was enough. After nbd-server restarts, it results in nbd-client EBUSY errors and a blocked cp process, with the ugly kernel task-hung message and all. FWIW, the ZFS pool seems like overkill in hindsight. "But it ought to be a realistic scenario!", someone may have heard me thinking..
[Thu Aug 22 01:25:52 2024] zio pool=nbdpool vdev=/dev/nbd0 error=5 type=1 offset=524034048 size=8192 flags=721089
[Thu Aug 22 01:25:52 2024] WARNING: Pool 'nbdpool' has encountered an uncorrectable I/O failure and has been suspended.
[Thu Aug 22 01:26:06 2024] nbd: nbd0 already in use
[Thu Aug 22 01:26:17 2024] block nbd0: NBD_DISCONNECT
[Thu Aug 22 01:26:17 2024] block nbd0: Send disconnect failed -32
[Thu Aug 22 01:26:19 2024] nbd: nbd0 already in use
[Thu Aug 22 01:27:24 2024] block nbd0: NBD_DISCONNECT
[Thu Aug 22 01:27:24 2024] block nbd0: Send disconnect failed -32
[Thu Aug 22 01:27:29 2024] INFO: task txg_sync:2020404 blocked for more than 122 seconds.
[Thu Aug 22 01:27:29 2024] Tainted: P OE 6.5.0-28-lowlatency #29.1-Ubuntu
[Thu Aug 22 01:27:29 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Aug 22 01:27:29 2024] task:txg_sync state:D
So iSCSI it is. For now.
Edit: version under test (VUT): server 1:3.23-3ubuntu1.22.04.1, client 1:3.26.1-1ubuntu0.1
This is still an issue with version 3.24
Multipath seems like a good choice; I will try to use it to solve the problem.
I still need to do a proper job getting logs together, but maybe someone can tell me I'm being dumb before I put too much time into this.
Currently, filesystems I have mounted from nbd devices panic on I/O failure if I restart the corresponding server. I would like nbd to renegotiate the connection if the server drops, and -persist seems to do the right thing. However, if I set up a test environment on the local machine and restart the server after connecting, the call simply returns and does not attempt to reconnect. From the log message, it does not take the branch at https://github.com/NetworkBlockDevice/nbd/blob/128fd556286ff5d53c5f2b16c4ae5746b5268a64/nbd-client.c#L1292, instead seeming to take the branch at https://github.com/NetworkBlockDevice/nbd/blob/128fd556286ff5d53c5f2b16c4ae5746b5268a64/nbd-client.c#L1329. I have no idea why the ioctl call would return >= 0, but it seems to. I realize that the filesystem may also need some love to get the desired behavior, but that's moot if nbd does not renegotiate.