eworm-de closed this issue 6 years ago
That's... I won't say impossible, but very odd. See for yourself: the changes in nbd-server between then and now are limited to the transmission phase, and the error you see happens in the negotiation phase.
Can you provide steps to reproduce?
Sorry, that was a miscommunication. It should read: the client connects successfully but makes the server fail on data transfer. That does not happen immediately, though, so for now I do not know the exact conditions to reproduce it.
Yes, I understood that :-)
The message you're getting suggests that the server is trying to detect the size of the export in the way it does for block devices rather than files. There is some code that tries to detect whether we're trying to export a file or a block device and then chooses the right path; and while bugs are of course always possible, the fact is that that code has just not been touched.
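The file-versus-block-device decision described here can be pictured roughly as follows. This is only a sketch in Python, not nbd-server's actual C code: the helper name `export_size` is made up for illustration, and the block-device branch seeks to the end of the device as a portable stand-in for a `BLKGETSIZE64` ioctl.

```python
import os
import stat


def export_size(path):
    """Return the size in bytes of a regular file or a block device.

    Hypothetical helper for illustration; not taken from nbd-server.
    """
    st = os.stat(path)
    if stat.S_ISREG(st.st_mode):
        # Regular file (e.g. an ISO image): stat() already knows the size.
        return st.st_size
    if stat.S_ISBLK(st.st_mode):
        # Block device: st_size is not meaningful here. A C server would
        # typically ask the kernel via ioctl(BLKGETSIZE64); seeking to the
        # end is used here as a portable stand-in.
        fd = os.open(path, os.O_RDONLY)
        try:
            return os.lseek(fd, 0, os.SEEK_END)
        finally:
            os.close(fd)
    raise ValueError(f"{path!r} is neither a regular file nor a block device")
```

If a regular file were sent down the block-device path, a device-only size query would be expected to fail, which is the kind of symptom being discussed.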
That leaves me to conclude there are only a few possibilities:
nbd-server to take a branch it shouldn't take
A bug that could cause the second kind of behavior is more likely to cause a segfault, so I think that's unlikely. The last behavior is possible, but you'd normally get some compiler warnings (and I don't think there are any left).
So, can you please send me a configuration file and a client command line that exhibits the behavior?
- PEBCAK
No. :-P
Looks like the error from the server is just a result of the client's failure. It took me some time to notice, as I did not see the error message from nbd-client started in initramfs.
On the server side I have a simple named export with an ISO file. On the client I run:
nbd-client -N iso 172.31.255.254 /dev/nbd0
That connects successfully, transfers some data (dd raw data, or mount and cp some files), then fails:
nbd,2079: Kernel call returned: Connection timed out
That message is from nbd-client.c line 1272.
A bisect shows that a0f01c3f06130c5c02498209cccedbd16fe052a7 is the first bad commit. Starting nbd-client with -L to disable netlink fixes the issue.
This looks like a kernel issue, then.
@josefbacik, what's your opinion on this?
I think I was running 4.15.10 (Arch Linux package linux 4.15.10-1) at that time.
Ugh, sorry about that guys. That was a regression I introduced; I fixed it here:
nbd: only set sndtimeo if we have a timeout set
Tho it's weird you hit the problem at all as the fix was a month after the problem was introduced, and was for sure in 4.15. Are you still able to reproduce the problem?
Currently our nbd package is compiled without netlink support. I will have to build a new package and test.
The change you reference is this one? nbd: only set sndtimeo if we have a timeout set
Tho it's weird you hit the problem at all as the fix was a month after the problem was introduced, and was for sure in 4.15. Are you still able to reproduce the problem?
This is a different issue, caused by nbd-client unconditionally adding a zero NBD_ATTR_TIMEOUT attribute to an NBD device configure request, which sets the timeout to zero ticks in the kernel.
It is reproducible on the current linux-block.
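One way to picture a client-side remedy is to include the timeout attribute only when a timeout was actually requested. The sketch below is illustrative Python, not the nbd-client C code: `build_connect_attrs` and the dict representation are invented for this example, the size entry is just a placeholder, and only the name `NBD_ATTR_TIMEOUT` comes from the comment above.

```python
def build_connect_attrs(size_bytes, timeout=0):
    """Collect attributes for a (hypothetical) netlink configure request.

    Plain dict stand-in for illustration; real code would build a netlink
    message instead.
    """
    # Placeholder attribute so the request is not empty.
    attrs = {"NBD_ATTR_SIZE_BYTES": size_bytes}
    # Only send a timeout when the user asked for one. Sending an explicit
    # zero is what the comment above identifies as the problem.
    if timeout > 0:
        attrs["NBD_ATTR_TIMEOUT"] = timeout
    return attrs
```

With no timeout given, the attribute is simply absent, so the kernel side would keep whatever default it already uses.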
Starting with version 3.17 (Arch Linux package 3.17-2) the server fails when the client connects:
strace tells me this is calling ioctl for BLKGETSIZE64:
The device is a read-only image file, not a block device.
nbd-server 3.17 with nbd-client 3.16.2 is fine, so a change at client side (netlink?) is involved.