NetworkBlockDevice / nbd

Network Block Device
GNU General Public License v2.0

Timeout triggers too soon #71

Closed eworm-de closed 6 years ago

eworm-de commented 6 years ago

Starting with version 3.17 (Arch Linux package 3.17-2) the server fails when the client connects:

Connection dropped: Inappropriate ioctl for device

strace tells me this is calling ioctl for BLKGETSIZE64:

ioctl(7, BLKGETSIZE64, 0x7ffd58417048) = -1 ENOTTY (Inappropriate ioctl for device)

The device is a read-only image file, not a block device.

nbd-server 3.17 with nbd-client 3.16.2 is fine, so a change on the client side (netlink?) must be involved.

yoe commented 6 years ago

That's... I won't say impossible, but very odd. See for yourself: the changes in nbd-server between then and now are limited to the transmission phase, and the error you see happens in the negotiation phase.

Can you provide steps to reproduce?

eworm-de commented 6 years ago

Sorry, that was a miscommunication. It should read: the client connects successfully but makes the server fail during data transfer. That does not happen immediately, though, so for now I do not know the exact conditions to reproduce it.

yoe commented 6 years ago

Yes, I understood that :-)

The message you're getting suggests that the server is trying to detect the size of the export in the way it does for block devices rather than files. There is some code that detects whether we're exporting a file or a block device and then chooses the right path; and while bugs are of course always possible, the fact is that the code has just not been touched.
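To illustrate the kind of check described here, below is a minimal sketch of choosing between the two size-detection paths. It is not the actual nbd-server code: the helper names `export_size` and `export_size_path` are hypothetical, and the sketch assumes Linux, where `BLKGETSIZE64` comes from `<linux/fs.h>`. Calling that ioctl on a regular file is what produces the ENOTTY seen in the strace output.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKGETSIZE64 */

/* Hypothetical helper (not taken from nbd-server): store the size of the
 * object behind fd in *size_out. Returns 0 on success, -1 with errno set. */
static int export_size(int fd, uint64_t *size_out)
{
    struct stat st;

    assert(size_out != NULL);
    if (fstat(fd, &st) < 0)
        return -1;

    if (S_ISBLK(st.st_mode)) {
        /* Block device: st_size is not meaningful, so ask the kernel. */
        uint64_t bytes = 0;
        if (ioctl(fd, BLKGETSIZE64, &bytes) < 0)
            return -1;
        *size_out = bytes;
        return 0;
    }

    if (S_ISREG(st.st_mode)) {
        /* Regular file (e.g. an ISO image): the ioctl would fail with
         * ENOTTY here, so use the size reported by fstat() instead. */
        *size_out = (uint64_t)st.st_size;
        return 0;
    }

    /* Anything else (pipe, character device, ...) is rejected. */
    errno = EINVAL;
    return -1;
}

/* Hypothetical convenience wrapper: open path read-only and query it. */
static int export_size_path(const char *path, uint64_t *size_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = export_size(fd, size_out);
    int saved = errno;      /* keep the helper's errno across close() */
    close(fd);
    errno = saved;
    return rc;
}
```

With a helper like this, an image file and a block device both yield a size without the regular-file case ever reaching the ioctl.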

That leaves me to conclude there are only a few possibilities:

A bug that could cause the second kind of behavior is more likely to cause a segfault, so I think that's unlikely. The last behavior is possible, but you'd get some compiler warnings normally (and I don't think there are any left)

So, can you please send me a configuration file and a client command line that exhibits the behavior?

eworm-de commented 6 years ago
  • PEBCAK

No. :-P

Looks like the error from the server is just a result of the client's failure. It took me some time to notice, as I did not see the error message from nbd-client, which is started in the initramfs.

On server side I have a simple named export with an iso file. On client I run:

nbd-client -N iso 172.31.255.254 /dev/nbd0

That connects successfully, transfers some data (dd raw data, or mount and cp some files), then fails:

nbd,2079: Kernel call returned: Connection timed out

That message is from nbd-client.c line 1272.

A bisect shows that a0f01c3f06130c5c02498209cccedbd16fe052a7 is the first bad commit. Starting nbd-client with -L to disable netlink works around the issue.

yoe commented 6 years ago

This looks like a kernel issue, then.

@josefbacik , what's your opinion on this?

eworm-de commented 6 years ago

I think I was running 4.15.10 (Arch Linux package linux 4.15.10-1) at that time.

josefbacik commented 6 years ago

Ugh, sorry about that guys. That was a regression I introduced; I fixed it here:

nbd: only set sndtimeo if we have a timeout set

Tho it's weird you hit the problem at all as the fix was a month after the problem was introduced, and was for sure in 4.15. Are you still able to reproduce the problem?

eworm-de commented 6 years ago

Currently our nbd package is compiled without netlink support. I will have to build a new package and test.

The change you reference is this one? nbd: only set sndtimeo if we have a timeout set

maciejsszmigiero commented 6 years ago

Tho it's weird you hit the problem at all as the fix was a month after the problem was introduced, and was for sure in 4.15. Are you still able to reproduce the problem?

This is a different issue, caused by nbd-client unconditionally adding a zero NBD_ATTR_TIMEOUT attribute to an NBD device configure request, which sets the timeout to zero ticks in the kernel.

It is reproducible on the current linux-block.
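As a rough illustration of the client-side behavior described above, here is a self-contained sketch that only adds a timeout attribute when a non-zero timeout was requested. It does not use the real netlink API and is not the actual nbd-client code: `struct sketch_msg`, `put_u64`, `build_configure_request` and the `SKETCH_ATTR_*` constants are hypothetical stand-ins for a netlink message and its attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute types; the real ones are the NBD_ATTR_* values. */
enum { SKETCH_ATTR_SIZE_BYTES = 1, SKETCH_ATTR_TIMEOUT = 2 };

#define SKETCH_MAX_ATTRS 8

struct sketch_attr {
    int type;
    uint64_t value;
};

/* Hypothetical stand-in for a netlink configure request. */
struct sketch_msg {
    struct sketch_attr attrs[SKETCH_MAX_ATTRS];
    size_t count;
};

/* Append one 64-bit attribute to the message. */
static void put_u64(struct sketch_msg *msg, int type, uint64_t value)
{
    assert(msg->count < SKETCH_MAX_ATTRS);
    msg->attrs[msg->count].type = type;
    msg->attrs[msg->count].value = value;
    msg->count++;
}

/* Build the request. A timeout of 0 means "the user did not ask for one". */
static void build_configure_request(struct sketch_msg *msg,
                                    uint64_t size_bytes,
                                    uint64_t timeout_secs)
{
    msg->count = 0;
    put_u64(msg, SKETCH_ATTR_SIZE_BYTES, size_bytes);

    /* Sending the attribute unconditionally would pass an explicit zero,
     * which the receiver may apply as a zero-tick timeout. Omitting it
     * leaves the receiver's default in place. */
    if (timeout_secs != 0)
        put_u64(msg, SKETCH_ATTR_TIMEOUT, timeout_secs);
}

/* Return 1 if the message carries an attribute of the given type. */
static int has_attr(const struct sketch_msg *msg, int type)
{
    for (size_t i = 0; i < msg->count; i++)
        if (msg->attrs[i].type == type)
            return 1;
    return 0;
}
```

The design choice is to treat "no timeout requested" as the absence of the attribute rather than as a zero value, so the two cases cannot be confused on the receiving side.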