Closed alkisg closed 6 years ago
Ubuntu uses patched kernels with various backports, so the numbers were misleading. I updated the issue description to reflect the mainline (vanilla) numbers instead. Just for completeness, the issue happens: In Ubuntu's kernels: after 4.8.0-58 and before 4.10.0-14 In mainline kernels: after 4.11.12 and before 4.12.0
I also saw that to make things 100% reproducible, I needed to call udevadm settle
, so my test case now is:
modprobe nbd udevadm settle nbd-client server-ip -N /opt/ltsp/i386 /dev/nbd5 udevadm settle dmesg | grep nbd
I also tested on Debian Stretch:
Are you saying that the problem does not exist in the most recent kernels you could find? Or do I misunderstand you there?
Hi Wouter, let me phrase it better,
the problem started with kernel 4.12 and is still happening in the most recent kernel I could find, which was 4.14-rc2.
Oh, okay then.
@josefbacik, any idea?
Oops, I'll take a look in the morning.
Oh actually I think this is my timeout patch that I fixed later, can you try Linus master?
Hi Josef, I tried the latest vanilla kernel from Ubuntu's daily builds and the problem still happens there: http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/ cod/tip/daily/2017-10-05 42b76d0e6b1fe0fcb90e0ff6b4d053d50597b031
I.e. Linux master https://github.com/torvalds/linux/commit/42b76d0e6b1fe0fcb90e0ff6b4d053d50597b031
Alright got it nailed down, sorry about that, apparently all of my regression tests only do the netlink interface, save the one that checks that the ioctl and netlink interfaces behave with eachother. I've submitted the patch
[PATCH] nbd: don't set the device size until we're connected
to fix it and cc'ed stable so it'll make its way back to distro kernels. @yoe sorry, I thought I had updated the mailinglist but accidentally sent it to the old sf list. I've fixed my stuff so that won't happen again.
Thanks a lot! I'll try to detect when it lands on daily builds, so that I can test it.
I tested on Ubuntu 18.04 with the upcoming 4.15 kernel and it works fine there. Closing, thank you. :)
I still can reproduce this problem in ubuntu 1804, kernel 4.15.0-36-generic: cmd to reproduce:
qemu-img create -f qcow2 sample.img 10G
modprobe nbd
udevadm settle
qemu-nbd -c /dev/nbd0 sample.img
udevadm settle
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.003028] block nbd0: NBD_DISCONNECT
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.003147] block nbd0: shutting down sockets
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.005229] nbd0: detected capacity change from 0 to 10737418240
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.005296] print_req_error: 2 callbacks suppressed
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.005297] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.006937] buffer_io_error: 2 callbacks suppressed
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.006939] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.009225] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.010670] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.012777] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.014259] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.015965] ldm_validate_partition_table(): Disk read failed.
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.015973] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.018006] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.020183] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.021638] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.023379] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.024763] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.026803] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.028193] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.030244] Dev nbd0: unable to read RDB block 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.031491] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.032871] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.034815] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.036188] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.038083] print_req_error: I/O error, dev nbd0, sector 24
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.039485] Buffer I/O error on dev nbd0, logical block 3, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [ 44.041432] nbd0: unable to read partition table
@chenhaiq I also see this problem. It is a bug in the nbd kernel module in nbd_bdev_reset()
where it sets the size to 0 but the capacity is unchanged, then it calls blkdev_reread_part()
. blkdev_reread_part()
detects a capacity change and attempts to read the partitions on the nbd device shutting down, causing I/O errors and the unable to read partition table
message.
This was fixed in 4.18 in this commit: https://github.com/torvalds/linux/commit/fe1f9e6659ca6124f500a0f829202c7c902fab0c#diff-bc9273bcb259fef182ae607a1d06a142
@josefbacik Can this fix be backported to 4.14 stable?
Using Ubuntu 16.04, nbd-server and client=1:3.13-1.
a) With mainline (vanilla) kernel 4.11.12 on the client, the commands below run fine without errors. b) Then I upgrade to mainline kernel 4.12.0, and I run again:
And I see the following errors:
I can reproduce this in many installations, real or VMs. Thanks!