calestyo commented 7 years ago

Hey.

I've been trying the following on Debian sid (@yoe ... pinging Wouter, who's the maintainer there):

Debian nbd version 1:3.15.1-2
Linux heisenberg 4.8.0-2-amd64 #1 SMP Debian 4.8.15-2 (2017-01-04) x86_64 GNU/Linux

8TB SATA HDD connected via SATA/USB bridge to the host heisenberg
partition 2 is a LUKS container, which is mapped ("decrypted")
the "decrypted" device is 8000448233472 bytes in size (~8TB) and its owner is set to nbd:nbd
heisenberg works as NBD server and exports the "decrypted" device to localhost
another Debian sid (same kernel and nbd versions) runs inside a KVM guest on heisenberg, named "klenze"
there's a port forwarding between the two hosts for the nbd port
I did a blockdev --setro /dev/sdb* (with sdb being the SATA disk)
I tried with both, the dm-crypt mapping set up with --readonly and without

/etc/nbd-server/config:


[generic]
# If you want to run everything as root rather than the nbd user, you
# may either say "root" in the two following lines, or remove them
# altogether. Do not remove the [generic] section, however.
user = nbd
group = nbd
includedir = /etc/nbd-server/conf.d
listenaddr = 127.0.0.1
max_threads = 1
allowlist = true

# What follows are export definitions. You may create as much of them as
# you want, but the section header has to be unique.
[calestyo]
exportname = /dev/mapper/data-a3
readonly = true
rotational = true


Now first, the whole setup works fine with a smaller test file (e.g. a 1GiB image file containing an ext4 can be mounted on the client).

When I now try to connect the NBD device on the client with:

nbd-client localhost -N calestyo /dev/nbd0

Negotiation: ..size = 7629822MB
bs=1024, sz=8000448233472 bytes

That seems to work nicely... the server shows something like:

Jan 14 02:53:42 heisenberg nbd-server[28682]: Stopping Network Block Device server: nbd-server.
Jan 14 02:53:42 heisenberg nbd-server[28685]: nbd-server.
Jan 14 02:53:46 heisenberg nbd_server[28688]: Spawned a child process
Jan 14 02:53:46 heisenberg nbd_server[28691]: virtstyle ipliteral
Jan 14 02:53:46 heisenberg nbd_server[28691]: connect from 127.0.0.1, assigned file is /dev/mapper/data-a3
Jan 14 02:53:46 heisenberg nbd_server[28691]: No authorization file, granting access.
Jan 14 02:53:46 heisenberg nbd_server[28691]: Starting to serve
Jan 14 02:53:46 heisenberg nbd_server[28691]: Size of exported file/device is 8000448233472
Jan 14 02:55:54 heisenberg nbd_server[28688]: Child exited with 0
Jan 14 02:55:56 heisenberg nbd-server[28808]: Stopping Network Block Device server: nbd-server.


Now unfortunately,...
blkid /dev/nbd0
gives nothing, as does e.g. hd /dev/nbd0

and worse:

blockdev --getsize64 /dev/nbd0

18446743278064762880



Any ideas?

Cheers,
Chris.

calestyo commented 7 years ago

Oh and when I manually truncate the exported size (but still use the "decrypted" device) to e.g. 1TiB it still works (but of course the btrfs is unusable then),... for 2TiB it already fails IIRC.
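
For readers following along: the bogus size reported above is consistent with a 32-bit overflow of the block count. The sketch below is only an inference from the numbers in this thread, not something taken from the kernel source. It assumes the device size is divided by the negotiated block size (1024), the resulting block count is truncated to a signed 32-bit integer, and the result is scaled back to bytes. `wrapped_size` is a hypothetical helper name, and the `%u` conversion relies on bash's `printf`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reproduce the size seen on the client, ASSUMING the
# block count is truncated to a signed 32-bit integer before being scaled
# back to bytes. This is an inference from the reported numbers only.
wrapped_size() {
    local size=$1 bs=$2
    local blocks=$(( size / bs ))
    # Keep only the low 32 bits of the block count ...
    local low=$(( blocks & 0xFFFFFFFF ))
    # ... and reinterpret them as a signed 32-bit value.
    if [ "$low" -ge 2147483648 ]; then
        low=$(( low - 4294967296 ))
    fi
    # %u prints a negative 64-bit result as the unsigned value a tool
    # like blockdev would display.
    printf '%u\n' "$(( low * bs ))"
}

wrapped_size 8000448233472 1024   # the ~8TB "decrypted" device
wrapped_size 1099511627776 1024   # 1 TiB
wrapped_size 2199023255552 1024   # 2 TiB
```

Under that assumption the first call yields 18446743278064762880, the value blockdev reported, the 1 TiB case comes out unchanged, and 2 TiB is the first size whose block count (2^31) no longer fits, which would match the observation that 1TiB works and 2TiB fails.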

abligh commented 7 years ago

That's all a bit strange. There were some recent kernel patches ( https://www.spinics.net/lists/linux-block/msg07060.html ) to support enormous devices (over 40TB), which I'm guessing aren't in the kernel you are using yet, but I wouldn't have thought 1TB would be an issue.

What would be really helpful is to know whether this is an nbd-server issue or an nbd-client / Linux kernel issue. One way to tell the difference would be to use qemu-img (which has its own NBD client built in) to see if it gets the size right (with qemu-img info). Since you say you already have kvm installed, any chance you could take a look at that and report back?

I'd also be keen to know whether what you are doing causes problems with a much simpler server setup, e.g. a single 1TB+ file (rather than all the luks stuff). You can use a sparse file (e.g. use dd like this: http://prefetch.net/blog/index.php/2009/07/05/creating-sparse-files-on-linux-hosts-with-dd/ ) so you won't need much disk space at all - perhaps the problem is nbd-server reading the size of the partition.
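
A minimal sketch of the sparse-file idea, assuming GNU dd and GNU stat are available. `make_sparse` is a hypothetical helper name, not part of nbd. It extends a file to the requested size without writing any data, then prints the apparent size next to the space actually allocated on disk.

```shell
# Hypothetical helper: create a sparse file of the given size and report
# its apparent size versus the bytes actually allocated on disk.
make_sparse() {
    path=$1
    size=$2
    # count=0 writes no data; seek= only extends the file to the given offset.
    dd if=/dev/zero of="$path" bs=1 count=0 seek="$size" 2>/dev/null
    apparent=$(stat -c %s "$path")
    # %b is the number of allocated blocks, %B the size of each such block.
    allocated=$(( $(stat -c %b "$path") * $(stat -c %B "$path") ))
    echo "apparent=$apparent allocated=$allocated"
}

# Example call (not executed here): make_sparse /tmp/img 2T
```

On a filesystem that supports sparse files, the allocated figure should stay close to zero even for a multi-terabyte apparent size, so the test needs very little real disk space.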

calestyo commented 7 years ago

The patch from Josef doesn't seem to be part of 4.9.0 (to which I've updated just yesterday night).

Also, as I said, 1TiB still worked,... it was just somewhere between 1 and 2 TiB that it stopped working.

As for trying qemu's nbd... What I did now (and correct me if you wanted something else): I used qemu's nbd server to export the device:

# qemu-nbd --read-only -f raw /dev/dm-1  -c /dev/nbd0
# blockdev --getsize64 /dev/nbd0 
18446743278064762880
# qemu-img info /dev/nbd0
qemu-img: Could not open '/dev/nbd0': Could not refresh total sector count: Invalid argument

whereas after disconnecting (qemu-nbd -d /dev/nbd0)

# blockdev --getsize64 /dev/nbd0 
0
# qemu-img info /dev/nbd0
image: /dev/nbd0
file format: raw
virtual size: 0 (0 bytes)
disk size: 0

(All of the above was done on the same host, heisenberg, again using the "decrypted" device, but with nothing below qemu-nbd set read-only, i.e. neither at the blockdev nor at the cryptsetup level, since qemu-nbd complained otherwise.)

So it seems to be either a bug in both implementations or the kernel?

As for trying the whole thing with a sparse file:

# truncate --size=10t /tmp/img
# mkfs.btrfs -L foobar /tmp/img
btrfs-progs v4.7.3
See http://btrfs.wiki.kernel.org for more information.

Label:              foobar
UUID:               
Node size:          16384
Sector size:        4096
Filesystem size:    10.00TiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         DUP               1.00GiB
  System:           DUP               8.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1    10.00TiB  /tmp/img

# systemctl restart nbd-server.service

(same nbd-server config as yesterday, just different pathname)

On the client side (again within the VM, again with ssh-forwarded ports):

# nbd-client   localhost -N calestyo /dev/nbd0
Negotiation: ..size = 10485760MB
bs=1024, sz=10995116277760 bytes
# blockdev --getsize64 /dev/nbd0 
18446741874686296064
# blkid /dev/nbd0 
#

Cheers, Chris.

abligh commented 7 years ago

So on kernel 3.13 I get:

root@nimrod-ubuntu:/home/amb# truncate --size=10t /tmp/img
root@nimrod-ubuntu:/home/amb# qemu-nbd --read-only -f raw /tmp/img  -c /dev/nbd0
root@nimrod-ubuntu:/home/amb# blockdev --getsize64 /dev/nbd0
10995116277760

which is the correct answer. I suspect if you try that entirely self-contained test, you'll get the wrong answer on your kernel. Could you confirm?

The qemu-img test I meant works something like this:

In one window start an NBD server with a 10T disk (or use the normal NBD server to do it).

# truncate --size=10t /tmp/img
# qemu-nbd --read-only -f raw /tmp/img -p 12345

In another window, connect to it with qemu-img (to avoid the kernel and /dev/nbdX):

 # qemu-img info -f nbd nbd://127.0.0.1:12345/
 image:
 file format: nbd
 virtual size: 10T (10995116277760 bytes)
 disk size: unavailable

I suspect if you try that, it will work fine (as it did above).

If I'm right so far, then the problem is the kernel, which would be the only common factor. Looks like there was some form of kernel regression between 3.13 and the patch I indicated.

calestyo commented 7 years ago

Hmm interestingly, the first test of yours gives me now:

root@heisenberg:~# truncate --size=10t /tmp/img
root@heisenberg:~# qemu-nbd --read-only -f raw /tmp/img  -c /dev/nbd0
root@heisenberg:~# blockdev --getsize64 /dev/nbd0
0

Your second test: terminal 1:

root@heisenberg:~# qemu-nbd -d /dev/nbd0
/dev/nbd0 disconnected
root@heisenberg:~# qemu-nbd --read-only -f raw /tmp/img -p 12345

terminal 2:

# qemu-img info -f nbd nbd://127.0.0.1:12345/
image: nbd://127.0.0.1:12345
file format: nbd
virtual size: 10T (10995116277760 bytes)
disk size: unavailable

points again towards the kernel, doesn't it?

So what's next? You guys also in charge of the kernel nbd driver?

abligh commented 7 years ago

Yes, this is looking like a kernel bug, quite possibly the one fixed by the patch I referenced. For anyone else reading this:

Bug happens using completely separate code-base (using qemu as server and qemu as client to wire up kernel)
Bug doesn't happen if kernel not involved, i.e. using nbd-server and qemu-img as the client.

This isn't the correct place to report kernel bugs technically (this is the github repo for the userspace nbd components). You want some or all of:

linux-block@vger.kernel.org
nbd-general@lists.sourceforge.net
linux-kernel@vger.kernel.org

What would be really good is if you could test Josef's patch and see if it fixes your issue. If so, I'm happy to help you get the fix into the kernel (I'm a committer on the userspace side).

calestyo commented 7 years ago

> What would be really good is if you could test Josef's patch and see if it fixes your issue. If so, I'm happy to help you get the fix into the kernel (I'm a committer on the userspace side).

That would anyway only be necessary on the client side's kernel, right?

calestyo commented 7 years ago

https://www.spinics.net/lists/linux-block/msg07883.html
https://sourceforge.net/p/nbd/mailman/message/35604275/

calestyo commented 7 years ago

Seems Josef's patch works. Not sure whether anything could be done on the nbd-client side so that, in case of a buggy kernel or a too-large device, an error is reported rather than carrying on as if everything were fine. If not, we can probably close the bug.
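
Until something like that exists, a user-side sanity check is one option. The sketch below is not something nbd-client does; `negotiated_size` and `check_nbd_size` are hypothetical helper names. It pulls the `sz=` value out of the negotiation line that nbd-client prints and compares it with the size the kernel reports. The sample values are copied from this thread; on a live system the second value would come from `blockdev --getsize64 /dev/nbd0`.

```shell
# Hypothetical helper: extract the negotiated size from nbd-client's
# "bs=..., sz=... bytes" output line (read from stdin).
negotiated_size() {
    sed -n 's/.*sz=\([0-9][0-9]*\) bytes.*/\1/p'
}

# Compare as strings: the bogus values seen in this thread exceed a signed
# 64-bit integer, so a numeric shell comparison would not be safe.
check_nbd_size() {
    expected=$1
    actual=$2
    if [ "$expected" = "$actual" ]; then
        echo "ok: $actual bytes"
    else
        echo "size mismatch: negotiated $expected, kernel reports $actual" >&2
        return 1
    fi
}

# Sample values from this thread, standing in for live command output.
expected=$(echo 'bs=1024, sz=8000448233472 bytes' | negotiated_size)
check_nbd_size "$expected" 18446743278064762880 || echo "device size looks wrong"
```

A wrapper script could run this right after connecting and disconnect the device again when the check fails.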

abligh commented 7 years ago

Yes, only necessary on the client side, and I'm glad the kernel patch fixes it. Closing this.

yoe commented 7 years ago

@calestyo: since you're running Debian unstable, might be good to file a bug about this against the linux package in Debian, so that the fix gets in the upcoming Debian stable.

calestyo commented 7 years ago

Done, http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=851533 .

Cheers.

NetworkBlockDevice / nbd

exported files over something around 1 TiB get an insane device size on the client side and are actually empty #44
