elemental-lf / benji

Benji Backup: A block based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices
https://benji-backup.me
Other
136 stars 43 forks source link

NBD export fails to mount with default blocksize #12

Closed olifre closed 5 years ago

olifre commented 5 years ago

Trying to mount any backup via NBD:

backup-host # benji nbd -r
desktop-machine # nbd-client -N V0000000002 localhost -p 10809 /dev/nbd0
Negotiation: ..size = 10240MB
bs=1024, sz=10737418240 bytes
desktop-machine # LANG=C mount -t ext4 -o ro,norecovery /dev/nbd0p1 mnt
mount: /home/olifre/mnt: wrong fs type, bad option, bad superblock on /dev/nbd0p1, missing codepage or helper program, or other error.

Interestingly, syslog on desktop-machine contains:

[2541735.786008]  nbd0: p1
[2541735.786062] nbd0: p1 size 41938944 extends beyond EOD, truncated
[2541753.379898] EXT4-fs (nbd0p1): VFS: Can't find ext4 filesystem

Trying partprobe (but that seems to happen automatically on newer systems?):

desktop-machine # LANG=C partprobe /dev/nbd0
Error: Can't have a partition outside the disk!

backup-host is a CentOS 7 machine, so I don't have NBD kernel modules :sad:. desktop-machine is a modern kernel 4.19 Gentoo machine with:

nbd-tools version: 3.15.3
kernel: 4.19

Retrying with:

nbd-client -N V0000000002 localhost -b 512 -p 10809 /dev/nbd0

works completely well, though. I can reproduce the same with Backy2 by the way.

Is this some configuration issue / version problem on my end? The documentation also contains the example

# modprobe nbd
# nbd-client -N V0000000001 127.0.0.1 -p 10809 /dev/nbd0
Negotiation: ..size = 10MB
bs=1024, sz=10485760 bytes

so I presume this should theoretically work with other block sizes than 512?

olifre commented 5 years ago

I re-tried with nbd tools version 3.18. With that, the following happens on Benji's end:

    INFO: Incoming connection from 127.0.0.1:54112
    INFO: [127.0.0.1:54112] Negotiated export: V0000000002
    INFO: [127.0.0.1:54112] Version V0000000002 has been opened.
    INFO: [127.0.0.1:54112] Export is read only.
   ERROR: [127.0.0.1:54112] 0 bytes read on a total of 28 expected bytes

and the following on the client:

# nbd-client -N V0000000002 localhost -b 512 -p 10809 /dev/nbd0
Negotiation: ..size = 10240MB
bs=512, sz=10737418240 bytes

That looks fine at first glance, but syslog says:

[2542608.243309] block nbd0: NBD_DISCONNECT
[2542610.544813] block nbd0: Wrong magic (0x0)
[2542610.544920] nbd0: detected capacity change from 0 to 10737418240
[2542610.544930] block nbd0: shutting down sockets
[2542610.544965] print_req_error: I/O error, dev nbd0, sector 0
[2542610.544967] Buffer I/O error on dev nbd0, logical block 0, async page read
[2542610.545001]  nbd0: unable to read partition table

I'm unsure whether this is just a bug in the client tools, though...

olifre commented 5 years ago

Testing with 3.16.2 (and blocksize 512) is also fine, but with 3.17 and 3.18, things fail as described.

So we actually have two issues here:

elemental-lf commented 5 years ago

First of all it's designed to work with all block sizes up to 4096. I have run my local tests on a Fedora box with nbd 3.17, kernel 4.18/4.19 and the default block size (1024). Could you try accessing the NBD device while running benji nbd with --log-level DEBUG and provide the output?

olifre commented 5 years ago

First of all it's designed to work with all block sizes up to 4096. I have run my local tests on a Fedora box with nbd 3.17, kernel 4.18/4.19 and the default block size (1024).

I have a theory concerning the block size issue. Were your tests done with a raw volume, e.g. a filesystem directly on the RBD volume, or was there an MBR partitioning on it? In my case, the RBD volume and hence also the backup contains a partition which contains the FS. My guess is that, since in MBR partitioning there are only "blocks", the problem actually is unrelated to NBD: If the partitions were created on a system assuming block size 512 (e.g. a VM which used the RBD) and then exported via NBD which defaults to 1024, the interpretation of the MBR partitioning header will be broken since the number of blocks written there is now interpreted as a different number of bytes due to the larger logical blocks.

This would explain: nbd0: p1 size 41938944 extends beyond EOD, truncated easily. So it would not be a Benji nor an NBD issue, but just something one might easily do wrong ;-).

However, I have just done a test of NBD 3.18 again, using a 4.20.2 kernel on Gentoo. This is broken due to a different issue as we saw before. I got the following with --log-level DEBUG:

   DEBUG: Configuration for module benji.storage.file: {'path': '/vmbackup/benji/data', 'activeTransforms': ['zstd'], 'simultaneousWrites': 5, 'simultaneousReads': 5, 'bandwidthRead': 0, 'bandwidthWrite': 0, 'consistencyCheckWrites': False}.
   DEBUG: Using selector: EpollSelector
    INFO: Starting to serve NBD on 127.0.0.1:10809
    INFO: Incoming connection from 127.0.0.1:56044
   DEBUG: [127.0.0.1:56044]: opt=7, len=9, data=b'\x00\x00\x00\x03200\x00\x00'
   DEBUG: [127.0.0.1:56044]: opt=1, len=3, data=b'200'
    INFO: [127.0.0.1:56044] Negotiated export: V0000000200
    INFO: [127.0.0.1:56044] Version V0000000200 has been opened.
    INFO: [127.0.0.1:56044] Export is read only.
   ERROR: [127.0.0.1:56044] 0 bytes read on a total of 28 expected bytes

While nbd-client only outputs:

Negotiation: ..size = 5120MB
bs=512, sz=5368709120 bytes

here's what I get in the syslog on the client:

[139539.473809] block nbd0: Wrong magic (0x0)
[139539.474022] nbd0: detected capacity change from 0 to 5368709120
[139539.474043] block nbd0: shutting down sockets
<another attempt>
[139743.126225] block nbd0: Wrong magic (0x0)
[139743.126483] block nbd0: shutting down sockets
[139743.126502] nbd0: detected capacity change from 0 to 5368709120

Not sure why I missed that before, either it's the new kernel or I messed with different versions too much. Interestingly, nbd-client -l works perfectly fine.

elemental-lf commented 5 years ago

Were your tests done with a raw volume, e.g. a filesystem directly on the RBD volume, or was there an MBR partitioning on it? I only tested without partitioning. My knowledge in this area is a little sketchy but this sounds like a possible explanation. Another addition for the documentation...

olifre commented 5 years ago

I also got nbd client 3.17 to work just now on Gentoo! Really helpful you pointed me to Fedora. Checking their packaging sources at: https://dl.fedoraproject.org/pub/fedora/linux/releases/29/Everything/source/tree/Packages/n/nbd-3.17-3.fc29.src.rpm they apply a patch since nbd-client is apparently almost unusable since if no timeout value is passed, it uses a timeout of 0.

Just for testing, I called nbd-client -t 10 and it works fine...

So 3.17 has a bug. With 3.18, I still get:

block nbd0: Wrong magic (0x0)

But all this makes me believe this might be another bug in the NBD client...

Related to the blocksize issue; I found: https://github.com/NetworkBlockDevice/nbd/commit/128fd556286ff5d53c5f2b16c4ae5746b5268a64 just now. That seems to confirm my understanding ("the result is that partitions are parsed incorrectly, resulting in confusing behaviour") and new nbd-client default will be 512.

elemental-lf commented 5 years ago

Thanks for looking into this. The information is really helpful.

So 3.17 has a bug. With 3.18, I still get:

block nbd0: Wrong magic (0x0)

What is strange is that this message comes from the kernel and not the user space nbd-client program. I assume that you use the same nbd kernel module with both versions? This is the kernel code for 4.18.0:

if (ntohl(reply.magic) != NBD_REPLY_MAGIC) {
dev_err(disk_to_dev(nbd->disk), "Wrong magic (0x%lx)\n",
(unsigned long)ntohl(reply.magic));
return ERR_PTR(-EPROTO);
}

So we've left the negotiation phase (handled by nbd-client) and entered the transmission phase (handled by the kernel).

olifre commented 5 years ago

I assume that you use the same nbd kernel module with both versions?

Yes. I even explicitly unloaded it and reloaded it between the tests, to make sure there is no "memory" of any strange state in case the userspace part manages to trigger something awkward.

I was on Kernel 4.20.2 for the tests, but this part of the code did not change.

So we've left the negotiation phase (handled by nbd-client) and entered the transmission phase (handled by the kernel).

This indeed makes that a mystery. It's interesting to see that Debian Buster will likely have nbd client 3.18 (they don't apply any patches - yet), but that's probably caused by nbd upstream being close to Debian and self-maintaining it: https://packages.debian.org/buster/nbd-client I don't find any matching bug report with them (yet).

On Gentoo, both 3.17 and 3.18 are still marked as unstable. So not many people will be using 3.18 yet.

I have performed an strace of the benji NBD server, checking send and receive. Here's what I got for talking with nbd-client 3.17:

[pid 29384] sendto(9, "NBDMAGICIHAVEOPT\0\1", 18, 0, NULL, 0) = 18
[pid 29384] recvfrom(9, "\0\0\0\1IHAVEOPT\0\0\0\7\0\0\0\t\0\0\0\003200\0\0", 262144, 0, NULL, NULL) = 29
[pid 29384] sendto(9, "\0\3\350\211\4Ue\251\0\0\0\7\200\0\0\1\0\0\0\0", 20, 0, NULL, 0) = 20
[pid 29384] recvfrom(9, "IHAVEOPT\0\0\0\1\0\0\0\003200", 262144, 0, NULL, NULL) = 19
.. benji checks all uids etc. ...
[pid 29384] sendto(9, "\0\0\0\1@\0\0\0\0\7", 10, 0, NULL, 0) = 10
[pid 29384] sendto(9, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 124, 0, NULL, 0) = 124
[pid 29384] recvfrom(9, "%`\225\23\0\0\0\0\25\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0", 262144, 0, NULL, NULL) = 28

And with 3.18:

[pid 29309] sendto(9, "NBDMAGICIHAVEOPT\0\1", 18, 0, NULL, 0) = 18
[pid 29309] recvfrom(9, "\0\0\0\1IHAVEOPT\0\0\0\7\0\0\0\t\0\0\0\003200\0\0", 262144, 0, NULL, NULL) = 29
[pid 29309] sendto(9, "\0\3\350\211\4Ue\251\0\0\0\7\200\0\0\1\0\0\0\0", 20, 0, NULL, 0) = 20
[pid 29309] recvfrom(9, "IHAVEOPT\0\0\0\1\0\0\0\003200", 262144, 0, NULL, NULL) = 19
.. benji checks all uids etc. ...
[pid 29309] sendto(9, "\0\0\0\1@\0\0\0\0\7", 10, 0, NULL, 0) = 10
[pid 29309] sendto(9, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 124, 0, NULL, 0) = 124
[pid 29309] recvfrom(9, "", 262144, 0, NULL, NULL) = 0

The response received from the nbd-client at the end seems empty... And that's also what Benji says:

ERROR: [127.0.0.1:54902] 0 bytes read on a total of 28 expected bytes

That makes me rather suspicious of nbd client 3.18 :wink:.

elemental-lf commented 5 years ago

Maybe it's a protocol incompatibility between the new nbd-client and Benji. The implementation in Benji is simplistic. Have you taken a look at the protocol specification? I have, yesterday evening. It's over 100kb in size and it sometimes feels like it contains more exceptions than rules. Have you tried using -g with nbd-client. There shouldn't be a problem without it but it's definitely one area where nbd-client has changed. I'm currently looking through the differences between 3.16.2 and master.

olifre commented 5 years ago

I have, yesterday evening. It's over 100kb in size and it sometimes feels like it contains more exceptions than rules.

Oh my...

Have you tried using -g with nbd-client.

I have, but nothing has changed with the flag, so at least that matches expectation.

I'm currently looking through the differences between 3.16.2 and master.

Since 3.17 works fine (for me) after setting a timeout, and works fine for you on Fedora which has patched the timeout issue, I think the issue / incompatible change must have crept in between 3.17 and 3.18. Even with this reduced version jump there appear to be a lot of commits...

elemental-lf commented 5 years ago

I've just pushed one commit which increases our specification conformity but I don't think that it will help. I'm closing this issue for now. Let's wait and see...

olifre commented 5 years ago

I'm closing this issue for now. Let's wait and see...

I agree, still many thanks for spending time with this, I know that reading specs like these are in fact risking your own sanity :wink:. I'll still give the changes a spin sometime tomorrow and let you know in case anything changed.

olifre commented 5 years ago

Just for the record, in case somebody ends up here: I re-tried with 3.18 with the new commit applied to Benji and it does not change anything.

wehde92 commented 5 years ago

Just letting you know that this issue still seems to be there. I've tried using nbd-client version 3.18 and manually setting the timeout and blocksize options, but have not been able to get it to work. This same issue seems to plague backy2 as well. For reference, we are running 5.x kernels and our environment runs on ceph version 12.4.1. The partition structure on my test instance is just a simple rbd volume formatted ext4. Downgrading nbd-client to 3.16 and using the -b 512 -t 10 flags makes things work.

wehde92 commented 5 years ago

After digging some more I finally found a work around. Compiling nbd-client from the github repo that actually has version 3.19 seems to work fine without any block/timeout flags. I'm going to assume this issue will be resolved once the debian/ubuntu repos catch up the 3.19 update things should work fine.

elemental-lf commented 5 years ago

@wehde92 Thanks for the feedback on this. Polishing up the NBD of Benji is still on my long list of things to do but in this case it seems more like an nbd-client problem. But maybe it can be worked around in Benji's NBD server.

timlee66 commented 7 months ago

Hi @elemental-lf @olifre @wehde92,

Could you help us to look into this issue whether is relate to this? Due to when we use default block size 512 then mount and installation iso can complete, but other than 512 we found that "nbd0: truncated" error message then mount/installation fail. Thanks for your time to look into this in advance!

Best regards, Tim