digint / btrbk

Tool for creating snapshots and remote backups of btrfs subvolumes
https://digint.ch/btrbk/
GNU General Public License v3.0

failed to clone extents #565

Open Massimo-B opened 8 months ago

Massimo-B commented 8 months ago

Hi, btrbk sending to ssh targets often fails due to "failed to clone extents".

How can I fix this, and how can I find the failing subvolume so that I can drop and re-transfer it? I ran btrfs send ... >/dev/null on the remote machine alone, and that completes without errors.

Creating incremental backup...
Creating backup: /mnt/local/data/archive/mmu/root/root.20231031T081700+0100
[send/receive] target: /mnt/local/data/archive/mmu/root/root.20231031T081700+0100
[send/receive] source: mmu:/mnt/btrfs-top-lvl/snapshots/root/root.20231031T081700+0100
[send/receive] parent: mmu:/mnt/btrfs-top-lvl/snapshots/root/root.20231023T153709+0200
in @  0.0 kiB/s, out @  0.0 kiB/s,  156 MiB total, buffer 100% full
summary:  156 MiByte in  1min 50.1sec - average of 1451 kiB/s
[send/receive] checking target metadata: /mnt/local/data/archive/mmu/root/root.20231031T081700+0100
ERROR: Failed to send/receive subvolume: mmu:/mnt/btrfs-top-lvl/snapshots/root/root.20231031T081700+0100 [/mnt/btrfs-top-lvl/snapshots/root/root.20231023T153709+0200] -> /mnt/local/data/archive/mmu/root/root.20231031T081700+0100
ERROR: ... Command execution failed (exitcode=1)
ERROR: ... sh: ssh -i '/root/.ssh/id_ed25519' -o compression=no root@mmu 'btrfs send -p '\''/mnt/btrfs-top-lvl/snapshots/root/root.20231023T153709+0200'\'' --proto 2 --compressed-data '\''/mnt/btrfs-top-lvl/snapshots/root/root.20231031T081700+0100'\'' | mbuffer -v 1 -q -m 5% | lz4 -c' | lz4 -d -c | mbuffer -v 1 -m 5% | btrfs receive '/mnt/local/data/archive/mmu/root/'
ERROR: ... failed to clone extents to usr/lib/python3.11/site-packages/numpy/distutils/__pycache__/ccompiler_opt.cpython-311.opt-1.pyc: Invalid argument
[delete] target: /mnt/local/data/archive/mmu/root/root.20231031T081700+0100
WARNING: Deleted partially received (garbled) subvolume: /mnt/local/data/archive/mmu/root/root.20231031T081700+0100
ERROR: Error while resuming backups, aborting
Created 0/2 missing backups

btrbk --version
btrbk command line client, version 0.32.6-dev

Massimo-B commented 8 months ago

Scrub on source and target btrfs found no errors.

Zygo commented 8 months ago

My guess is that it doesn't like something in the parent file, like the "inline extent followed by regular extents" pattern. There's a short list of things that can make clone return EINVAL (and why they shouldn't happen):

  1. inline extents that aren't at EOF (recent kernels handle the EOF case, but historically the non-EOF case trips things up, and there's at least one EINVAL case in fill_holes)
  2. non-aligned start of range, overflows (send should know better than to emit those)
  3. past EOF on src file (receiver's parent might be different from sender's parent, can only happen if the subvol was modified on the receive side, or a previous send/receive bug)
  4. non-aligned end of range that is not EOF on src file (send should know better than to emit those)
  5. non-aligned EOF length mismatches (send should know better than to emit those, also only applies to dedupe not clone)
  6. different NODATASUM bits in the src and dst inodes (does btrfs receive replicate those bits? Send should know better)
  7. overlapping src/dst offset range in the same inode (send should know better)
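
Some of the conditions above (items 2, 3, 4 and 7) depend only on the offsets and lengths of the clone request, so they can be ruled in or out without looking at extent metadata. The following is a minimal userspace sketch of those range checks, not the kernel's actual logic. The function name `check_clone_range`, its parameters, and the 4096-byte sector size are assumptions made for illustration. Inline extents (item 1) and NODATASUM mismatches (item 6) cannot be detected this way.

```python
SECTOR_SIZE = 4096  # assumption: common btrfs sectorsize; check your filesystem


def check_clone_range(src_offset, dst_offset, length, src_size,
                      same_inode=False, sector_size=SECTOR_SIZE):
    """Return a list of range-related reasons a clone could yield EINVAL.

    Hypothetical helper: approximates items 2, 3, 4 and 7 from the list
    above. An empty list means none of these range checks fired.
    """
    problems = []

    # Item 2: both start offsets must be sector-aligned.
    if src_offset % sector_size or dst_offset % sector_size:
        problems.append("non-aligned start of range")

    src_end = src_offset + length
    if src_end > src_size:
        # Item 3: the range reaches past EOF of the source file.
        problems.append("range extends past EOF on source")
    elif src_end % sector_size and src_end != src_size:
        # Item 4: an unaligned end is only acceptable at source EOF.
        problems.append("non-aligned end of range that is not EOF")

    # Item 7: source and destination ranges overlap within one inode.
    if (same_inode
            and src_offset < dst_offset + length
            and dst_offset < src_offset + length):
        problems.append("overlapping src/dst range in the same inode")

    return problems
```

If this returns an empty list for the failing clone, the range itself is probably not the cause, and the extent layout of the source and destination files is the next thing to inspect.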

Can you:

  1. reproduce this with btrfs receive -vv to get debug messages
  2. look for the clone source just before the error (the message has the pattern clone %s - source=%s source offset=%llu offset=%llu length=%llu),
  3. run btrfs-search-metadata file on all versions of the clone source file (parent and current subvol, src and dst filesystem)
  4. and run btrfs-search-metadata file on the clone destination file too (that's usr/lib/python3.11/site-packages/numpy/distutils/__pycache__/ccompiler_opt.cpython-311.opt-1.pyc from the error message)
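
Step 2 can be automated once the receive output has been saved to a file. The sketch below scans the captured lines for the clone message pattern quoted above and returns the last clone request seen before the "failed to clone extents" error. The names `CLONE_RE` and `last_clone_before_error` are hypothetical, and the regular expression assumes that paths do not themselves contain the literal text " - source=" or " source offset=".

```python
import re

# Mirrors the pattern:
#   clone %s - source=%s source offset=%llu offset=%llu length=%llu
CLONE_RE = re.compile(
    r"^clone (?P<dst>.+) - source=(?P<src>.+) "
    r"source offset=(?P<src_offset>\d+) "
    r"offset=(?P<dst_offset>\d+) length=(?P<length>\d+)$"
)


def last_clone_before_error(lines):
    """Return the last clone request preceding the clone error, or None."""
    last = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = CLONE_RE.match(line)
        if match:
            last = {
                "dst": match["dst"],
                "src": match["src"],
                "src_offset": int(match["src_offset"]),
                "dst_offset": int(match["dst_offset"]),
                "length": int(match["length"]),
            }
        elif "failed to clone extents" in line:
            # The error follows the clone line that triggered it.
            return last
    return None
```

For example, `last_clone_before_error(open("receive.log"))` would work on a log file, where `receive.log` is a placeholder name for wherever the `btrfs receive -vv` output was captured. The returned source path is the file to inspect with btrfs-search-metadata in steps 3 and 4.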

Massimo-B commented 8 months ago

> 1. reproduce this with `btrfs receive -vv` to get debug messages

Ok, for the current failure:

ERROR: ... sh: ssh -i '/root/.ssh/id_ed25519' -o compression=no root@mmu 'btrfs send -p '\''/mnt/btrfs-top-lvl/snapshots/root/root.20230904T150800+0200'\'' --proto 2 --compressed-data '\''/mnt/btrfs-top-lvl/snapshots/root/root.20230925T072800+0200'\'' | mbuffer -v 1 -q -m 5% | lz4 -c' | lz4 -d -c | mbuffer -v 1 -m 5% | btrfs receive '/mnt/usb/mobiledata/snapshots/mmu/root/'
ERROR: ... failed to clone extents to usr/lib/python3.11/site-packages/numpy/core/tests/__pycache__/test_multiarray.cpython-311.opt-1.pyc: Invalid argument

I tried with -vv on the receiver:

mkfile o13295363-195048-0
rename o13295363-195048-0 -> usr/lib/python3.11/site-packages/numpy/distutils/__pycache__/ccompiler_opt.cpython-311.opt-2.pyc
utimes usr/lib/python3.11/site-packages/numpy/distutils/__pycache__
write usr/lib/python3.11/site-packages/numpy/distutils/__pycache__/ccompiler_opt.cpython-311.opt-2.pyc - offset=0 length=4096
encoded_write usr/lib/python3.11/site-packages/numpy/distutils/__pycache__/ccompiler_opt.cpython-311.opt-2.pyc - offset=4096, len=40960, unencoded_offset=4096, unencoded_file_len=89973, unencoded_len=94208, compression=2, encryption=0
chown usr/lib/python3.11/site-packages/numpy/distutils/__pycache__/ccompiler_opt.cpython-311.opt-2.pyc - uid=0, gid=0
chmod usr/lib/python3.11/site-packages/numpy/distutils/__pycache__/ccompiler_opt.cpython-311.opt-2.pyc - mode=0664
utimes usr/lib/python3.11/site-packages/numpy/distutils/__pycache__/ccompiler_opt.cpython-311.opt-2.pyc
mkfile o13295364-195048-0
rename o13295364-195048-0 -> usr/lib/python3.11/site-packages/numpy/distutils/__pycache__/ccompiler_opt.cpython-311.opt-1.pyc
utimes usr/lib/python3.11/site-packages/numpy/distutils/__pycache__
write usr/lib/python3.11/site-packages/numpy/distutils/__pycache__/ccompiler_opt.cpython-311.opt-1.pyc - offset=0 length=4096
clone usr/lib/python3.11/site-packages/numpy/distutils/__pycache__/ccompiler_opt.cpython-311.opt-1.pyc - source=usr/lib/python3.11/site-packages/numpy/distutils/__pycache__/ccompiler_opt.cpython-311.opt-1.pyc source offset=4096 offset=4096 length=16384
ERROR: failed to clone extents to usr/lib/python3.11/site-packages/numpy/distutils/__pycache__/ccompiler_opt.cpython-311.opt-1.pyc: Invalid argument

summary:  210 MiByte in  8min 28.0sec - average of  423 kiB/s

Massimo-B commented 8 months ago

For btrfs-search-metadata do I need to install python-btrfs?