digint / btrbk

Tool for creating snapshots and remote backups of btrfs subvolumes
https://digint.ch/btrbk/
GNU General Public License v3.0

Unexpected crash after restore. #28

Closed TH3L0N3WOLF closed 9 years ago

TH3L0N3WOLF commented 9 years ago

Yesterday, I was experimenting with a mainline kernel because a bug in btrfs was preventing me from converting my setup from RAID0 to RAID10. The conversion was working, but I lost power three quarters of the way through, which corrupted the file system. So I re-created it using the snapshots taken from my backup disk via btrbk. After a couple of hours of btrfs send/receive, I had a fully functional system again and thought I couldn't be happier, until I attempted to run btrbk to resume my backups.

Upon running, I get this error:

ERROR: process died unexpectedly (btrbk v0.19.2)
Please contact the author: Axel Burri <axel@tty0.ch>

Stack Trace:
----------------------------------------
Died at /usr/bin/btrbk line 904, <FILE> line 134.
 at /usr/bin/btrbk line 119, <FILE> line 134.
    main::__ANON__("Died at /usr/bin/btrbk line 904, <FILE> line 134.\x{a}") called at /usr/bin/btrbk line 904
    main::btr_tree(HASH(0x10c6790)) called at /usr/bin/btrbk line 956
    main::vinfo_subvol_list(HASH(0x10c6790)) called at /usr/bin/btrbk line 296
    main::vinfo_root(HASH(0x10c6790)) called at /usr/bin/btrbk line 1679

Running with btrbk -l trace run, the error is printed right after:

### btrfs subvolume list -a -c -u -q -R '/mnt/subvolroot'
... Command output:
ID 258 gen 2054 cgen 7 top level 5 parent_uuid - received_uuid abd4df05-160d-964b-9cd1-d7e97900ab43 uuid 33e0bc86-10f2-1d44-ae52-5bd716662491 path server
ID 295 gen 1929 cgen 52 top level 258 parent_uuid - received_uuid 40b49bc9-9a40-de4a-854e-9c8704f2e55f uuid f7b2eb3a-3b2f-fc4e-8912-a84fc4deed6c path <FS_TREE>/server/etc
ID 296 gen 2054 cgen 55 top level 5 parent_uuid - received_uuid 2286e4be-44fe-5344-9caf-cae657284e15 uuid b379e5bd-5fd4-8146-a5ac-82b2d48c2f3a path desktop
ID 324 gen 1044 cgen 119 top level 335 parent_uuid - received_uuid e32e9aa2-0eeb-bc47-8c85-e256d4c04265 uuid 92709d52-1f6e-0147-b367-0a9a37414ca6 path <FS_TREE>/server/var/lib/machines/guest_os
ID 332 gen 2054 cgen 143 top level 258 parent_uuid - received_uuid b94d353e-6292-7c44-bfa6-ff9944ecf535 uuid 72eae412-c5e4-a241-bbc8-f8b2e39830d5 path <FS_TREE>/server/var/lib/boinc
ID 335 gen 1044 cgen 152 top level 258 parent_uuid - received_uuid bab52f3b-cca5-6d41-b407-13e45fb34424 uuid c16575f9-d75e-874f-b87d-e754858274c3 path <FS_TREE>/server/var/lib/machines
ID 336 gen 1044 cgen 157 top level 296 parent_uuid - received_uuid 10ead242-eb04-234e-bbaa-41ebc1d7bde6 uuid a9776a67-1dd0-9c4f-a203-4b34b2f8841e path <FS_TREE>/desktop/home/[username]/temp
ID 350 gen 1044 cgen 198 top level 258 parent_uuid - received_uuid - uuid 50fb67cb-e42f-ad46-b86e-5811e7c1e7f6 path <FS_TREE>/server/srv/gameservers
ID 351 gen 1044 cgen 202 top level 350 parent_uuid - received_uuid - uuid 112d9f84-d858-724f-bbcf-d3c0a4ab68bf path <FS_TREE>/server/srv/gameservers/tekkit
Command execution successful
Parsed 9 total subvolumes for filesystem at: /mnt/subvolroot
... btr_tree: processing subvolume list of: /mnt/subvolroot

My btrbk config is as follows:

### GLOBALS ###
snapshot_create onchange
snapshot_preserve_daily 7
snapshot_preserve_weekly 4
snapshot_preserve_monthly 2

target_preserve_daily 28
target_preserve_weekly 16
snapshot_preserve_monthly all

btrfs_commit_delete        after
preserve_day_of_week       sunday
resume_missing             yes
incremental                yes

### END GLOBALS ###

### BEGIN LOCAL CONFIG ###

volume /mnt/subvolroot
    snapshot_dir backups
    subvolume  server
        target send-receive    /mnt/backups/Desktop/btrbk/server

    subvolume  server/srv/gameservers/tekkit 
        snapshot_dir server/srv/gameservers/backups
        target_preserve_daily 7
        target send-receive   /mnt/backups/Desktop/btrbk/tekkit

    subvolume server/etc
        target send-receive /mnt/backups/Desktop/btrbk/server

    subvolume  desktop
        target_preserve_weekly 40
        target send-receive    /mnt/backups/Desktop/btrbk/desktop

    subvolume server/var/lib/machines/guest_os
        snapshot_dir server/var/lib/machines/.snapshots
        target send-receive /mnt/backups/Desktop/btrbk/machines

### END LOCAL CONFIG ###

All snapshot directories and subvolumes appear to be accounted for. I'm not sure how I should proceed.

digint commented 9 years ago

Thanks for the bug report (sorry for the delay, I was on vacation :)).

I traced the problem down to this line from btrfs subvolume list output:

ID 324 gen 1044 cgen 119 top level 335 parent_uuid - received_uuid e32e9aa2-0eeb-bc47-8c85-e256d4c04265 uuid 92709d52-1f6e-0147-b367-0a9a37414ca6 path <FS_TREE>/server/var/lib/machines/guest_os

The odd thing is that ID 324 has top level 335 (which is greater than 324), which I thought could never happen (and I'm still puzzled why it is like this in your restore scenario). btrbk reads this list linearly, and since 335 has not been parsed yet when 324 is processed, it fails there. It seems I have to rewrite this function...
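To illustrate the ordering problem, here is a minimal Python sketch of an order-independent way to build the tree. It is not btrbk's code (btrbk is written in Perl) and not necessarily how the fix was implemented. The names `LINE_RE`, `parse_subvol_list`, `build_tree`, and `FS_ROOT_ID` are hypothetical, and the sample lines are copied from the trace output in this report. The idea is to register every subvolume by ID first and link children to parents only afterwards, so a child listed before its parent no longer matters.

```python
import re

# Matches one line of `btrfs subvolume list -a -c -u -q -R` output as shown in this report.
LINE_RE = re.compile(
    r"ID (?P<id>\d+) gen (?P<gen>\d+) cgen (?P<cgen>\d+) "
    r"top level (?P<top_level>\d+) parent_uuid (?P<parent_uuid>\S+) "
    r"received_uuid (?P<received_uuid>\S+) uuid (?P<uuid>\S+) path (?P<path>.+)"
)

FS_ROOT_ID = 5  # ID of the btrfs top-level subvolume


def parse_subvol_list(output):
    """Pass 1: collect every subvolume keyed by ID, without linking anything yet."""
    nodes = {}
    for line in output.splitlines():
        m = LINE_RE.fullmatch(line.strip())
        if m is None:
            continue  # skip anything that is not a subvolume line
        node = {
            "id": int(m["id"]),
            "top_level": int(m["top_level"]),
            "path": m["path"],
            "children": [],
        }
        nodes[node["id"]] = node
    return nodes


def build_tree(nodes):
    """Pass 2: attach each node to its parent; all IDs are already known here."""
    tree = dict(nodes)
    tree.setdefault(
        FS_ROOT_ID,
        {"id": FS_ROOT_ID, "top_level": None, "path": "", "children": []},
    )
    for node in nodes.values():
        parent = tree.get(node["top_level"])
        if parent is None:
            raise ValueError(
                f"subvolume {node['id']} refers to unknown top level {node['top_level']}"
            )
        parent["children"].append(node)
    return tree


SAMPLE = """\
ID 258 gen 2054 cgen 7 top level 5 parent_uuid - received_uuid abd4df05-160d-964b-9cd1-d7e97900ab43 uuid 33e0bc86-10f2-1d44-ae52-5bd716662491 path server
ID 324 gen 1044 cgen 119 top level 335 parent_uuid - received_uuid e32e9aa2-0eeb-bc47-8c85-e256d4c04265 uuid 92709d52-1f6e-0147-b367-0a9a37414ca6 path <FS_TREE>/server/var/lib/machines/guest_os
ID 335 gen 1044 cgen 152 top level 258 parent_uuid - received_uuid bab52f3b-cca5-6d41-b407-13e45fb34424 uuid c16575f9-d75e-874f-b87d-e754858274c3 path <FS_TREE>/server/var/lib/machines
"""

tree = build_tree(parse_subvol_list(SAMPLE))
print([child["id"] for child in tree[335]["children"]])  # prints [324]
```

With a single linear pass, the lookup for parent 335 would fail while processing ID 324. With two passes, it fails only if the parent is missing from the list altogether.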

digint commented 9 years ago

@TH3L0N3WOLF can you confirm the fix on the "fix_subvol_list" branch is working for you?

https://github.com/digint/btrbk/tree/fix_subvol_list

TH3L0N3WOLF commented 9 years ago

@digint Confirmed, the fix_subvol_list branch is working perfectly. It was a really funky restore (using tools I threw into an initramfs and fixed remotely, woo!), so I wouldn't be surprised if I did something really weird in the process. Regardless, everything's working perfectly now.

EDIT: I hate that close issue button. I didn't even click it and it still acted on its own.

digint commented 9 years ago

fixed in: 360deca5f2dfb26770a3753f403fd8674936896f