digint / btrbk

Tool for creating snapshots and remote backups of btrfs subvolumes
https://digint.ch/btrbk/
GNU General Public License v3.0

btrbk: add bzip3 compression #501

Closed calestyo closed 1 year ago

calestyo commented 1 year ago

bzip3 is now in Debian sid and shows quite remarkable performance.

Compressing the unpacked contents of /usr/src/linux-source-6.0.tar.xz (from the linux-source-6.0 package, version 6.0.8-1) with xz and 1 thread gives:

$ tar -cf - linux-source-6.0 | /usr/bin/time --verbose xz -T 1 > xz.1.xz
    Command being timed: "xz -T 1"
    User time (seconds): 314.96
    System time (seconds): 0.50
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 5:15.48
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 97588
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 5603
    Voluntary context switches: 1
    Involuntary context switches: 2338
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

The same (also 1 thread) with bzip3:

$ tar -cf - linux-source-6.0 | /usr/bin/time --verbose bzip3 -j 1 > bz3.1.bz3
    Command being timed: "bzip3 -j 1"
    User time (seconds): 65.07
    System time (seconds): 0.49
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:05.84
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 91284
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 1
    Minor (reclaiming a frame) page faults: 1082
    Voluntary context switches: 43183
    Involuntary context switches: 489
    Swaps: 0
    File system inputs: 184
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

and it's even smaller:

$ ls -al bz3.1.bz3 xz.1.xz 
-rw-r--r-- 1 calestyo calestyo 131967354 Nov 15 03:04 bz3.1.bz3
-rw-r--r-- 1 calestyo calestyo 133976848 Nov 15 03:02 xz.1.xz
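
To put numbers on the comparison, here is a small sketch that derives the speedup and the size saving from the figures in the two listings above. The variable names are only illustrative; the values are copied from the `time` and `ls` output.

```shell
#!/bin/sh
# Figures copied from the /usr/bin/time and ls output above.
xz_secs=315.48       # 5:15.48 elapsed for "xz -T 1"
bz3_secs=65.84       # 1:05.84 elapsed for "bzip3 -j 1"
xz_bytes=133976848   # size of xz.1.xz
bz3_bytes=131967354  # size of bz3.1.bz3

# POSIX sh has no floating point, so awk does the arithmetic.
speedup=$(LC_ALL=C awk -v a="$xz_secs" -v b="$bz3_secs" \
    'BEGIN { printf "%.2f", a / b }')
saving=$(LC_ALL=C awk -v a="$xz_bytes" -v b="$bz3_bytes" \
    'BEGIN { printf "%.2f", (a - b) * 100 / a }')

echo "bzip3 wall-clock speedup over xz: ${speedup}x"
echo "bzip3 output smaller than xz output by: ${saving}%"
```

On this single run that works out to roughly 4.8 times faster and about 1.5% smaller, at a similar peak memory footprint (91284 vs 97588 kbytes).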

Right now this PR wouldn't work, though, because bzip3 doesn't support (or at least silently ignore) the -# options for the level.

I've filed https://github.com/kspalaiologos/bzip3/issues/78 asking for this. If it were implemented, I guess we shouldn't mention that the level is ignored, because it may have an actual effect at some point in the future.

calestyo commented 1 year ago

Oh and I forgot to emphasise:

Look at the times… it's considerably faster than xz.

calestyo commented 1 year ago

It looks like bzip3 is not going to get support for the level options.

So we'd need to find some way to keep btrbk from calling it with these. Any ideas? I'm not really a perl person ^^

kspalaiologos commented 1 year ago

Sounds like a simple fix to me.

Guard this line behind an if clause to check if we're not using bzip3: https://github.com/digint/btrbk/blob/9166d73be7416ef04606dc23fcfc7c478ea56897/btrbk#L730

To maybe something like

if($cc->{name} ne "bzip3") { push @cmd, '-' . $level; }

calestyo commented 1 year ago

I guess @digint would want something more generic... perhaps if the min/max levels are both negative or so.
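
A rough sketch of that more generic idea, written in shell rather than btrbk's Perl and not taken from the btrbk source: each compressor carries a min/max level, and a negative pair means "takes no level option". The `compressors` table and the `build_compress_cmd` helper are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Hypothetical table, one compressor per line: "name min_level max_level".
# A negative min and max marks a compressor that takes no level option.
# The entries are illustrative and not copied from btrbk.
compressors='gzip 1 9
xz 0 9
bzip3 -1 -1'

# Print the command line for compressor $1 at level $2,
# appending "-<level>" only when the compressor supports levels.
build_compress_cmd() {
    name=$1 level=$2
    echo "$compressors" | while read -r n min max; do
        [ "$n" = "$name" ] || continue
        if [ "$min" -lt 0 ] && [ "$max" -lt 0 ]; then
            echo "$n -c"
        else
            echo "$n -c -$level"
        fi
    done
}

build_compress_cmd xz 6
build_compress_cmd bzip3 6
```

With this shape no compressor name is hard-coded in the command builder; adding another level-less compressor only needs a new table entry.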

digint commented 1 year ago

uhrg, did not reload fast enough, and pushed a fix for the compression level in b6219017213434f2c84853eb146c433c02d8ec44 ;-) see calestyo-support-bzip3 branch

@calestyo please have a look at it, did not test it yet

calestyo commented 1 year ago

Looks good in principle, but if I were you, I'd swap the commits and make yours more like a new feature that mine then uses (i.e. not setting the level min/max).

That way, you wouldn't have any state in master that fails.

calestyo commented 1 year ago

oh, and I've ordered bzip3 before pbzip2, since that seems to be independent of the real bzip2... but fine for me.

digint commented 1 year ago

> Looks good in principle, but if I were you, I'd swap the commits and make yours more like a new feature that mine then uses (i.e. not setting the level min/max).
>
> That way, you wouldn't have any state in master that fails.

Yes, was going to do it like this (always do ;-). Not much time now, will quickly test / reorder / merge later.

calestyo commented 1 year ago

Awesome. Once you have time again (😅)... that rotation of raw backups thingy would be quite interesting for me personally, well, actually for my work at the university.

digint commented 1 year ago

merged in af86dc8c52c3cda36f3a7250e8195f5394751d33, 914f9286c77df55d615f7363820b12ec919a6295, with small amendments.

test success:

### btrfs send -p '/tmp/btrbk_unittest/mnt_source/svol.20221116T0041' --proto 2 --compressed-data '/tmp/btrbk_unittest/mnt_source/svol.20221116T0043' | mbuffer -v 1 -q -m 128m -r 1m | bzip3 -c -j4 | ssh -i '/home/axel/.ssh/id_ed25519' -o compression=no root@127.0.0.1 'bzip3 -d -c -j4 | btrfs receive '\''/tmp/btrbk_unittest/mnt_target/'\'''
Command execution successful
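
The compress-on-sender, decompress-on-receiver shape of that pipeline can be tried locally without btrfs or ssh. The following is only a sketch under stated assumptions: a temporary file stands in for the `btrfs send` stream, `cmp` stands in for `btrfs receive`, and gzip is used as a fallback when bzip3 is not installed, purely so the example runs anywhere.

```shell
#!/bin/sh
# Use bzip3 when available; the gzip fallback is an assumption of this sketch.
if command -v bzip3 >/dev/null 2>&1; then
    comp='bzip3 -c' decomp='bzip3 -d -c'
else
    comp='gzip -c' decomp='gzip -d -c'
fi

# Stand-in for the "btrfs send" stream: 20000 numbered lines.
tmp=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 20000; i++) print i }' > "$tmp"

# "Sender" compresses, "receiver" decompresses; cmp checks that the
# bytes coming out are identical to the bytes that went in.
if $comp < "$tmp" | $decomp | cmp -s - "$tmp"; then
    result=ok
else
    result=mismatch
fi
rm -f "$tmp"
echo "round trip: $result"
```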

(eager to test --proto 2 feature in production ;-)

calestyo commented 1 year ago

thx :-)

daiaji commented 1 year ago

Good benchmark. If it can replace LZMA/LZMA2, the application scenarios of bzip3 will be very extensive.