Open ThomasWaldmann opened 6 years ago
borg uses lzma as offered by Python's lzma standard library module IF you use -C lzma.
https://docs.python.org/3/library/lzma.html
https://github.com/borgbackup/borg/blob/1.1.5/src/borg/compress.pyx#L190 (note: FORMAT_XZ (default) and CHECK_NONE)
All comments in the article about bad/missing error (or tampering) detection by xz/lzma are mostly irrelevant for borg, because borg usually authenticates data first, then decrypts, then decompresses. Any random or malicious data modification would be detected: authentication would fail, and borg would not even try to decrypt or decompress in that case. This is the reason why we give CHECK_NONE to lzma.
Only exception is when you use borg without authentication, but even then we still have the content hash and a crc32 on the storage layer.
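To illustrate the ordering described above, here is a simplified sketch. It is not borg's actual code: the encryption step is omitted, HMAC-SHA256 stands in for whatever authentication borg uses, and the names pack_chunk and unpack_chunk are made up for this example.

```python
import hashlib
import hmac
import lzma

MAC_LEN = 32  # HMAC-SHA256 digest size


def pack_chunk(key: bytes, data: bytes) -> bytes:
    """Compress without an XZ integrity check, then prepend a MAC."""
    compressed = lzma.compress(data, format=lzma.FORMAT_XZ, check=lzma.CHECK_NONE)
    mac = hmac.new(key, compressed, hashlib.sha256).digest()
    return mac + compressed


def unpack_chunk(key: bytes, blob: bytes) -> bytes:
    """Verify the MAC first; only data that authenticated gets decompressed."""
    mac, compressed = blob[:MAC_LEN], blob[MAC_LEN:]
    expected = hmac.new(key, compressed, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("authentication failed, refusing to decompress")
    return lzma.decompress(compressed)
```

Because a modified blob is rejected before lzma ever sees it, an extra check inside the XZ container would only duplicate work in this setup.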
(some of the points raised in that article, e.g. about error checking, seem to be interesting in case we ever redesign the borg repo data structures, thus I am labelling this for repository and breaking)
It looks like FORMAT_RAW + CHECK_NONE would be adequate for borg. Not sure about the filters / other params.
Considering we already have --compression=lzma for FORMAT_XZ, guess we would call that --compression=lzma1.
Needs some experimenting whether the saved overhead is worth it.
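A first experiment could look like the following sketch. It assumes an LZMA2 filter at preset 6 for both formats; FORMAT_RAW requires an explicit filter chain, so the same chain is passed to both calls. The helper name container_overhead is made up for this example.

```python
import lzma

# Assumed filter chain: LZMA2 with the options of preset 6.
FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]


def container_overhead(chunk: bytes) -> int:
    """Return how many bytes the XZ container adds compared to a raw stream."""
    xz = lzma.compress(chunk, format=lzma.FORMAT_XZ,
                       check=lzma.CHECK_NONE, filters=FILTERS)
    raw = lzma.compress(chunk, format=lzma.FORMAT_RAW, filters=FILTERS)
    # The raw stream can only be decoded when the filter chain is supplied again.
    assert lzma.decompress(raw, format=lzma.FORMAT_RAW, filters=FILTERS) == chunk
    return len(xz) - len(raw)
```

Running this over chunks of realistic sizes would show how much the fixed per-chunk container cost matters relative to the compressed payload.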
We need to keep the --compression=lzma so that already compressed data still works (and does not need to get recompressed).
@elho any opinion about this?
Uh, I vaguely remember having read the article outlining the inadequacies of xz at some point, without any relation to borg, and taking away that lzma is the better choice and that one should stay away from xz on the command line.
That Python's lzma module, with the synopsis "Compression using the LZMA algorithm", in reality wraps data into xz containers by default still comes as a bad surprise.
It looks like FORMAT_RAW + CHECK_NONE would be adequate for borg. Not sure about the filters / other params.
Yes, the former seems to be the way to go. The parameters allow for a lot of fiddling, given the upper bound of chunk sizes, there may even be room for optimization.
Either some experimental code in a separate branch to play with those, or even something along the lines of pngcrush -brute to run over some small (enough to be feasible) repos to get an idea of what could work.
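A brute-force run in that spirit might be sketched as follows. It only varies the lc/lp/pb options of the LZMA2 filter and takes everything else from a preset; which parameters are actually worth searching is an open question here, and brute_force_lzma2 is a made-up name.

```python
import itertools
import lzma


def brute_force_lzma2(sample: bytes, preset: int = 6):
    """Try every valid lc/lp/pb combination; return (size, params) of the smallest output."""
    best = None
    for lc, lp, pb in itertools.product(range(5), range(5), range(5)):
        if lc + lp > 4:  # liblzma requires lc + lp <= 4
            continue
        filters = [{"id": lzma.FILTER_LZMA2, "preset": preset,
                    "lc": lc, "lp": lp, "pb": pb}]
        size = len(lzma.compress(sample, format=lzma.FORMAT_RAW, filters=filters))
        if best is None or size < best[0]:
            best = (size, {"lc": lc, "lp": lp, "pb": pb})
    return best
```

Feeding it representative chunks from a small repo and tallying the winning parameters would give a first idea of whether any setting beats the preset defaults consistently.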
As for the name to use with the --compression parameter, I would probably prefer rlzma or lzmar to reflect the RAW format part and not end up with users confusing the old lzma for LZMA2 given the other is called lzma1.
I guess a borg 3.0 at latest could treat lzma as the new variant again. :slightly_smiling_face:
The above approach with a "new" compressor allows recreate --recompress to work without any modification to upgrade a repository from XZ to RAW format LZMA. That however also means fully decompressing and then recompressing each chunk.
We need to keep the --compression=lzma so that already compressed data still works (and does not need to get recompressed).
With above approach yes, ~but in general no, not necessarily~.
Edit: Scratch the latter. Even though lzma.decompress defaults to FORMAT_AUTO, that mode unfortunately cannot handle the RAW format according to the documentation. So the hope that existing borg 1.x could handle a simple switch of format within the same lzma compressor dies at that point. :slightly_frowning_face:
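The limitation can be reproduced with a short sketch. The filter chain (LZMA2, preset 6) and the sample data are assumptions for illustration; auto_decompress_ok is a made-up helper.

```python
import lzma

# Assumed filter chain for the raw stream.
FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]
chunk = b"borg chunk data " * 1000

xz_blob = lzma.compress(chunk, format=lzma.FORMAT_XZ, check=lzma.CHECK_NONE)
raw_blob = lzma.compress(chunk, format=lzma.FORMAT_RAW, filters=FILTERS)


def auto_decompress_ok(blob: bytes) -> bool:
    """Return True if lzma.decompress() with its default FORMAT_AUTO accepts the blob."""
    try:
        lzma.decompress(blob)
        return True
    except lzma.LZMAError:
        return False
```

Here auto_decompress_ok(xz_blob) succeeds while auto_decompress_ok(raw_blob) does not; the raw blob only decodes when format=lzma.FORMAT_RAW and the matching filters are passed explicitly.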
Still, even with two different borg compressors needed for the two formats, a recreate --recompress that switches any XZ LZMA data to RAW without touching the actual compression, just unwrapping the XZ container around the RAW data, would be conceivable.
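Such an unwrapping step could be sketched as below, based on my reading of the .xz container layout (12-byte stream header, block header, payload, index, 12-byte footer). It is an untested-in-borg illustration with made-up names: it assumes a single block and CHECK_NONE, verifies no CRCs, and does not read the filter properties from the block header, so the caller must already know the filter chain.

```python
import lzma


def _varint(buf: bytes, pos: int):
    """Decode an XZ variable-length integer; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def unwrap_xz(blob: bytes) -> bytes:
    """Return the raw payload of a single-block, CHECK_NONE .xz stream."""
    if blob[:6] != b"\xfd7zXZ\x00" or blob[-2:] != b"YZ":
        raise ValueError("not an .xz stream")
    if blob[7] != lzma.CHECK_NONE:
        raise ValueError("sketch only handles CHECK_NONE")
    # The footer stores the index size as (size / 4) - 1, little-endian.
    index_size = (int.from_bytes(blob[-8:-4], "little") + 1) * 4
    index = blob[-12 - index_size:-12]
    if index[0] != 0:
        raise ValueError("index indicator missing")
    records, pos = _varint(index, 1)
    if records != 1:
        raise ValueError("sketch only handles a single block")
    # Unpadded size = block header + compressed data (+ check, absent here).
    unpadded_size, _ = _varint(index, pos)
    header_size = (blob[12] + 1) * 4
    return blob[12 + header_size:12 + unpadded_size]
```

The returned bytes can then be fed to lzma.decompress with format=lzma.FORMAT_RAW and a matching LZMA2 filter, so no recompression is needed; a real implementation would also have to validate the CRCs and carry over the dictionary size from the block header.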
Note: Cheaterman on IRC noted that lzma -1 (== lzma, level 1) is quite nice.
The reason I noted that BTW: https://catchchallenger.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO
Feel free to make your own conclusions of course! Mine was simply that lzma -1 seems to take impressively little time compared to the pretty nice compression ratio it yields. :-)
Note: in case we want to change anything about how we deal with lzma compression, it has to be finished by 2.0rc1 at the latest.
borg2 is breaking repo compatibility, so archives need to be transferred from old repos to new repos. that's the only opportunity when we could change compression-related details.
otherwise: just close this at 2.0rc1.
http://www.nongnu.org/lzip/xz_inadequate.html