gilbertchen / duplicacy

A new generation cloud backup tool
https://duplicacy.com

Option to disable compression of chunks #153

Open robbat2 opened 7 years ago

robbat2 commented 7 years ago

One of the sets of content I need to back up is already maximally compressed with XZ, and it makes no sense to try further compressing the chunks with LZ4.

The snapshot data should record the compression format (if any) of the chunks, and permit compression to be entirely optional. This also provides future-proofing for the next great compression breakthrough, and re-compressing existing backups.
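A minimal sketch of what per-chunk format tagging might look like (the identifiers and layout here are hypothetical, not duplicacy's actual on-disk format):

```go
package chunk

// Hypothetical per-chunk format identifiers; duplicacy's real metadata
// layout may differ.
const (
	FormatNone byte = iota // stored uncompressed
	FormatZlib
	FormatLZ4
)

// tagChunk prefixes a chunk payload with its compression format so a
// future reader can pick the right decompressor (or none at all).
func tagChunk(format byte, payload []byte) []byte {
	return append([]byte{format}, payload...)
}
```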

niknah commented 7 years ago

In duplicacy_chunk.go it looks like the preferences file takes a "compression-level" value. It sort of goes like this:

- -1: default zlib compression
- 0: no compression
- 9: best zlib compression
- 100: LZ4

But I haven't tried it.

Also, the docs say the default compression level is -1, but it might actually be 100 (LZ4).
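As an illustration of those semantics, a dispatch along these lines would match the values described above (a sketch only, not duplicacy's actual code; lz4.Encode comes from the go-lz4 package the project uses):

```go
package chunk

import (
	"bytes"
	"compress/zlib"

	lz4 "github.com/bkaradzic/go-lz4"
)

// compressChunk illustrates the level semantics described above: 0 stores
// the chunk as-is, -1 and 1..9 map to zlib, and the sentinel 100 selects
// LZ4.
func compressChunk(level int, data []byte) ([]byte, error) {
	switch {
	case level == 0:
		return data, nil // no compression
	case level == 100:
		return lz4.Encode(nil, data) // LZ4, the post-1.2 default
	default:
		// -1 (zlib default) or explicit zlib levels 1..9.
		var buf bytes.Buffer
		w, err := zlib.NewWriterLevel(&buf, level)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write(data); err != nil {
			return nil, err
		}
		if err := w.Close(); err != nil {
			return nil, err
		}
		return buf.Bytes(), nil
	}
}
```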

gilbertchen commented 7 years ago

Prior to version 1.2 you could set the compression level (using the standard zlib numbers 0-9 or -1) when initializing the storage. However, in version 1.2 I decided to switch to LZ4 for compression and blake2 for hashing (instead of SHA256), mostly for performance. Therefore, a somewhat arbitrary level of 100 is used to indicate the use of both LZ4 and blake2. I naively believed that LZ4 was so much faster that there would be no need for other options, so the compression level option was removed.

Obviously I was wrong, and the compression level option should be added back to the init command. The good news is that it is super easy to introduce new compression algorithms (for instance, it took just a few lines of code to support LZ4).

Please suggest the compression algorithms that you think should be supported (besides the no compression option).

robbat2 commented 7 years ago

LZ4 is fine for the cases where I do want compression, but I can see that some people might want something like Snappy for bounded compression time.

niknah commented 7 years ago

I'd like a high-compression option, like LZMA or XZ. When we back up to these cloud services, they charge us per month to keep data there, which adds up to a lot after a few years. And some cloud storage services also charge heavily for downloading your data.

Thanks

fenixnet-net commented 6 years ago

+1 for control of compression. I'm using a raspberry pi and an external drive at an offsite location with particularly fast upload to seed my home backups, as pushing them through my home connection from scratch would take about 18 months. A lot of it is already compressed or encoded in one way or another, and I'm backing up to Backblaze B2, which is ultra-cheap. I'd rather have the time/throughput performance than save a few megs here and there with compression.

sergeevabc commented 6 years ago

To the dev: zstd --long seems to do better in terms of speed versus compression ratio; a possible golden mean?

To the cloud junkies: a free plan usually covers preserving the essential bits; the rest is just a reluctance to sort through your data.

Ralith commented 6 years ago

zstd support would be great to have.

Ralith commented 5 years ago

@gilbertchen any updates on the plans here?

sedlund commented 5 years ago

Any modern compression algorithm is smart enough to automatically switch to store-only mode for an incompressible stream.

The LZ4 implementation used in this project does it here:

https://github.com/bkaradzic/go-lz4/blob/7224d8d8f27ef618c0a95f1ae69dbb0488abc33a/writer.go#L138
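The idea is simple enough to sketch with go-lz4's Encode (an illustration, not the linked code itself):

```go
package chunk

import lz4 "github.com/bkaradzic/go-lz4"

// encodeBlock sketches the store-vs-compress fallback: if LZ4 cannot
// shrink the block, keep the raw bytes instead. A real container format
// must also record which branch was taken so the reader knows whether
// to decompress.
func encodeBlock(data []byte) []byte {
	compressed, err := lz4.Encode(nil, data)
	if err != nil || len(compressed) >= len(data) {
		return data // incompressible: store as-is
	}
	return compressed
}
```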

There is no native Go port of zstd, and linking the C library seems like a poor idea.

morris-t commented 4 years ago

It seems there is now a native Go implementation of zstd: https://github.com/klauspost/compress/tree/master/zstd
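For illustration, a minimal round trip with that package might look like the sketch below (the option choices are assumptions; zstd.WithWindowSize is the library's rough counterpart to the CLI's --long window):

```go
package chunk

import "github.com/klauspost/compress/zstd"

// zstdRoundTrip compresses and then decompresses a chunk with the
// pure-Go zstd package linked above. Passing nil as the writer/reader
// is the documented pattern for the EncodeAll/DecodeAll APIs.
func zstdRoundTrip(chunk []byte) ([]byte, error) {
	enc, err := zstd.NewWriter(nil, zstd.WithEncoderLevel(zstd.SpeedDefault))
	if err != nil {
		return nil, err
	}
	defer enc.Close()
	compressed := enc.EncodeAll(chunk, nil)

	dec, err := zstd.NewReader(nil)
	if err != nil {
		return nil, err
	}
	defer dec.Close()
	return dec.DecodeAll(compressed, nil)
}
```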