Open ThomasWaldmann opened 9 months ago
It's a good possibility to improve compression in some cases, but:
I didn't realize it was file data -> chunk -> compress, which rules out any simple implementation. JXL is not a compression algorithm like lz4 that takes any bytes you throw at it. If you're starting with a JPEG, it needs a complete file with a header and all the pixels.
I could probably write a separate "chunker", not just for JPEG but for all image formats supported by Pillow. It would split the image into raw tiles (chunks) of the size you need and then compress each chunk as a separate, lossless JXL image. There's a Pillow JXL plugin with lossless support. Additionally, to achieve a bit-identical reversal of the entire process, the original image header (EXIF metadata, etc.) would need to be stored in a separate chunk and reconstituted.
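To make the tiling idea concrete, here is a minimal sketch under some loud assumptions: the pixels are already decoded into a row-major byte buffer, `zlib` stands in for a lossless JXL encoder (so no Pillow or JXL plugin is needed to run it), and the names `split_tiles`, `pack` and `unpack` are made up for illustration. It only round-trips the header bytes and the pixel buffer; re-encoding those pixels into a bit-identical original file is not covered.

```python
import zlib


def split_tiles(pixels, width, height, bpp, tile):
    """Yield (x, y, w, h, raw) tiles cut from a row-major pixel buffer."""
    stride = width * bpp
    for y in range(0, height, tile):
        h = min(tile, height - y)
        for x in range(0, width, tile):
            w = min(tile, width - x)
            rows = [
                pixels[(y + r) * stride + x * bpp:(y + r) * stride + (x + w) * bpp]
                for r in range(h)
            ]
            yield x, y, w, h, b"".join(rows)


def pack(header, pixels, width, height, bpp, tile=64):
    """Return a list of chunks: the header first, then one per tile."""
    chunks = [("header", header)]
    for x, y, w, h, raw in split_tiles(pixels, width, height, bpp, tile):
        # zlib is only a stand-in for "compress this tile as lossless JXL".
        chunks.append(((x, y, w, h), zlib.compress(raw)))
    return chunks


def unpack(chunks, width, height, bpp):
    """Rebuild (header, pixels) from the chunks produced by pack()."""
    header = chunks[0][1]
    stride = width * bpp
    out = bytearray(width * height * bpp)
    for (x, y, w, h), blob in chunks[1:]:
        raw = zlib.decompress(blob)
        for r in range(h):
            start = (y + r) * stride + x * bpp
            out[start:start + w * bpp] = raw[r * w * bpp:(r + 1) * w * bpp]
    return header, bytes(out)
```

Each tile is independently decodable, which is what the chunk -> compress order requires, at the cost of losing any redundancy that spans tile borders.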
Seems like it's not worth it?
This discussion reminds me of some of the arguments detailed here: https://www.nongnu.org/lzip/xz_inadequate.html (I read it years ago and don't remember the details, but the point was to try to have simple formats for archiving data, minimizing issues.)
If we don't come up with a good/easy solution, an alternate way to use jpeg xl is of course that the users convert their photos to that format at the primary storage location.
If there is an easy transformation back to the original format, that seems the better idea anyway because then it also uses less storage at the primary location. Only issue could be that the tools preferred by the users do not (yet) read/display that format.
an alternate way to use jpeg xl is of course that the users convert their photos to that format at the primary storage location.
That's what I do, I use the official CLI tools to encode/decode as needed before/after running borg.
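A minimal sketch of that workflow, assuming the libjxl reference tools `cjxl` and `djxl` are on the PATH, that `cjxl` given a JPEG input performs its lossless JPEG transcode by default, and that `djxl` reconstructs the JPEG when the output name ends in `.jpg` (worth checking against your installed version). The helper names are made up for illustration.

```python
import filecmp
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def encode_cmd(jpeg: Path, jxl: Path) -> List[str]:
    """Command line for JPEG -> JXL (assumed lossless by default)."""
    return ["cjxl", str(jpeg), str(jxl)]


def decode_cmd(jxl: Path, jpeg: Path) -> List[str]:
    """Command line for JXL -> reconstructed JPEG."""
    return ["djxl", str(jxl), str(jpeg)]


def transcode_verified(jpeg: Path, workdir: Path) -> Optional[Path]:
    """Return the .jxl path if the round trip is bit-identical, else None."""
    if shutil.which("cjxl") is None or shutil.which("djxl") is None:
        return None  # tools not installed
    jxl = workdir / (jpeg.stem + ".jxl")
    restored = workdir / (jpeg.stem + ".restored.jpg")
    subprocess.run(encode_cmd(jpeg, jxl), check=True)
    subprocess.run(decode_cmd(jxl, restored), check=True)
    # Compare file contents byte by byte, not just size and mtime.
    identical = filecmp.cmp(jpeg, restored, shallow=False)
    restored.unlink()
    if not identical:
        jxl.unlink()
        return None
    return jxl
```

Verifying the round trip before deleting the original JPEG is the part I would not skip.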
Only issue could be that the tools preferred by the users do not (yet) read/display that format.
This is the real problem. Adoption has stalled. Currently, to browse thumbnails and open the images, you pretty much need to be on Linux and compile something like gThumb yourself. 0.000001% of users will do this, and it looks like that won't change.
So I was hoping JXL could at least have a future as an archive format used internally by tools like borg. In my case it already saves me 50+GB of space and bandwidth, and it would be very useful to make that available to everyone.
Let's discuss here whether and how borg could support this, assuming there is a JPEG XL library (with a Python / Cython binding) that supports bit-identical compression (transformation to JPEG XL format) and decompression (transformation back to the original file).
Notable:
file data -> chunk -> compress -> encrypt/auth -> store
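To illustrate why that order matters, here is a toy version of the pipeline. It is a sketch and not borg's code: it assumes fixed-size chunks, `zlib` as the compressor, an HMAC tag in place of the encrypt/auth step (no encryption at all), and a plain dict as the store. All names are hypothetical.

```python
import hashlib
import hmac
import zlib

CHUNK_SIZE = 1024  # arbitrary size chosen for the example
TAG_LEN = 32       # length of a SHA-256 HMAC tag


def chunk(data, size=CHUNK_SIZE):
    """Cut the file data into fixed-size pieces."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def store_file(data, key, repo):
    """file data -> chunk -> compress -> auth -> store; returns chunk ids."""
    ids = []
    for piece in chunk(data):
        chunk_id = hmac.new(key, piece, hashlib.sha256).digest()
        if chunk_id not in repo:  # identical chunks are stored once
            packed = zlib.compress(piece)  # the compressor sees one chunk only
            tag = hmac.new(key, packed, hashlib.sha256).digest()
            repo[chunk_id] = tag + packed
        ids.append(chunk_id)
    return ids


def load_file(ids, key, repo):
    """Verify, decompress and concatenate the chunks of one file."""
    out = []
    for chunk_id in ids:
        blob = repo[chunk_id]
        tag, packed = blob[:TAG_LEN], blob[TAG_LEN:]
        expected = hmac.new(key, packed, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("authentication failed")
        out.append(zlib.decompress(packed))
    return b"".join(out)
```

The compress step only ever receives one piece at a time. For a JPEG, only the first piece starts with the `FF D8` start-of-image marker; the rest are fragments from the middle of the file, which is why a codec that needs a complete image cannot simply be dropped in as another compressor.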