TheHardew / compress_comics

A script to compress cbz/cbr comics with jpeg xl

Motivation for compression parameter selection? #20

Closed: JinEnMok closed this issue 7 months ago

JinEnMok commented 7 months ago

One thing I noticed about this script is just how insanely high the quality/effort settings are. It's not about -d 1, since that one is (arguably) the best option for already-lossy comics.

Does it actually make sense to use --effort=9 here? There's been something of a comparison between effort settings on Reddit, which found that beyond -e 7 the encode time grows 2-4x per level, while the output size stays within 100 kB of the -e 7 result. Considering we may have huge collections of books with hundreds of pages, is it worth spending that much time for so little gain?
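Something like this is enough to reproduce that scaling locally (a minimal sketch; page.png is a hypothetical stand-in for a single comic page):

```sh
# time and output size per effort level
for e in 7 8 9; do
    time cjxl -e "$e" -d 1 page.png "page-e$e.jxl"
    ls -l "page-e$e.jxl"
done
```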

Same goes for --brotli_effort: what was your motivation for setting that one to max as well? As far as I could find, it's not even documented yet, same as --modular_nb_prev_channels.

Granted, none of this matters for JPEG source files (which must be the majority for comics, I guess), since --lossless_jpeg 1 doesn't transcode them at all: it just repacks them, saving about 20% of the original size at -e 7, and a few percent extra at -e 9. E.g. on one test subject I shrank a CBR from the original 209 MB to 163 MB and 156 MB, with effort at 7 and 9 respectively, while the latter setting took a whole lot more time.
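To be clear about what "repacks" means here: the transcode is reversible, so nothing is lost. A minimal sketch, with placeholder filenames:

```sh
cjxl --lossless_jpeg=1 -e 7 page.jpg page.jxl  # repacks the JPEG bitstream, no pixel re-encode
djxl page.jxl restored.jpg                     # reconstructs the original JPEG
cmp page.jpg restored.jpg                      # should be byte-identical
```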


My point is - why did you choose the defaults that you chose? If you did some sort of benchmarking, it'd be interesting to see.

TheHardew commented 7 months ago

Really the only reason is that it was fast enough for me and I wanted to compress more. I guess I can make the defaults more sane.

-E/--modular_nb_prev_channels used to be popular on reddit, but strangely enough cjxl -v -v -v -h does not mention it now (0.9.1).

I don't think I will support any configuration files, you can make a shell alias.
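E.g. something along these lines (just an illustration; pick whatever settings you actually use):

```sh
# example alias baking in explicit settings, using the flags discussed in this thread
alias compress_comics='compress_comics -e 7 -j 1'
```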

JinEnMok commented 7 months ago

I think the only reason it's fast enough for you is --lossless_jpeg=1, which means there's actually no re-encoding done on top of JPEG, only re-packing. Even my ancient 2-core Phenom II X2 executes those in a few seconds per page at most. Once you set -j 0, the sky's the limit for how long it'll take.

-E/--modular_nb_prev_channels is shown with cjxl -h -v -v -v -v under "Modular mode options". Seems like leaving it up to the encoder itself would be best.

JinEnMok commented 7 months ago

I actually want to try and benchmark a few settings with different comics of different styles and then do a comparison of size / encoding speed / relative quality for lossy encoding. I've tried a few, using the SSIMULACRA2 tool bundled with libjxl. So far -d 0.5 looks promising, yielding a score close to 90 ("visually indistinguishable at 1:1") at around half to two thirds of the original size.
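The per-page check is roughly the following (a sketch; ssimulacra2 is the scoring tool built alongside libjxl, filenames are placeholders):

```sh
cjxl -d 0.5 -e 7 page.png page.jxl  # lossy encode at distance 0.5
djxl page.jxl decoded.png           # decode back to pixels for comparison
ssimulacra2 page.png decoded.png    # prints a score; ~90 reads as "visually indistinguishable"
```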

TheHardew commented 7 months ago

I also have a bunch of manga encoded with modular mode, the source being PNG. For me it was still fast enough.

I wasn't aware they changed it so that -v now has to be passed 4 times.

I'll make the default -d 0 -j 1; I definitely don't want it to be destructive by default, even if it outputs to a separate directory.

TheHardew commented 7 months ago

I guess for the rest of the flags I won't mention the defaults in the help text, since they could change with the encoder.

TheHardew commented 7 months ago
@JinEnMok if you are interested in benchmarking, for best performance you should really do what my program does: compress one image per thread, with --num_threads 1. I ran one such test:

| Settings | Elapsed | User CPU (s) | System CPU (s) | CPU used | s/img | MPx/s | Size |
|----------|---------|--------------|----------------|----------|-------|-------|------|
| -e7 -t1  | 0:54.25 | 65.56 | 13.30 | 145% | 0.85 | 1.77 | 16.8 MiB |
| -e7 -t16 | 0:08.25 | 93.04 | 9.13 | 1238% | 0.13 | 11.64 | 16.8 MiB |
| -e9 -t1  | 13:43.95 | 883.22 | 14.94 | 109% | 12.87 | 0.12 | 13.7 MiB |
| -e9 -t16 | 1:27.89 | 1190.10 | 10.09 | 1365% | 1.37 | 1.09 | 13.7 MiB |

Data collected with BSD time. I used -d 0 for the test.

The test file consisted of 64 PNG images, 1000x1500: test_images.zip

If you intend to run it with my program, rename the extension to .cbz.

I have a Ryzen 2700X with 16 threads. --num_threads is calculated as the number of CPU threads divided by --threads, so with -t1 cjxl uses its own multithreading. But that's much slower than doing it per file.
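As a shell sketch of the idea (assuming GNU xargs and nproc; the program does the equivalent internally):

```sh
# one single-threaded cjxl process per image, as many processes as there are CPU threads;
# output names are just input + .jxl for brevity
find pages/ -name '*.png' -print0 |
    xargs -0 -P "$(nproc)" -I {} cjxl --num_threads=1 -d 0 -e 7 {} {}.jxl
```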

TheHardew commented 7 months ago

@JinEnMok On one manga just now, the difference between -e7 and -e9 --brotli_effort 11 -E3 was 211 MiB vs 179 MiB, from an original size of 233 MiB. Modular encoding. Going from -e7 to -e9 saved more than going from the original to -e7 did. It does make a difference.
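Per page, that comparison is roughly (page.png being a placeholder):

```sh
cjxl -d 0 -e 7 page.png page-e7.jxl                          # baseline
cjxl -d 0 -e 9 --brotli_effort 11 -E 3 page.png page-e9.jxl  # slower, smaller
```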