FLIF-hub / FLIF

Free Lossless Image Format

Effort option is non-linear #379

Open hrj opened 7 years ago

hrj commented 7 years ago

Sometimes -E100 creates bigger FLIFs than the default. I suspect that is because -E100 enables -R4 vs. the default -R2, and as we know, the -R option gives unpredictable results.

Which optimisations are enabled by -E? And can we restrict them to predictable, linear ones?

jonsneyers commented 7 years ago

The relevant code is here: https://github.com/FLIF-hub/FLIF/blob/master/src/flif.cpp#L642
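Roughly, that code turns a single effort value into a bundle of individual encoder settings. A purely illustrative sketch of the pattern (the thresholds, names, and which options get toggled are made up here, not copied from flif.cpp):

```cpp
// Purely illustrative sketch of how a single effort value (-E) could cascade
// into individual encoder settings. Thresholds, names, and the options being
// toggled are hypothetical; the real mapping lives in src/flif.cpp.
struct EncodeOptions {
    int maniac_repeats = 1;           // corresponds to -R
    bool auto_color_buckets = false;  // Auto-Color-Buckets transform
    bool extra_frame_lookback = false;
};

EncodeOptions options_for_effort(int effort) {   // effort in 0..100
    EncodeOptions o;
    if (effort >= 50) o.maniac_repeats = 2;      // hypothetical threshold
    if (effort >= 60) o.auto_color_buckets = true;
    if (effort >= 90) { o.maniac_repeats = 4; o.extra_frame_lookback = true; }
    return o;
}
```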

None of the encode options are really fully predictable. The only thing that is more or less predictable is how much time the encode will take; the effect on compression is not, at least not for individual images. Aggregated over many images, the extra-effort options will give slightly smaller files, but for an individual image you never really know. The only way to know for sure is to do multiple encodes.

In that respect it's the same as with PNG, where brute-force PNG optimizers have no choice but to try different options and see which works best.
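A rough sketch of that brute-force approach, outside of FLIF itself: it assumes a `flif` binary on the PATH that accepts `-e <input> <output>` and `-E<n>`, encodes the input at a few effort levels, and keeps the smallest output.

```cpp
// Brute-force sketch (not part of FLIF): encode the same input at several
// effort levels and keep the smallest result. Assumes a `flif` binary on the
// PATH that accepts `-e <input> <output>` and `-E<n>`; temporary per-effort
// files are left behind.
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <filesystem>
#include <string>

int main(int argc, char** argv) {
    if (argc < 3) { std::fprintf(stderr, "usage: %s in.png out.flif\n", argv[0]); return 1; }
    const std::string in = argv[1], out = argv[2];
    std::uintmax_t best_size = 0;
    int best_effort = -1;
    for (int effort : {30, 60, 100}) {
        const std::string tmp = out + ".E" + std::to_string(effort);
        const std::string cmd = "flif -e -E" + std::to_string(effort) + " " + in + " " + tmp;
        if (std::system(cmd.c_str()) != 0) continue;      // skip failed encodes
        const std::uintmax_t size = std::filesystem::file_size(tmp);
        if (best_effort < 0 || size < best_size) { best_size = size; best_effort = effort; }
    }
    if (best_effort < 0) return 1;
    std::filesystem::copy_file(out + ".E" + std::to_string(best_effort), out,
                               std::filesystem::copy_options::overwrite_existing);
    std::printf("kept -E%d (%ju bytes)\n", best_effort, best_size);
    return 0;
}
```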

By the way, for most use cases, -E30 is 'close enough' to optimal even though it's just -R1.

hrj commented 7 years ago

Thanks for the detailed answer!

I was hoping that some kind of backtracking could be done in the encoder, so that the entire encoding process wouldn't have to be retried when crushing FLIFs.

Are there any features which could be backtracked? For example, could Auto-Color-Buckets be checked to see whether it reduces the file size, and if so, "commit" to it; otherwise, backtrack and disable it? Or are all the features chaotic, so that the only way to find out is to run the full encoding process?
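Something like this sketch, for a single feature (the encoder call is abstracted as a callable; this is not FLIF code):

```cpp
// Sketch of the commit/backtrack idea for one feature: encode with the
// feature on and off, and commit to whichever result is smaller. The encoder
// is passed in as a callable so this stays a pure sketch, not FLIF code.
#include <cstddef>
#include <functional>

struct Settings { bool auto_color_buckets = false; };

Settings commit_or_backtrack(Settings base,
                             const std::function<std::size_t(const Settings&)>& encoded_size) {
    Settings with = base;    with.auto_color_buckets = true;
    Settings without = base; without.auto_color_buckets = false;
    return encoded_size(with) <= encoded_size(without) ? with : without;
}
```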

jonsneyers commented 7 years ago

The proof of the pudding is in the, uh, encoding. So the only way to really see whether an option or setting is good is to do the full encoding. Furthermore, there are interactions between the different transformations, so finding the best settings is an optimization problem with a very large search space and an expensive objective function.

Some of the transformations use heuristics to figure out whether they are going to be useful and what settings to use. For example, FrameLookback investigates up to N previous frames to find pixels to refer to (if you use -LN), but the actual upper bound on the lookback is set by a heuristic and can be lower than N, based on how many pixels can be reused from older frames and what the expected cost of a pixel is (24-bit RGB values are assumed to be more expensive than values from a palette of 5 colors). Or a more trivial case: Bounds and ChannelCompact are simply not applied if they would not actually reduce the range of values.
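To illustrate that last point, a Bounds-style check only commits to the transform when the observed per-channel range is strictly tighter than the nominal one. A minimal sketch (types and names are mine, not FLIF's internals):

```cpp
// Sketch of a Bounds-style decision: scan the actual min/max of a channel and
// apply the transform only if that range is strictly tighter than the nominal
// one. Types and names are illustrative, not FLIF's internals.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Range { int min, max; };

bool bounds_would_help(const std::vector<std::uint8_t>& channel, Range nominal = {0, 255}) {
    if (channel.empty()) return false;
    const auto [lo, hi] = std::minmax_element(channel.begin(), channel.end());
    return *lo > nominal.min || *hi < nominal.max;   // any tightening at all
}
```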

Or, for example, the pixel estimator is chosen heuristically by finding the estimator that results in the lowest differences (i.e. the one closest to the actual image). But the estimator with the lowest differences is not necessarily the one that gives the best compression: what counts are the actual entropy-coded bits, and those depend not on the differences between predicted and actual values but on how accurately the MANIAC probability model manages to model those differences based on local context. Which might be a completely different thing, though usually it isn't.
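For intuition, the "lowest differences" heuristic boils down to summing absolute residuals per candidate predictor and keeping the smallest sum. A sketch with a made-up predictor set (FLIF's actual estimators and traversal differ):

```cpp
// Sketch of the "lowest differences" heuristic: sum |actual - predicted| for
// each candidate predictor and keep the smallest sum. The predictor set here
// (left, top, average) is made up; FLIF's estimators and traversal differ.
#include <cstddef>
#include <cstdlib>
#include <vector>

using Image = std::vector<std::vector<int>>;   // [row][column] sample values

static int predict(const Image& img, int y, int x, int which) {
    const int left = x > 0 ? img[y][x - 1] : 0;
    const int top  = y > 0 ? img[y - 1][x] : 0;
    switch (which) {
        case 0:  return left;
        case 1:  return top;
        default: return (left + top) / 2;
    }
}

int pick_predictor(const Image& img) {
    long long best_cost = -1;
    int best = 0;
    for (int which = 0; which < 3; ++which) {
        long long cost = 0;
        for (std::size_t y = 0; y < img.size(); ++y)
            for (std::size_t x = 0; x < img[y].size(); ++x)
                cost += std::llabs(img[y][x] - predict(img, (int)y, (int)x, which));
        if (best_cost < 0 || cost < best_cost) { best_cost = cost; best = which; }
    }
    return best;   // smallest residuals, not necessarily smallest encoded size
}
```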

hrj commented 7 years ago

> But the estimator with the lowest differences is not necessarily the one that gives the best compression: what counts are the actual entropy-coded bits, and those depend not on the differences between predicted and actual values but on how accurately the MANIAC probability model manages to model those differences based on local context.

Would it make sense to store the prediction results in a buffer and do the entropy coding at the end? That way, different settings could be tried faster, as the prediction wouldn't have to be redone.

It would require more memory though.
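Roughly what I have in mind (a sketch; in FLIF the residuals are interleaved with the MANIAC context modelling, so it may not decompose this cleanly):

```cpp
// Sketch of the "buffer the residuals, entropy-code at the end" idea, using a
// trivial previous-pixel predictor. In FLIF the residuals are interleaved
// with the MANIAC context modelling, so the stages may not split this cleanly.
#include <cstddef>
#include <vector>

std::vector<int> buffer_residuals(const std::vector<int>& pixels) {
    std::vector<int> residuals(pixels.size());
    int prev = 0;
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        residuals[i] = pixels[i] - prev;   // store the prediction error only
        prev = pixels[i];
    }
    // The same buffer could now be handed to the entropy coder repeatedly
    // with different settings, without redoing the prediction pass.
    return residuals;
}
```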

jonsneyers commented 7 years ago

If I recall correctly, the heuristic to decide the predictor doesn't compute the full thing, it just looks at zoomlevel 1 (so about 25% of the pixels). Also, computing the predictor is not that expensive compared to computing all local context properties for the MANIAC encoding.
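In other words, the heuristic cost can be evaluated on a subsampled grid rather than the full image. A rough sketch of that shortcut (the real zoomlevel traversal in FLIF's interlaced mode is more involved than this):

```cpp
// Sketch of evaluating the predictor heuristic on a 2x-subsampled grid
// (roughly a quarter of the pixels) instead of the full image. The real
// zoomlevel traversal in FLIF's interlaced mode is more involved than this.
#include <cstddef>
#include <cstdlib>
#include <vector>

using Image = std::vector<std::vector<int>>;

long long subsampled_cost(const Image& img, int (*predict)(const Image&, int, int)) {
    long long cost = 0;
    for (std::size_t y = 0; y < img.size(); y += 2)           // every other row
        for (std::size_t x = 0; x < img[y].size(); x += 2)    // every other column
            cost += std::llabs(img[y][x] - predict(img, (int)y, (int)x));
    return cost;
}
```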