Feature request: separate out encoding of chroma & alpha

wadetregaskis commented 1 month ago

From looking at the code, it appears AVIFs with transparency are essentially just two AV1 images, each encoded independently - one for chroma and one for alpha. libavif currently forces [re]encoding of both of those, even if one has already been encoded previously and has not changed. This can be very wasteful in some use cases.

Consider for example a GUI over libavif which allows the user to independently control the quality setting for chroma vs alpha, via e.g. two sliders. It's unfortunate that currently a bunch of CPU time has to be wasted redundantly re-encoding e.g. the alpha channel when only the chroma quality slider is changed.

I'm not sure what the best API change would be, to support this. Presumably some way for me to provide the encoder with the earlier encoder (or final output data) that it can use if applicable. Or some way to reuse an encoder, any number of times, with settings tweaked between uses? (as far as I can tell, changes to encoder configuration only apply to future avifEncoderAddImage calls at best, and avifEncoderFinish is only supposed to be called once?)
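The reuse being asked for can be pictured as a cache of already-encoded layer payloads keyed by their quality setting, so that moving one slider re-encodes only that layer. This is a minimal Python sketch, not libavif API. `encode_layer` is a hypothetical stand-in for whatever actually produces an AV1 payload, and `LayerCache` is a made-up name used for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical encoder callback: (layer name, quality) -> encoded AV1 bytes.
EncodeFn = Callable[[str, int], bytes]


class LayerCache:
    """Remembers the last encoded payload for each layer ("color", "alpha")."""

    def __init__(self, encode_layer: EncodeFn) -> None:
        self._encode_layer = encode_layer
        self._cache: Dict[str, Tuple[int, bytes]] = {}

    def get(self, layer: str, quality: int) -> bytes:
        cached = self._cache.get(layer)
        if cached is not None and cached[0] == quality:
            return cached[1]  # Setting unchanged: reuse the earlier payload.
        payload = self._encode_layer(layer, quality)  # Only this layer is re-encoded.
        self._cache[layer] = (quality, payload)
        return payload

    def layers(self, color_quality: int, alpha_quality: int) -> Tuple[bytes, bytes]:
        return self.get("color", color_quality), self.get("alpha", alpha_quality)
```

With this arrangement, moving only the color slider (say from 60 to 70 while alpha stays at 80) results in a single `encode_layer` call instead of two.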

y-guyon commented 1 month ago

Thank you for your interest in libavif.

From looking at the code, it appears AVIFs with transparency are essentially just two AV1 images, each encoded independently - one for chroma and one for alpha.

This is correct.

libavif currently forces [re]encoding of both of those, even if one has already been encoded previously and has not changed. This can be very wasteful in some use cases.

I expect the number of use cases where only a subset of the channels needs to be re-encoded to be fairly small.

Consider for example a GUI over libavif which allows the user to independently control the quality setting for chroma vs alpha, via e.g. two sliders. It's unfortunate that currently a bunch of CPU time has to be wasted redundantly re-encoding e.g. the alpha channel when only the chroma quality slider is changed.

I agree.

I'm not sure what the best API change would be, to support this. Presumably some way for me to provide the encoder with the earlier encoder (or final output data) that it can use if applicable. Or some way to reuse an encoder, any number of times, with settings tweaked between uses? (as far as I can tell, changes to encoder configuration only apply to future avifEncoderAddImage calls at best, and avifEncoderFinish is only supposed to be called once?)

What about:

encoding the full image with the alpha layer the first time, recording the whole file size and the size of each internal AV1 image item (thanks to avifIOStats),
on subsequent passes, encoding only the color channels as an opaque image, or only the alpha channel as an opaque monochrome image. The final file size can be derived from the lengths recorded in the first pass, and the separately decoded layers can be composited into a single translucent image before rendering the GUI.
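The size bookkeeping in the approach above is simple arithmetic: the container overhead (everything that is not the color or alpha AV1 payload) is measured once in the first pass, and later file sizes are that overhead plus the current payload sizes. A minimal Python sketch, assuming the overhead keeps the same size between passes; the payload sizes would come from the first pass's `avifIOStats` (`colorOBUSize` and `alphaOBUSize`), and the names `FirstPass` and `estimated_file_size` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstPass:
    total_size: int  # Size of the complete AVIF file written in the first pass.
    color_size: int  # Color AV1 payload size (e.g. avifIOStats.colorOBUSize).
    alpha_size: int  # Alpha AV1 payload size (e.g. avifIOStats.alphaOBUSize).

    @property
    def overhead(self) -> int:
        # Everything that is not an AV1 payload: boxes, item properties, item locations.
        return self.total_size - self.color_size - self.alpha_size


def estimated_file_size(first: FirstPass, color_size: int, alpha_size: int) -> int:
    """Estimate the whole file size after re-encoding one or both layers."""
    return first.overhead + color_size + alpha_size
```

With hypothetical numbers, a 10,400-byte first pass holding an 8,000-byte color payload and a 2,000-byte alpha payload has 400 bytes of overhead, so re-encoding only the color layer down to 6,500 bytes gives an estimated 8,900 bytes.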

One would have to be careful with alpha-premultiplied samples, though.
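Compositing the separately decoded layers amounts to attaching each alpha sample to the matching color pixel. The premultiplication caveat matters because, if the color layer holds samples premultiplied by alpha (libavif tracks this with the `alphaPremultiplied` flag on `avifImage`), the alpha has to be divided back out, or the rendering path has to expect premultiplied input. A minimal Python sketch, assuming 8-bit samples stored as plain lists and both layers having the same dimensions; real code would work on decoded planes instead.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def composite(color: List[RGB], alpha: List[int], premultiplied: bool) -> List[RGBA]:
    """Attach 8-bit alpha samples to 8-bit color pixels, returning straight (unassociated) RGBA."""
    if len(color) != len(alpha):
        raise ValueError("color and alpha layers must have the same number of pixels")
    out: List[RGBA] = []
    for (r, g, b), a in zip(color, alpha):
        if premultiplied:
            if a == 0:
                # Fully transparent: premultiplied color carries no information.
                r, g, b = 0, 0, 0
            elif a < 255:
                # Divide the alpha back out, rounding to nearest and clamping to 8 bits.
                r, g, b = [min(255, (c * 255 + a // 2) // a) for c in (r, g, b)]
        out.append((r, g, b, a))
    return out
```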

Alternatively, there may be libraries that can manipulate ISOBMFF container boxes directly, such as MP4Box, but I doubt introducing another dependency is the point here.
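Walking the top-level ISOBMFF boxes does not by itself require a dependency: every box starts with a 32-bit big-endian size and a four-character type, where a size of 1 means a 64-bit size follows the type and a size of 0 means the box extends to the end of the file. A minimal Python sketch that lists boxes in a byte range; it does not descend into child boxes or decode FullBox headers, and the name `iter_boxes` is hypothetical.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(
    data: bytes, start: int = 0, end: Optional[int] = None
) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (type, box_offset, header_size, box_size) for each box in data[start:end]."""
    end = len(data) if end is None else end
    offset = start
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # 64-bit "largesize" follows the type.
            if offset + 16 > end:
                raise ValueError(f"truncated largesize at offset {offset}")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # Box extends to the end of the enclosing range.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError(f"malformed box at offset {offset}")
        yield box_type.decode("latin-1"), offset, header, size
        offset += size
```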

wadetregaskis commented 1 month ago

What about:

encoding the full image with the alpha layer the first time, recording the whole file size and the size of each internal AV1 image item (thanks to avifIOStats),

on subsequent passes, encoding only the color channels as an opaque image, or only the alpha channel as an opaque monochrome image. The final file size can be derived from the lengths recorded in the first pass, and the separately decoded layers can be composited into a single translucent image before rendering the GUI.

Yeah, that's not too difficult for me to do, I think. I already keep careful track of the file components' sizes.

Would there be a way to produce the final file without having to re-encode everything, though? I don't currently see an API for explicitly providing existing, compressed image layers.

If not, then (aside from the inefficiency of redundant re-encodes) for my purposes I'm not sure it'd be tenable, as once the image is displayed on screen the user can e.g. drag-and-drop it into another application. Since encodes of non-trivial images take a long time, and even just a second of delay is unacceptable, the fully-composed, final file data needs to be basically ready to go at screen render time.

y-guyon commented 1 month ago

Would there be a way to produce the final file without having to re-encode everything, though? I don't currently see an API for explicitly providing existing, compressed image layers.

This is not possible with the libavif API as of today.

If your files always use the same pattern, you could look at reconstructing them yourself. The HEIF container format is rather complex, but if there is always a single alpha auxiliary image item attached to a single primary color image item, with no other item or non-essential property, you would just have to retrieve the AV1 payloads for each of these image items, replace them in the final file, and update these fields: 'iloc' sizes and offsets, 'ispe' width and height, and the 'mdat' box size (which could be 0 for simplicity, meaning "until the end of the file").
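The core bookkeeping for such a reconstruction is laying the AV1 payloads out back to back in a new 'mdat' box and recording, for each item, the absolute file offset and length that its 'iloc' extent must point to. A minimal Python sketch of just that step, assuming each 'iloc' entry uses file offsets with a base offset of 0 and a single extent, and that everything before 'mdat' keeps its size (an 'ispe' box is always 20 bytes, so changing the dimensions does not move anything). Serializing the 'iloc' box itself is left out; the item IDs 1 and 2 in the test are illustrative, and the function names are hypothetical.

```python
import struct
from typing import Dict, Tuple


def build_mdat(
    payloads: Dict[int, bytes], mdat_offset: int, size_zero: bool = False
) -> Tuple[bytes, Dict[int, Tuple[int, int]]]:
    """Concatenate item payloads into one 'mdat' box.

    Returns the box bytes and, per item ID, the (absolute offset, length) pair
    that the item's 'iloc' extent should contain.
    """
    header_size = 8
    body = b"".join(payloads.values())
    box_size = 0 if size_zero else header_size + len(body)  # 0 means "until end of file".
    if box_size > 0xFFFFFFFF:
        raise ValueError("mdat too large for a 32-bit size; use size 0 or a largesize")
    extents: Dict[int, Tuple[int, int]] = {}
    position = mdat_offset + header_size  # Payloads start right after the box header.
    for item_id, payload in payloads.items():
        extents[item_id] = (position, len(payload))
        position += len(payload)
    return struct.pack(">I4s", box_size, b"mdat") + body, extents


def build_ispe(width: int, height: int) -> bytes:
    """'ispe' is a FullBox: size, type, version and flags (all 0), then width and height."""
    return struct.pack(">I4sIII", 20, b"ispe", 0, width, height)
```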

y-guyon commented 1 month ago

Prototype: https://github.com/AOMediaCodec/libavif/pull/2381

AOMediaCodec / libavif

Feature request: separate out encoding of chroma & alpha #2374