AOMediaCodec / libavif

libavif - Library for encoding and decoding .avif files
Other
1.53k stars 195 forks source link

Check version numbers of boxes related to the ISO gain map experimental feature #2369

Closed wantehchang closed 1 month ago

wantehchang commented 1 month ago

@maryla-uc @vigneshvg @y-guyon @ccameron-chromium

Please audit our ISO gain map code and make sure we check the version numbers of the boxes related to ISO gain maps.

I took a look at the avifParseToneMappedImageBox() function in src/read.c. We check version and minimum_version but ignore writer_version. Should we check writer_version?

Is there any other box we need to check?

ccameron-chromium commented 1 month ago

writer_version is added for future compatibility, where writer_version>0 with minimum_version=0 would signal new but backwards-compatible features. It is indeed to be ignored.

wantehchang commented 1 month ago

Chris: Thank you for the reply.

If I understand it correctly, this means the current implementation in libavif needs to allow unrecognized fields defined in writer_version>0. So after parsing all the fields defined in writer_version=0, it should allow extra bytes in the box. Therefore we should remove the following check at the end of the avifParseToneMappedImageBox() function:

static avifBool avifParseToneMappedImageBox(avifGainMapMetadata * metadata, const uint8_t * raw, size_t rawLen, avifDiagnostics * diag)
{
    BEGIN_STREAM(s, raw, rawLen, diag, "Box[tmap]");
    ...

    return avifROStreamRemainingBytes(&s) == 0;
}

We should change that final return statement to return AVIF_TRUE;.

Do you agree?

wantehchang commented 1 month ago

Alternatively we can restrict that check to writer_version=0. That may be a better fix:

    return writerVersion != 0 || avifROStreamRemainingBytes(&s) == 0;
wantehchang commented 1 month ago

Another thing to check: If avifParseToneMappedImageBox() sees a tmap box of an unsupported newer version, I think the caller of avifParseToneMappedImageBox() should gracefully ignore the tmap box (as if it were not there) rather than return an error.

ccameron-chromium commented 1 month ago

Another thing to check: If avifParseToneMappedImageBox() sees a tmap box of an unsupported newer version, I think the caller of avifParseToneMappedImageBox() should gracefully ignore the tmap box (as if it were not there) rather than return an error.

Yes, I noticed this behavior, and I agree that it should be ignored and not fatal for the image decode.

maryla-uc commented 1 month ago

Thanks for looking into this Wan-Teh, I agree with your interpretation and proposed changes. I have a couple of questions:

wantehchang commented 1 month ago

Hi Maryla,

Since the version field comes from ISO BMFF itself, if the version of the tmap box is unknown, I think we should ignore the tmap box since tmap is not a required box.

Re: the gainMapPresent field of avifDecoder: This seems like an API design issue. Is it useful to report that an unsupported gain map is detected? If not, then we can update the comment in avif.h to say "This is true when avifDecoderParse() detects a supported gain map."

vigneshvg commented 1 month ago

Re: the gainMapPresent field of avifDecoder: This seems like an API design issue. Is it useful to report that an unsupported gain map is detected? If not, then we can update the comment in avif.h to say "This is true when avifDecoderParse() detects a supported gain map."

I agree with this. I don't think there is any use in reporting that the image has a gainmap if the library cannot do anything with it.

wantehchang commented 1 month ago

Maryla, Vignesh:

Chris sent me a copy of the spec (CD 21496-1). The following paragraph in the spec answers my question in https://github.com/AOMediaCodec/libavif/issues/2369#issuecomment-2271856703:

The metadata structure may have padding data after the end of the structure. The metadata structure may also have future optional metadata after the recognized fields. Parsers shall stop reading after the last recognized metadata field and ignore all remaining data (see the semantics of the minimum_version field in C.2.3).

Because padding data are allowed, I think we should ignore all remaining data at the end of the avifParseToneMappedImageBox() function, regardless of the value of writer_version.

wantehchang commented 1 month ago

Hi Maryla,

I wrote the following answer to your first question:

Since the version field comes from ISO BMFF itself, if the version of the tmap box is unknown, I think we should ignore the tmap box since tmap is not a required box.

Please ignore this answer. I misunderstood "the 'version' tag that is read just before 'minimum_version'".

I didn't see the specification of the version tag in CD 21496-1. Where is it specified?

wantehchang commented 1 month ago

The version tag is defined in ISO/IEC 23008-12:2024/AMD 1:2024(E), Clause 6.6.2.4.2:

aligned(8) class ToneMapImage {
   unsigned int(8) version = 0;
   if(version == 0) {
      GainMapMetadata;
   }
}

It does NOT have padding data after the end of the GainMapMetadata structure. So I think we should only allow remaining data if writer_version != 0, i.e.,

    return writerVersion != 0 || avifROStreamRemainingBytes(&s) == 0;

Clause 6.6.2.4.3 says:

version shall be equal to 0. Readers shall not process a ToneMapImage with an unrecognized version number.

Returning an error would also meet the "shall not process" requirement, but I think it is better to ignore the tmap and display the base image if the version is unknown.

maryla-uc commented 1 month ago

I sent PR #2380 but wasn't completely sure what to do with 'gainMapPresent' with regards to 'enableParsingGainMapMetadata'. Right now, we don't read the gain map metadata if 'enableParsingGainMapMetadata' is false. As explained in the API:

    // Gain map metadata is read during avifDecoderParse(). Like Exif and XMP, this data
    // can be (unfortunately) packed at the end of the file, which will cause
    // avifDecoderParse() to return AVIF_RESULT_WAITING_ON_IO until it finds it.
    // If you don't actually use this data, it's best to leave this to AVIF_FALSE (default).
    avifBool enableParsingGainMapMetadata;

But if we don't read the metadata, we cannnot know whether that metadata is supported. In that PR I kept the current behavior of 'enableParsingGainMapMetadata' so I only parse the gain map metadata if it's enabled. Therfore, for a given file, it's possible to have 'gainMapPresent' true when 'enableParsingGainMapMetadata' is false, but 'gainMapPresent' becomes false if 'enableParsingGainMapMetadata' is set to true and the version of the gain map is unsupported. But I feel like this makes the API quite complicated, so let me know if you have a better suggestion.

wantehchang commented 1 month ago

Maryla: Regarding your question on gainMapPresent with regards to enableParsingGainMapMetadata, I am not familiar with the gain map API in avif.h, so please keep this in mind when you evaluate my comments and suggestions. They may be wrong because I misunderstood the gain map API.

There are two simple solutions.

  1. Treat gainMapPresent in the same way as alphaPresent. Set gainMapPresent to true if a gain map is detected. It doesn't matter whether the gain map metadata can be parsed or the gain map image can be decoded successfully.
  2. Treat enableParsingGainMapMetadata (and perhaps also enableDecodingGainMap; see Note below) in the same way as ignoreExif and ignoreXMP. Set gainMapPresent to false if enableParsingGainMapMetadata is false.

Note: It is not clear why the parsing of gain map metadata and the decoding of gain map image need to be controlled separately. This is complicated and it is not clear whether all four combinations of these two boolean options are allowed. For example, is the combination enableParsingGainMapMetadata=false and enableDecodingGainMap=true allowed? And is it useful to support the combination enableParsingGainMapMetadata=true and enableDecodingGainMap=false?

maryla-uc commented 1 month ago

Thanks for your input Wan-Teh, you raise very good points.

The reason that there are two booleans to control the behavior is because enableParsingGainMapMetadata affects the parsing phase (avifDecoderParse()) while enableDecodingGainMap affects the decoding phase (avifDecoderNextImage()).

Currently all 4 combinations are indeed allowed (and are tested in avifgainmaptest.cc)

enableParsingGainMapMetadata = false
enableDecodingGainMap = false

Default value, useful for decoders that don't care about gain maps at all. In this case admittedly they probably don't care about gainMapPresent either. In case of incremental decoding, having enableParsingGainMapMetadata=false to disable parsing the gainmap metadata means the user doesn't have to wait for the mdat payload of the tmap item. This is similar to how we encourage users to set ignoreExif or ignoreXMP to true if they don't use this data.

enableParsingGainMapMetadata = true
enableDecodingGainMap = false

This is used in Chrome right now. Useful for decoders that want to extract the gain map metadata but not the gain map image itself at that time. (Chrome decodes the gain map later).

enableParsingGainMapMetadata = false
enableDecodingGainMap = true

A user that previously used the enableParsingGainMapMetadata = true enableParsingGainMap = false could use this to decode the gain map image if they have previously decoded the metadata, but enableParsingGainMapMetadata = true enableDecodingGainMap = true would work too.

enableParsingGainMapMetadata = true
enableDecodingGainMap = true

Parse metadata and decode gain map all at once. Most users that care about gain maps will probably use this.

Starting from this PR (that is now merged) the behavior regarding gain map version is as follows:

I feel like this gives a lot of control but it's also complicated/overkill.

I agree with Vignesh that "there is [no] use in reporting that the image has a gainmap if the library cannot do anything with it" so I don't think your solution 1 is the best. I think solution 2 makes sense, i.e. gainMapPresent is not populated if enableParsingGainMapMetadata is false. It seems a lot easier to understand API wise, and I think it covers all sensible use cases. We could also disallow the combination enableParsingGainMapMetadata = false enableDecodingGainMap = true to remove the edge case where we decode the gain map image even though its metadata may not be supported?

wantehchang commented 1 month ago

Maryla: Thank you very much for working out the details of all four combinations. I understand the two options now.

Since the two options look progressive (i.e., the second option seems to require the first option), the third combination (first option=false, second option=true) is surprising. Does the API support providintg gain map metadata as input to avifDecoder?

It would make the two options easier to understand if they are progressive. So I support disallowing the third combination.

maryla-uc commented 1 month ago

Does the API support providintg gain map metadata as input to avifDecoder?

No because the gain map metadata is not needed nor used by the decoder. It's just provided to the user as is. The user can then use the gain map metadata in combination with the gain map image pixels (and the base image) to do tonemapping, for example by using the avifImageApplyGainMap() function provided at the bottom of avif.h, or forwarding them to Skia (what Chrome does), etc.

In most cases it would make sense to get both metadata and pixels at the same time by setting both enableParsingGainMapMetadata and enableDecodingGainMap to true, but in the case of Chrome we needed to get them separately.

wantehchang commented 1 month ago

Thanks for the clarification. I didn't know that the gain map metadata is not needed nor used by the decoder. That's the subtlety I was missing.

wantehchang commented 1 month ago

Maryla: Re: gainMapPresent

When I wrote my previous comments on this issue, I didn't know about the requirement of having the 'tmap' brand in the compatible_brands array of the FileTypeBox. After seeing your commit https://github.com/AOMediaCodec/libavif/commit/ab708ca88689210b770005198c88f0c2e4159a6a, I think there is an even simpler definition of the gainMapPresent field: gainMapPresent is true iff the 'tmap' brand is in the compatible_brands array of the FileTypeBox.

This simple definition of gainMapPresent is based on the following requirements in the spec (ISO/IEC 23008-12:2024/AMD 1:2024):

10.2.6 'tmap' brand

6.8.10.1 Definition

This brand enables file players to identify and decode HEIF files containing tone-map derived image items. When present, this brand shall be among the brands included in the compatible_brands array of the FileTypeBox.

6.8.10.2 Requirements on files

A file containing the 'tmap' brand in the compatible_brands array of the FileTypeBox shall contain one or more tone-map derived image items.

Differences from the current definition:

So we need to evaluate whether this simple definition, while easy to state and implement, is useful to our users.

maryla-uc commented 1 month ago

Thanks for the suggestion. Indeed this would be a simple/easy interpretation, but I think it's less useful to users. In that case we might want to have two different presence fields, making the API more complicated...

wantehchang commented 1 month ago

I just realized that I may have made a mistake in my analysis. I said the simple definition of gainMapPresent based on the presence of the 'tmap' brand will allow us to remove the enableParsingGainMapMetadata option. I am no longer sure if that is correct.

Also, it seems that decoder->image->gainMap is non-null if and only if decoder->gainMapPresent is true. If so, then we don't need decoder->gainMapPresent.

I'd like to study this issue again when Chrome sets the enableDecodingGainMap option to true. At that time we should be able to remove the enableDecodingGainMap option.