Closed wantehchang closed 1 month ago
writer_version
is added for future compatibility, where writer_version>0
with minimum_version=0
would signal new but backwards-compatible features. It is indeed to be ignored.
Chris: Thank you for the reply.
If I understand it correctly, this means the current implementation in libavif needs to allow unrecognized fields defined in writer_version>0
. So after parsing all the fields defined in writer_version=0
, it should allow extra bytes in the box. Therefore we should remove the following check at the end of the avifParseToneMappedImageBox()
function:
static avifBool avifParseToneMappedImageBox(avifGainMapMetadata * metadata, const uint8_t * raw, size_t rawLen, avifDiagnostics * diag)
{
BEGIN_STREAM(s, raw, rawLen, diag, "Box[tmap]");
...
return avifROStreamRemainingBytes(&s) == 0;
}
We should change that final return statement to return AVIF_TRUE;
.
Do you agree?
Alternatively we can restrict that check to writer_version=0
. That may be a better fix:
return writerVersion != 0 || avifROStreamRemainingBytes(&s) == 0;
Another thing to check: If avifParseToneMappedImageBox()
sees a tmap box of an unsupported newer version, I think the caller of avifParseToneMappedImageBox()
should gracefully ignore the tmap box (as if it were not there) rather than return an error.
Another thing to check: If
avifParseToneMappedImageBox()
sees a tmap box of an unsupported newer version, I think the caller ofavifParseToneMappedImageBox()
should gracefully ignore the tmap box (as if it were not there) rather than return an error.
Yes, I noticed this behavior, and I agree that it should be ignored and not fatal for the image decode.
Thanks for looking into this Wan-Teh, I agree with your interpretation and proposed changes. I have a couple of questions:
Hi Maryla,
Since the version field comes from ISO BMFF itself, if the version of the tmap box is unknown, I think we should ignore the tmap box since tmap is not a required box.
Re: the gainMapPresent
field of avifDecoder
: This seems like an API design issue. Is it useful to report that an unsupported gain map is detected? If not, then we can update the comment in avif.h to say "This is true when avifDecoderParse() detects a supported gain map."
Re: the
gainMapPresent
field ofavifDecoder
: This seems like an API design issue. Is it useful to report that an unsupported gain map is detected? If not, then we can update the comment in avif.h to say "This is true when avifDecoderParse() detects a supported gain map."
I agree with this. I don't think there is any use in reporting that the image has a gainmap if the library cannot do anything with it.
Maryla, Vignesh:
Chris sent me a copy of the spec (CD 21496-1). The following paragraph in the spec answers my question in https://github.com/AOMediaCodec/libavif/issues/2369#issuecomment-2271856703:
The metadata structure may have padding data after the end of the structure. The metadata structure may also have future optional metadata after the recognized fields. Parsers shall stop reading after the last recognized metadata field and ignore all remaining data (see the semantics of the
minimum_version
field in C.2.3).
Because padding data are allowed, I think we should ignore all remaining data at the end of the avifParseToneMappedImageBox()
function, regardless of the value of writer_version
.
Hi Maryla,
I wrote the following answer to your first question:
Since the version field comes from ISO BMFF itself, if the version of the tmap box is unknown, I think we should ignore the tmap box since tmap is not a required box.
Please ignore this answer. I misunderstood "the 'version' tag that is read just before 'minimum_version'".
I didn't see the specification of the version tag in CD 21496-1. Where is it specified?
The version tag is defined in ISO/IEC 23008-12:2024/AMD 1:2024(E), Clause 6.6.2.4.2:
aligned(8) class ToneMapImage {
unsigned int(8) version = 0;
if(version == 0) {
GainMapMetadata;
}
}
It does NOT have padding data after the end of the GainMapMetadata
structure. So I think we should only allow remaining data if writer_version != 0
, i.e.,
return writerVersion != 0 || avifROStreamRemainingBytes(&s) == 0;
Clause 6.6.2.4.3 says:
version
shall be equal to 0. Readers shall not process aToneMapImage
with an unrecognized version number.
Returning an error would also meet the "shall not process" requirement, but I think it is better to ignore the tmap and display the base image if the version is unknown.
I sent PR #2380 but wasn't completely sure what to do with 'gainMapPresent' with regards to 'enableParsingGainMapMetadata'. Right now, we don't read the gain map metadata if 'enableParsingGainMapMetadata' is false. As explained in the API:
// Gain map metadata is read during avifDecoderParse(). Like Exif and XMP, this data
// can be (unfortunately) packed at the end of the file, which will cause
// avifDecoderParse() to return AVIF_RESULT_WAITING_ON_IO until it finds it.
// If you don't actually use this data, it's best to leave this to AVIF_FALSE (default).
avifBool enableParsingGainMapMetadata;
But if we don't read the metadata, we cannnot know whether that metadata is supported. In that PR I kept the current behavior of 'enableParsingGainMapMetadata' so I only parse the gain map metadata if it's enabled. Therfore, for a given file, it's possible to have 'gainMapPresent' true when 'enableParsingGainMapMetadata' is false, but 'gainMapPresent' becomes false if 'enableParsingGainMapMetadata' is set to true and the version of the gain map is unsupported. But I feel like this makes the API quite complicated, so let me know if you have a better suggestion.
Maryla: Regarding your question on gainMapPresent
with regards to enableParsingGainMapMetadata
, I am not familiar with the gain map API in avif.h, so please keep this in mind when you evaluate my comments and suggestions. They may be wrong because I misunderstood the gain map API.
There are two simple solutions.
gainMapPresent
in the same way as alphaPresent
. Set gainMapPresent
to true if a gain map is detected. It doesn't matter whether the gain map metadata can be parsed or the gain map image can be decoded successfully.enableParsingGainMapMetadata
(and perhaps also enableDecodingGainMap
; see Note below) in the same way as ignoreExif
and ignoreXMP
. Set gainMapPresent
to false if enableParsingGainMapMetadata
is false.Note: It is not clear why the parsing of gain map metadata and the decoding of gain map image need to be controlled separately. This is complicated and it is not clear whether all four combinations of these two boolean options are allowed. For example, is the combination enableParsingGainMapMetadata=false
and enableDecodingGainMap=true
allowed? And is it useful to support the combination enableParsingGainMapMetadata=true
and enableDecodingGainMap=false
?
Thanks for your input Wan-Teh, you raise very good points.
The reason that there are two booleans to control the behavior is because enableParsingGainMapMetadata
affects the parsing phase (avifDecoderParse()
) while enableDecodingGainMap
affects the decoding phase (avifDecoderNextImage()
).
Currently all 4 combinations are indeed allowed (and are tested in avifgainmaptest.cc)
enableParsingGainMapMetadata = false
enableDecodingGainMap = false
Default value, useful for decoders that don't care about gain maps at all. In this case admittedly they probably don't care about gainMapPresent
either. In case of incremental decoding, having enableParsingGainMapMetadata=false
to disable parsing the gainmap metadata means the user doesn't have to wait for the mdat
payload of the tmap
item. This is similar to how we encourage users to set ignoreExif
or ignoreXMP
to true if they don't use this data.
enableParsingGainMapMetadata = true
enableDecodingGainMap = false
This is used in Chrome right now. Useful for decoders that want to extract the gain map metadata but not the gain map image itself at that time. (Chrome decodes the gain map later).
enableParsingGainMapMetadata = false
enableDecodingGainMap = true
A user that previously used the enableParsingGainMapMetadata = true enableParsingGainMap = false
could use this to decode the gain map image if they have previously decoded the metadata, but enableParsingGainMapMetadata = true enableDecodingGainMap = true
would work too.
enableParsingGainMapMetadata = true
enableDecodingGainMap = true
Parse metadata and decode gain map all at once. Most users that care about gain maps will probably use this.
Starting from this PR (that is now merged) the behavior regarding gain map version is as follows:
enableParsingGainMapMetadata
is false, the gain map metadata version is not known, so the version is ignored. gainMapPresent
is set to true if a tmap
item is found. If enableDecodingGainMap
is true, then the gain map is decoded.enableParsingGainMapMetadata
is true, the gain map metadata version is known. If the gain map version is unsupported, then gainMapPresent
will be set to false and the gain map image will not be decoded even if enableDecodingGainMap
is true.I feel like this gives a lot of control but it's also complicated/overkill.
I agree with Vignesh that "there is [no] use in reporting that the image has a gainmap if the library cannot do anything with it" so I don't think your solution 1 is the best.
I think solution 2 makes sense, i.e. gainMapPresent
is not populated if enableParsingGainMapMetadata
is false. It seems a lot easier to understand API wise, and I think it covers all sensible use cases.
We could also disallow the combination enableParsingGainMapMetadata = false enableDecodingGainMap = true
to remove the edge case where we decode the gain map image even though its metadata may not be supported?
Maryla: Thank you very much for working out the details of all four combinations. I understand the two options now.
Since the two options look progressive (i.e., the second option seems to require the first option), the third combination (first option=false, second option=true) is surprising. Does the API support providintg gain map metadata as input to avifDecoder?
It would make the two options easier to understand if they are progressive. So I support disallowing the third combination.
Does the API support providintg gain map metadata as input to avifDecoder?
No because the gain map metadata is not needed nor used by the decoder. It's just provided to the user as is. The user can then use the gain map metadata in combination with the gain map image pixels (and the base image) to do tonemapping, for example by using the avifImageApplyGainMap()
function provided at the bottom of avif.h, or forwarding them to Skia (what Chrome does), etc.
In most cases it would make sense to get both metadata and pixels at the same time by setting both enableParsingGainMapMetadata
and enableDecodingGainMap
to true, but in the case of Chrome we needed to get them separately.
Thanks for the clarification. I didn't know that the gain map metadata is not needed nor used by the decoder. That's the subtlety I was missing.
Maryla: Re: gainMapPresent
When I wrote my previous comments on this issue, I didn't know about the requirement of having the 'tmap'
brand in the compatible_brands
array of the FileTypeBox
. After seeing your commit https://github.com/AOMediaCodec/libavif/commit/ab708ca88689210b770005198c88f0c2e4159a6a, I think there is an even simpler definition of the gainMapPresent
field: gainMapPresent
is true iff the 'tmap'
brand is in the compatible_brands
array of the FileTypeBox
.
This simple definition of gainMapPresent
is based on the following requirements in the spec (ISO/IEC 23008-12:2024/AMD 1:2024):
10.2.6 'tmap' brand
6.8.10.1 Definition
This brand enables file players to identify and decode HEIF files containing tone-map derived image items. When present, this brand shall be among the brands included in the
compatible_brands
array of theFileTypeBox
.6.8.10.2 Requirements on files
A file containing the
'tmap'
brand in thecompatible_brands
array of theFileTypeBox
shall contain one or more tone-map derived image items.
Differences from the current definition:
enableParsingGainMapMetadata
, because it only needs to parse the FileTypeBox
. It will allow us to remove the enableParsingGainMapMetadata
option.So we need to evaluate whether this simple definition, while easy to state and implement, is useful to our users.
Thanks for the suggestion. Indeed this would be a simple/easy interpretation, but I think it's less useful to users. In that case we might want to have two different presence fields, making the API more complicated...
I just realized that I may have made a mistake in my analysis. I said the simple definition of gainMapPresent
based on the presence of the 'tmap'
brand will allow us to remove the enableParsingGainMapMetadata
option. I am no longer sure if that is correct.
Also, it seems that decoder->image->gainMap
is non-null if and only if decoder->gainMapPresent
is true. If so, then we don't need decoder->gainMapPresent
.
I'd like to study this issue again when Chrome sets the enableDecodingGainMap
option to true. At that time we should be able to remove the enableDecodingGainMap
option.
@maryla-uc @vigneshvg @y-guyon @ccameron-chromium
Please audit our ISO gain map code and make sure we check the version numbers of the boxes related to ISO gain maps.
I took a look at the
avifParseToneMappedImageBox()
function in src/read.c. We checkversion
andminimum_version
but ignorewriter_version
. Should we checkwriter_version
?Is there any other box we need to check?