Closed baumanj closed 1 year ago
@baumanj Thanks for the report. I agree with you that the spec needs clarification.
I did a pass in the 6th edition searching for =0
and found the following (besides versions and flags):
template
: tsel
, stri
, vmhd
, VisualSampleEntry
, smhd
, AudioSampleEntry
, mvhd
, tkhd
reserved
: iloc
, fpar
, stvi
, sidx
, RTPpacket
, tele
, drap
, VisualSampleEntry
, colr
, cclv
(reserved1, ...), smhd
, AudioSampleEntry
, chnl
, dmix
, LoudnessBaseBox
, hmhd
, mvhd
, tkhd
, SampleEntry
,stz2
, padb
, 'trex' flags, 'tfra', 'csgp'pre_defined
: VisualSampleEntry
, AudioSampleEntry
, mvhd
,mdhd
bit(1) pad = 0;
) in mdhd
and cprt
In the case of hdlr
, the reserved fields are reserved because they may be or have been used in the QuickTime File Format:
template
is either used in FullBox
or in SampleEntry
.
The spec (section 6.2.2) says:
Fields shown as “template” in the box descriptions are optional in the specifications that use this document. If the field is used in another specification, that use must be conformant with its definition here, and the specification must define whether the use is optional or mandatory. [...] if a field of that kind is not used in a specification, then it should be set to the indicated default value. If the field is not used it shall be copied un-inspected when boxes are copied, and ignored on reading.
Section 11.3:
Any template fields used must be explicitly declared; their use must be conformant with this document.
Section B.6
Template fields are defined in the file format. If any are used in a derived specification, the use must be compatible with the base definition, and that use explicitly documented.
Some of the text is a little bit confusing, but I think the intent is:
pre_defined
is also either used in FullBox
or in SampleEntry
.
Section 6.2.2:
fields marked “pre-defined” were used in an earlier edition of this document. [...] if a field of that kind is not used in a specification, then it should be set to the indicated default value. If the field is not used it shall be copied un-inspected when boxes are copied, and ignored on reading.
Again, I think the text should be improved to say that:
The spec says:
FullBoxes with an unrecognized version shall be ignored and skipped.
It does not say anything about flags
. I think it should say that depending on the flags
value, the syntax of a box may (e.g. saiz
or may not (e.g. prft
) change, but if a new flags
value is introduced in a later edition that changes the box syntax, a new version will be used. In other words, given a version, flag values don't change the syntax and can be ignored if not known.
It is used in a mix of FullBox, Box, SampleEntry. I couldn't find any text describing it.
I think we should make a difference between old boxes inherited from QuickTime (like hdlr
) where we know this won't change; and new boxes (typically FullBoxes) where it could change. Based on my above interpretations, we should change some of the legacy reserved
into pre_defined
.
For the other cases, where they may change, I think what matters is that the rest of the syntax does not change. I think version should be used to indicate that. We should say:
reserved
fields have a value that may change in future versions of this specification. When used in a FullBox
, if a new value is introduced that requires changes to the syntax of the FullBox in a non-compatible way, this change will be done together with a change of version. When used in other structures, new values may be defined but will not change the syntax of the structure. In other words, given a version, reader may ignore values that they do not recognize.
That is easy to fix as reserved_pad.
@cconcolato: Thank you for the detailed review!
pre_defined
: Is there a reason why you suggest using "may" where it read "shall" in the spec? reserved
into pre_defined
, especially if they had been used in QuickTime.flags
nor the contents of reserved
fields change the syntax (while new versions may do so), because this means that readers can safely ignore these fields if they copy some boxes. Finally, just a minor correction of a typo in your comment on template
where you swapped riders and writers:
ReadersWriters should write the value of the derived specification that is applicable, and if none, should use the 'template' valueWritersReaders should expect any value, and should keep parsing the box even if they don't recognize a value.
I just now got around to reading your comment in detail @cconcolato. I think I agree with basically everything you're saying, especially updating more fields to be pre_defined
, though I wonder if legacy
would be an even clearer name.
However, with regard to these legacy/pre_defined fields (or whatever label we settle on), you suggest:
Readers may ignore the value. If they read it and find a different value, the box is considered in error and readers may stop reading the file.
For the sake of compatibility, I think readers must ignore the value, right? This lets old files stay valid while also protecting (somewhat) against future specs using those fields to carry other data that is meaningful. The only downside is that new writers which put something there may not get a signal about it being a non-standard thing to do. I suppose we could say:
- Writers should always write that value (not shall, so old files remain valid)
- Readers must ignore the value (I'm not sure if they should be required to copy it unchanged)
Would that work?
@kgruen You are right about my typo. I corrected my message above.
Regarding "may" vs "shall", I think you refer to:
If the field is not used it shall be copied un-inspected when boxes are copied, and ignored on reading.
vs.
Readers may ignore the value. If they read it and find a different value, the box is considered in error and readers may stop reading the file.
I don't think the ISOBMFF specification can impose requirements on readers. There is no way to verify conformance. But if people prefer a shall, I'm fine with that.
@baumanj I like the legacy
idea.
Regarding the behavior of writers, do we have examples of files where a pre_defined
was not written with the expected value? If so, or if we believe there are writers that do that in the field, then yes we should use "should". I don't think that's the case that's why I suggested "shall".
Regarding readers, per my response to Karsten, ISOBMFF is usually written without hard requirements on readers because we cannot verify what they do. For writers, it's easy to check a 'shall', you can inspect the output bitstream. Derived specifications could enforce that if they have means to do it.
Regarding the behavior of writers, do we have examples of files where a
pre_defined
was not written with the expected value? If so, or if we believe there are writers that do that in the field, then yes we should use "should". I don't think that's the case that's why I suggested "shall".
avif-serialize
used to write non-zero values in one of these hdlr
fields: https://github.com/kornelski/avif-serialize/blame/a2f39a41b73196981c17976efbcbdf25b8a7041d/src/boxes.rs#L215
This was in a reserved
rather than pre_defined
field, but under our terminology here, I'd say both should be considered legacy and treated the same way. So, yes, I think it's a thing writers have done and may do in the future. At the end of the day, I think that sort of thing is better for readers to ignore, but only if we mandate legacy fields will never be repurposed without bumping the version.
Regarding readers, per my response to Karsten, ISOBMFF is usually written without hard requirements on readers because we cannot verify what they do. For writers, it's easy to check a 'shall', you can inspect the output bitstream. Derived specifications could enforce that if they have means to do it.
I'm not sure I understand. Surely we can specify that readers must (or must not) reject a certain input. That doesn't mean they will, in practice, but it makes it clear whether the reader is conforming to the spec or not. Am I completely missing your point here?
Surely we can specify that readers must (or must not) reject a certain input.
Usually, the specs put requirements on the bitstreams, i.e. a valid file shall have this or should have that. For reader requirements, we try to avoid hard requirements and use "readers are expected to", "readers should". But I'm not sure that's always followed and anyway, that's a detail.
Related to this issue, it seems that the meaning of hardcoded versions in the syntax is not clear. See https://github.com/AOMediaCodec/av1-avif/pull/170#issuecomment-953859902. We should try to address that too.
Here are my proposed changes:
Add to 4.2.2
When the SDL uses the syntax “version = 0”, it means that this document only defines the syntax for version 0 of the FullBox. When “= 0” is not present, it means different values are possible and the syntax has actual differences depending on the actual version used. Derived specifications are not permitted to define a new version of a box defined in this document.
When the SDL uses the syntax “flags = 0” (or sometimes simply “0”), it means that this document does not define flag values for this FullBox. When “= 0” is not present, it means different flag values are defined. Future editions of this document may define additional flags, but if a new value is introduced in a later edition that impacts the box syntax, a new version will be used. Derived specifications shall follow this too. As a consequence, if a reader supports a version of a FullBox, it can and should keep parsing the box even if it does not recognize a flag value. Writers should not write flag values that they do not understand.
In 6.3.2 change
Fields shown as “template” in the box descriptions are optional in the specifications that use this document. If the field is used in another specification, that use must be conformant with its definition here, and the specification must define whether the use is optional or mandatory. Similarly, fields marked “pre-defined” were used in an earlier edition of this document. For both kinds of fields, if a field of that kind is not used in a specification, then it should be set to the indicated default value. If the field is not used it shall be copied un-inspected when boxes are copied, and ignored on reading.
to
Fields shown as “template” in the syntaxes are interpreted as follows:
- The value given in the right-hand side part of the SDL is a default value.
- Derived specifications are permitted to define additional possible values for the field, but they shall indicate when these should be used, and they shall not modify the remaining syntax of the box to depend on the value of this field.
- Writers should write this default value unless they conform to a derived specification, in which case they may write another value, as permitted by that derived specification.
- Readers should expect any value and should keep parsing the box even if they don't recognize the value.
Fields named “pre_defined” in the syntaxes are interpreted as follows:
- They had specific semantics in an earlier edition of this document, but their semantics are no longer in use. They are considered legacy fields.
- The value given in the right-hand side part of the SDL is the only permitted value.
- Derived specifications are not permitted to change this value.
- Writers shall always write this value.
- Readers should expect any value, and should keep parsing the box even if they don't recognize a value. Derived specifications may require stricter behavior from readers (e.g. requiring reading to stop and emit an error when reading a different value).
Fields name “reserved” in the syntaxes are interpreted as follows:
- They may have a default value defined in this specification (typically 0) or they may not.
- Additional values may be defined in future versions of this specification.
- Derived specifications are not permitted to define values for this field.
- This specification guarantees the following:
- When used in a FullBox, if a new value is introduced that requires changes to the syntax of the FullBox in a non-compatible way, this change will be done together with a change of version.
- When used in other structures, new values may be defined but these will not affect the syntax of the structure.
- Writers shall always write the default value, if specified, or shall write a value of 0, if not specified.
- Readers should expect any value, and should keep parsing the box even if they don't recognize the value.
In 8.4.2.2, change
bit(1) pad = 0;
to
bit(1) pre_defined = 0;
Just noticed two typos:
thatkind (should read that kind) they shall not modify the remaining syntax of the box
doto depend on the value of this field.
Future editions of this document may define additional flags, but if a new value is introduced in a later edition that impacts the box syntax, a new version will be used
What's an example of a new flag value that doesn't impact the box syntax? I'm trying to imagine one that would be ok for readers to just ignore.
Some minor suggestions:
a new version
willshall be used
Writers should not write flag values that they do not understand.
And one somewhat larger concern around reserved
:
When used in a FullBox, if a new value is introduced that requires changes to the syntax of the FullBox in a non-compatible way, this change will be done together with a change of version.
Let's consider an example with ImageMirror
(imir
). It extends ItemProperty
, which extends Box
, so it's not versioned, and has the following syntax:
aligned(8) class ImageMirror
extends ItemProperty('imir') {
unsigned int (7) reserved = 0;
unsigned int (1) axis;
}
If the reserved
bits are for future use, how would that work? Currently only bit 8 is used, so imagine we use bit 7 to indicate that the image should be mirrored about a diagonal axis:
00: —
10: |
10: /
11: \
If we follow the new text:
When used in other structures, new values may be defined but these will not affect the syntax of the structure
Is that a syntactic change? It's unclear to me, but if it's not a syntactic change, I don't see how one would ever make any changes to reserved fields. As such, I assume this is not a syntactic change because old parsers should still be able to handle the input as long as they follow the new text:
Readers should expect any value, and should keep parsing the box even if they don't recognize the value
In this case, old parsers will happily ignore the set bit in position 7 and render the wrong thing. The alternative would be for readers to recognize a set bit in a field they expect to be 0 and either ignoring the property entirely or ignoring the whole item according to whether the essential
bit is set in the association (see ipma
in BMFF § 8.11.14). Both of those seem preferable to misinterpreting the input and transforming the image wrongly.
What's an example of a new flag value that doesn't impact the box syntax? I'm trying to imagine one that would be ok for readers to just ignore.
Flags in tkhd
, prft
, elst
don't impact the syntax.
2. Typo, make writer plural:
Writers should not write flag values that they do not understand.
fixed
- For greater clarity:
a new version ~will~ shall be used
I'm reluctant to using "shall" when the subject is the standardization body, but not a strong objection.
I think you meant:
00: —
01: |
10: /
11: \
right?
When used in other structures, new values may be defined but these will not affect the syntax of the structure
Is that a syntactic change?
No.
The text only talks about "parsing" not about interpreting. Changing a reserved field (in a non-FullBox or in a FullBox without bumping a version) should only be done if it is acceptable for old readers to ignore the value. So in your case, if the designer of the box, think it is acceptable for an old renderer to render 11
as 01
, then they can use bit 7, but if not, then a new box should be used.
think you meant
Yes, my mistake
Changing a reserved field (in a non-FullBox or in a FullBox without bumping a version) should only be done if it is acceptable for old readers to ignore the value.
Hmmm, I'm not a fan of that level of subtlety. I think it would be less error-prone to require these fields in non-versioned boxes to retain their semantics permanently or else have a clear fallback of what should be ignored in a way that's not potentially different for different boxes. If authors want better extensibility, they can use something versioned, or perhaps the expandable
keyword. To me, the nonzero bits that should be zero are a signal that informs the reader that they don't know how to properly interpret the input, possibly because it's too new or possibly because it's corrupt. I think that has more value than allowing such subtle complexity.
Flags in
tkhd
,prft
,elst
don't impact the syntax.
Thanks! I'm not previously familiar, so I will have a look and get back to you
What's an example of a new flag value that doesn't impact the box syntax? I'm trying to imagine one that would be ok for readers to just ignore.
Flags in
tkhd
,prft
,elst
don't impact the syntax.
Having looked at tkhd
now (from BMFF § 8.3.2.3):
flags
is a 24-bit integer with flags; the following values are defined:track_enabled
: Indicates that the track is enabled. Flag value is 0x000001. A disabled track (the low bit is zero) is treated as if it were not present.
I don't know the actual history of this box (and I don't think it's relevant), but for the purpose of this issue, let's pretend the original version of the box didn't use the flags
field at all and there was no concept of disabled tracks. If we follow the suggested text:
Future editions of this document may define additional flags, but if a new value is introduced in a later edition that impacts the box syntax, a new version will be used. Derived specifications shall follow this too. As a consequence, if a reader supports a version of a FullBox, it can and should keep parsing the box even if it does not recognize a flag value. Writers should not write flag values that they do not understand.
If there's no change to the version
field, an older parser will not know what to do with the nonzero flags
value. It will wrongly process tracks that should be disabled. I don't think that's an ideal outcome. Especially since the existence of a flags
field implies FullBox
and versioning is available, I think any change to flag semantics should require an update. In this case, it would lead to the desired outcome (old parsers ignore this tkhd
), but maybe there are counterexamples where the outcome would be bad. Can you think of any?
we shouldn't write shalls that apply to the standards body; they are unenforceable.
we shouldn't write shalls that apply to the standards body; they are unenforceable.
Can you elaborate? I've never thought anything in and standards document was "enforceable", but the distinction between "should" and "shall" has always been clear to me in a way that "will" is not.
yes, if we write "the field shall have the value zero" then we can inspect files and declare them non-conformant. so if we write "we will do X" and then don't do it, people can complain. generally, we try to keep our use of formal conformance verbs (shall and should and may) to verifiable things. we have people who scan specs for shalls and try to write tests for them, for example...and if a test cannot be written, they complain.
Many fields in BMFF and derived specifications define fields with a syntax like
Often, though not always, with the name
reserved
. For example thehdlr
box in 14496-12:2020 § 8.4.3.2:Thought the semantics section makes no reference to either
pre_defined
orreserved
.ISO/IEC 14496-1:2010 § 8.2.2 defines the syntax itself, but doesn't indicate whether it constitutes a recommendation or a requirement. That is, if a field is fined in the SDL as
= 0;
, should a reader reject it as invalid if it contains nonzero data? I haven't been able to find anything in ISO specs that address this. From my perspective, it depends on the intent of the specific box and field, but given the absence of such guidance, having a standard interpretation seems valuable. After consideration, I think it depends on two factors:Box
vsFullBox
Since plain
Box
types are not versioned, I think readers must not enforce the zeroness of= 0;
fields, or else a legitimate future change would be rejected. On the other hand,FullBox
types have aversion
field, which I would expect to be updated for any semantic change, so rejecting fields defined= 0;
which contain nonzero data seems like an important signal for defects in writer implementations. Otherwise, a later change to use those= 0;
fields could lead to very surprising behavior far in the future.That said, if a later amendment changes the interpretation of bits in a
= 0;
field to be meaningful without changing the version, a reader could end up rejecting a technically valid file. I'm not sure how likely this use case is, but it's definitely a tricky one. Perhaps in the absence of specific guidance in the semantics section for theFullBox
which defines the= 0;
field(s), it should be illegal to change the interpretation in an amendment unless theversion
field is also increased.REserved for the future, or PREserved from the past?
As observed in the case of the
hdlr
box, sometimes= 0;
isn't about future use, so much as about obsolete past use. This can be observed with the= 0;
fields which had different semantics in the QuickTime specification which preceded it. Since it seems the retention of those fields in BMFF'shdlr
box is about maintaining compatibility with old code, and they will likely never be changed, an explicit indication that readers shall ignore their contents seems valuable. Certainly, if in the future those= 0;
bits are repurposed without at least updating thehdlr
version (and probably to a lesser extent, even if it is updated), the files which have already been generated with invalid data in those locations could cause very subtle chaos. That seems worth avoiding.Finally, there may be security implications associated with
= 0;
fields, and some guidance around that would be useful, but I'm not a security expert, so I won't make any specific recommendations.