MPEGGroup / FileFormat

MPEG file format discussions
23 stars 0 forks source link

[ISOBMFF]: reserved (`field = 0;`) handling is ambiguous #36

Closed baumanj closed 1 year ago

baumanj commented 3 years ago

Many fields in BMFF and derived specifications define fields with a syntax like

type foo = 0;

Often, though not always, with the name reserved. For example the hdlr box in 14496-12:2020 § 8.4.3.2:

aligned(8) class HandlerBox extends FullBox('hdlr', version = 0, 0) {
    unsigned int(32) pre_defined = 0;
    unsigned int(32) handler_type;
    const unsigned int(32)[3] reserved = 0;
    utf8string name;
}

Thought the semantics section makes no reference to either pre_defined or reserved.

ISO/IEC 14496-1:2010 § 8.2.2 defines the syntax itself, but doesn't indicate whether it constitutes a recommendation or a requirement. That is, if a field is fined in the SDL as = 0;, should a reader reject it as invalid if it contains nonzero data? I haven't been able to find anything in ISO specs that address this. From my perspective, it depends on the intent of the specific box and field, but given the absence of such guidance, having a standard interpretation seems valuable. After consideration, I think it depends on two factors:

Box vs FullBox

Since plain Box types are not versioned, I think readers must not enforce the zeroness of = 0; fields, or else a legitimate future change would be rejected. On the other hand, FullBox types have a version field, which I would expect to be updated for any semantic change, so rejecting fields defined = 0; which contain nonzero data seems like an important signal for defects in writer implementations. Otherwise, a later change to use those = 0; fields could lead to very surprising behavior far in the future.

That said, if a later amendment changes the interpretation of bits in a = 0; field to be meaningful without changing the version, a reader could end up rejecting a technically valid file. I'm not sure how likely this use case is, but it's definitely a tricky one. Perhaps in the absence of specific guidance in the semantics section for the FullBox which defines the = 0; field(s), it should be illegal to change the interpretation in an amendment unless the version field is also increased.

REserved for the future, or PREserved from the past?

As observed in the case of the hdlr box, sometimes = 0; isn't about future use, so much as about obsolete past use. This can be observed with the = 0; fields which had different semantics in the QuickTime specification which preceded it. Since it seems the retention of those fields in BMFF's hdlr box is about maintaining compatibility with old code, and they will likely never be changed, an explicit indication that readers shall ignore their contents seems valuable. Certainly, if in the future those = 0; bits are repurposed without at least updating the hdlr version (and probably to a lesser extent, even if it is updated), the files which have already been generated with invalid data in those locations could cause very subtle chaos. That seems worth avoiding.

Finally, there may be security implications associated with = 0; fields, and some guidance around that would be useful, but I'm not a security expert, so I won't make any specific recommendations.

cconcolato commented 3 years ago

@baumanj Thanks for the report. I agree with you that the spec needs clarification.

I did a pass in the 6th edition searching for =0 and found the following (besides versions and flags):

In the case of hdlr, the reserved fields are reserved because they may be or have been used in the QuickTime File Format: image

template

template is either used in FullBox or in SampleEntry. The spec (section 6.2.2) says:

Fields shown as “template” in the box descriptions are optional in the specifications that use this document. If the field is used in another specification, that use must be conformant with its definition here, and the specification must define whether the use is optional or mandatory. [...] if a field of that kind is not used in a specification, then it should be set to the indicated default value. If the field is not used it shall be copied un-inspected when boxes are copied, and ignored on reading.

Section 11.3:

Any template fields used must be explicitly declared; their use must be conformant with this document.

Section B.6

Template fields are defined in the file format. If any are used in a derived specification, the use must be compatible with the base definition, and that use explicitly documented.

Some of the text is a little bit confusing, but I think the intent is:

pre_defined

pre_defined is also either used in FullBox or in SampleEntry. Section 6.2.2:

fields marked “pre-defined” were used in an earlier edition of this document. [...] if a field of that kind is not used in a specification, then it should be set to the indicated default value. If the field is not used it shall be copied un-inspected when boxes are copied, and ignored on reading.

Again, I think the text should be improved to say that:

Box vs. FullBox

The spec says:

FullBoxes with an unrecognized version shall be ignored and skipped.

It does not say anything about flags. I think it should say that depending on the flags value, the syntax of a box may (e.g. saiz or may not (e.g. prft) change, but if a new flags value is introduced in a later edition that changes the box syntax, a new version will be used. In other words, given a version, flag values don't change the syntax and can be ignored if not known.

reserved

It is used in a mix of FullBox, Box, SampleEntry. I couldn't find any text describing it.

I think we should make a difference between old boxes inherited from QuickTime (like hdlr) where we know this won't change; and new boxes (typically FullBoxes) where it could change. Based on my above interpretations, we should change some of the legacy reserved into pre_defined.

For the other cases, where they may change, I think what matters is that the rest of the syntax does not change. I think version should be used to indicate that. We should say: reserved fields have a value that may change in future versions of this specification. When used in a FullBox, if a new value is introduced that requires changes to the syntax of the FullBox in a non-compatible way, this change will be done together with a change of version. When used in other structures, new values may be defined but will not change the syntax of the structure. In other words, given a version, reader may ignore values that they do not recognize.

pad

That is easy to fix as reserved_pad.

kgruen commented 2 years ago

@cconcolato: Thank you for the detailed review!

Finally, just a minor correction of a typo in your comment on template where you swapped riders and writers:

  • Readers Writers should write the value of the derived specification that is applicable, and if none, should use the 'template' value
  • Writers Readers should expect any value, and should keep parsing the box even if they don't recognize a value.
baumanj commented 2 years ago

I just now got around to reading your comment in detail @cconcolato. I think I agree with basically everything you're saying, especially updating more fields to be pre_defined, though I wonder if legacy would be an even clearer name.

However, with regard to these legacy/pre_defined fields (or whatever label we settle on), you suggest:

Readers may ignore the value. If they read it and find a different value, the box is considered in error and readers may stop reading the file.

For the sake of compatibility, I think readers must ignore the value, right? This lets old files stay valid while also protecting (somewhat) against future specs using those fields to carry other data that is meaningful. The only downside is that new writers which put something there may not get a signal about it being a non-standard thing to do. I suppose we could say:

  • Writers should always write that value (not shall, so old files remain valid)
  • Readers must ignore the value (I'm not sure if they should be required to copy it unchanged)

Would that work?

cconcolato commented 2 years ago

@kgruen You are right about my typo. I corrected my message above.

Regarding "may" vs "shall", I think you refer to:

If the field is not used it shall be copied un-inspected when boxes are copied, and ignored on reading.

vs.

Readers may ignore the value. If they read it and find a different value, the box is considered in error and readers may stop reading the file.

I don't think the ISOBMFF specification can impose requirements on readers. There is no way to verify conformance. But if people prefer a shall, I'm fine with that.

cconcolato commented 2 years ago

@baumanj I like the legacy idea.

Regarding the behavior of writers, do we have examples of files where a pre_defined was not written with the expected value? If so, or if we believe there are writers that do that in the field, then yes we should use "should". I don't think that's the case that's why I suggested "shall".

Regarding readers, per my response to Karsten, ISOBMFF is usually written without hard requirements on readers because we cannot verify what they do. For writers, it's easy to check a 'shall', you can inspect the output bitstream. Derived specifications could enforce that if they have means to do it.

baumanj commented 2 years ago

Regarding the behavior of writers, do we have examples of files where a pre_defined was not written with the expected value? If so, or if we believe there are writers that do that in the field, then yes we should use "should". I don't think that's the case that's why I suggested "shall".

avif-serialize used to write non-zero values in one of these hdlr fields: https://github.com/kornelski/avif-serialize/blame/a2f39a41b73196981c17976efbcbdf25b8a7041d/src/boxes.rs#L215

This was in a reserved rather than pre_defined field, but under our terminology here, I'd say both should be considered legacy and treated the same way. So, yes, I think it's a thing writers have done and may do in the future. At the end of the day, I think that sort of thing is better for readers to ignore, but only if we mandate legacy fields will never be repurposed without bumping the version.

Regarding readers, per my response to Karsten, ISOBMFF is usually written without hard requirements on readers because we cannot verify what they do. For writers, it's easy to check a 'shall', you can inspect the output bitstream. Derived specifications could enforce that if they have means to do it.

I'm not sure I understand. Surely we can specify that readers must (or must not) reject a certain input. That doesn't mean they will, in practice, but it makes it clear whether the reader is conforming to the spec or not. Am I completely missing your point here?

cconcolato commented 2 years ago

Surely we can specify that readers must (or must not) reject a certain input.

Usually, the specs put requirements on the bitstreams, i.e. a valid file shall have this or should have that. For reader requirements, we try to avoid hard requirements and use "readers are expected to", "readers should". But I'm not sure that's always followed and anyway, that's a detail.

cconcolato commented 2 years ago

Related to this issue, it seems that the meaning of hardcoded versions in the syntax is not clear. See https://github.com/AOMediaCodec/av1-avif/pull/170#issuecomment-953859902. We should try to address that too.

cconcolato commented 2 years ago

Here are my proposed changes:

Changes regarding “version” and “flags” and backwards-compatibility

Add to 4.2.2

When the SDL uses the syntax “version = 0”, it means that this document only defines the syntax for version 0 of the FullBox. When “= 0” is not present, it means different values are possible and the syntax has actual differences depending on the actual version used. Derived specifications are not permitted to define a new version of a box defined in this document.

When the SDL uses the syntax “flags = 0” (or sometimes simply “0”), it means that this document does not define flag values for this FullBox. When “= 0” is not present, it means different flag values are defined. Future editions of this document may define additional flags, but if a new value is introduced in a later edition that impacts the box syntax, a new version will be used. Derived specifications shall follow this too. As a consequence, if a reader supports a version of a FullBox, it can and should keep parsing the box even if it does not recognize a flag value. Writers should not write flag values that they do not understand.

Changes for the use of “template”, “required”, “pre_defined”

In 6.3.2 change

Fields shown as “template” in the box descriptions are optional in the specifications that use this document. If the field is used in another specification, that use must be conformant with its definition here, and the specification must define whether the use is optional or mandatory. Similarly, fields marked “pre-defined” were used in an earlier edition of this document. For both kinds of fields, if a field of that kind is not used in a specification, then it should be set to the indicated default value. If the field is not used it shall be copied un-inspected when boxes are copied, and ignored on reading.

to

Fields shown as “template” in the syntaxes are interpreted as follows:

  • The value given in the right-hand side part of the SDL is a default value.
  • Derived specifications are permitted to define additional possible values for the field, but they shall indicate when these should be used, and they shall not modify the remaining syntax of the box to depend on the value of this field.
  • Writers should write this default value unless they conform to a derived specification, in which case they may write another value, as permitted by that derived specification.
  • Readers should expect any value and should keep parsing the box even if they don't recognize the value.

Fields named “pre_defined” in the syntaxes are interpreted as follows:

  • They had specific semantics in an earlier edition of this document, but their semantics are no longer in use. They are considered legacy fields.
  • The value given in the right-hand side part of the SDL is the only permitted value.
  • Derived specifications are not permitted to change this value.
  • Writers shall always write this value.
  • Readers should expect any value, and should keep parsing the box even if they don't recognize a value. Derived specifications may require stricter behavior from readers (e.g. requiring reading to stop and emit an error when reading a different value).

Fields name “reserved” in the syntaxes are interpreted as follows:

  • They may have a default value defined in this specification (typically 0) or they may not.
  • Additional values may be defined in future versions of this specification.
  • Derived specifications are not permitted to define values for this field.
  • This specification guarantees the following:
    • When used in a FullBox, if a new value is introduced that requires changes to the syntax of the FullBox in a non-compatible way, this change will be done together with a change of version.
    • When used in other structures, new values may be defined but these will not affect the syntax of the structure.
  • Writers shall always write the default value, if specified, or shall write a value of 0, if not specified.
  • Readers should expect any value, and should keep parsing the box even if they don't recognize the value.

In 8.4.2.2, change

bit(1) pad = 0;

to

bit(1) pre_defined = 0;

kgruen commented 2 years ago

Just noticed two typos:

thatkind (should read that kind) they shall not modify the remaining syntax of the box do to depend on the value of this field.

baumanj commented 2 years ago

Future editions of this document may define additional flags, but if a new value is introduced in a later edition that impacts the box syntax, a new version will be used

What's an example of a new flag value that doesn't impact the box syntax? I'm trying to imagine one that would be ok for readers to just ignore.

Some minor suggestions:

  1. For greater clarity:

    a new version will shall be used

  2. Typo, make writer plural:

    Writers should not write flag values that they do not understand.

And one somewhat larger concern around reserved:

When used in a FullBox, if a new value is introduced that requires changes to the syntax of the FullBox in a non-compatible way, this change will be done together with a change of version.

Let's consider an example with ImageMirror (imir). It extends ItemProperty, which extends Box, so it's not versioned, and has the following syntax:

aligned(8) class ImageMirror
extends ItemProperty('imir') {
       unsigned int (7) reserved = 0;
       unsigned int (1) axis;
}

If the reserved bits are for future use, how would that work? Currently only bit 8 is used, so imagine we use bit 7 to indicate that the image should be mirrored about a diagonal axis:

00: —
10: |
10: /
11: \

If we follow the new text:

When used in other structures, new values may be defined but these will not affect the syntax of the structure

Is that a syntactic change? It's unclear to me, but if it's not a syntactic change, I don't see how one would ever make any changes to reserved fields. As such, I assume this is not a syntactic change because old parsers should still be able to handle the input as long as they follow the new text:

Readers should expect any value, and should keep parsing the box even if they don't recognize the value

In this case, old parsers will happily ignore the set bit in position 7 and render the wrong thing. The alternative would be for readers to recognize a set bit in a field they expect to be 0 and either ignoring the property entirely or ignoring the whole item according to whether the essential bit is set in the association (see ipma in BMFF § 8.11.14). Both of those seem preferable to misinterpreting the input and transforming the image wrongly.

cconcolato commented 2 years ago

What's an example of a new flag value that doesn't impact the box syntax? I'm trying to imagine one that would be ok for readers to just ignore.

Flags in tkhd, prft, elst don't impact the syntax.

cconcolato commented 2 years ago

2. Typo, make writer plural:

Writers should not write flag values that they do not understand.

fixed

cconcolato commented 2 years ago
  1. For greater clarity:

    a new version ~will~ shall be used

I'm reluctant to using "shall" when the subject is the standardization body, but not a strong objection.

cconcolato commented 2 years ago

I think you meant:

00: —
01: |
10: /
11: \

right?

When used in other structures, new values may be defined but these will not affect the syntax of the structure

Is that a syntactic change?

No.

The text only talks about "parsing" not about interpreting. Changing a reserved field (in a non-FullBox or in a FullBox without bumping a version) should only be done if it is acceptable for old readers to ignore the value. So in your case, if the designer of the box, think it is acceptable for an old renderer to render 11 as 01, then they can use bit 7, but if not, then a new box should be used.

baumanj commented 2 years ago

think you meant

Yes, my mistake

Changing a reserved field (in a non-FullBox or in a FullBox without bumping a version) should only be done if it is acceptable for old readers to ignore the value.

Hmmm, I'm not a fan of that level of subtlety. I think it would be less error-prone to require these fields in non-versioned boxes to retain their semantics permanently or else have a clear fallback of what should be ignored in a way that's not potentially different for different boxes. If authors want better extensibility, they can use something versioned, or perhaps the expandable keyword. To me, the nonzero bits that should be zero are a signal that informs the reader that they don't know how to properly interpret the input, possibly because it's too new or possibly because it's corrupt. I think that has more value than allowing such subtle complexity.

Flags in tkhd, prft, elst don't impact the syntax.

Thanks! I'm not previously familiar, so I will have a look and get back to you

baumanj commented 2 years ago

What's an example of a new flag value that doesn't impact the box syntax? I'm trying to imagine one that would be ok for readers to just ignore.

Flags in tkhd, prft, elst don't impact the syntax.

Having looked at tkhd now (from BMFF § 8.3.2.3):

flags is a 24-bit integer with flags; the following values are defined: track_enabled: Indicates that the track is enabled. Flag value is 0x000001. A disabled track (the low bit is zero) is treated as if it were not present.

I don't know the actual history of this box (and I don't think it's relevant), but for the purpose of this issue, let's pretend the original version of the box didn't use the flags field at all and there was no concept of disabled tracks. If we follow the suggested text:

Future editions of this document may define additional flags, but if a new value is introduced in a later edition that impacts the box syntax, a new version will be used. Derived specifications shall follow this too. As a consequence, if a reader supports a version of a FullBox, it can and should keep parsing the box even if it does not recognize a flag value. Writers should not write flag values that they do not understand.

If there's no change to the version field, an older parser will not know what to do with the nonzero flags value. It will wrongly process tracks that should be disabled. I don't think that's an ideal outcome. Especially since the existence of a flags field implies FullBox and versioning is available, I think any change to flag semantics should require an update. In this case, it would lead to the desired outcome (old parsers ignore this tkhd), but maybe there are counterexamples where the outcome would be bad. Can you think of any?

dwsinger commented 2 years ago

we shouldn't write shalls that apply to the standards body; they are unenforceable.

baumanj commented 2 years ago

we shouldn't write shalls that apply to the standards body; they are unenforceable.

Can you elaborate? I've never thought anything in and standards document was "enforceable", but the distinction between "should" and "shall" has always been clear to me in a way that "will" is not.

dwsinger commented 2 years ago

yes, if we write "the field shall have the value zero" then we can inspect files and declare them non-conformant. so if we write "we will do X" and then don't do it, people can complain. generally, we try to keep our use of formal conformance verbs (shall and should and may) to verifiable things. we have people who scan specs for shalls and try to write tests for them, for example...and if a test cannot be written, they complain.

cconcolato commented 1 year ago

Implemented in https://dms.mpeg.expert/doc_end_user/documents/140_Mainz/wg11/MDS21996_WG03_N00717.zip