AOMediaCodec / av1-isobmff

Official specification of the AOM group for the carriage of AV1 in ISOBMFF
https://AOMediaCodec.github.io/av1-isobmff

Improper metadata location #107

Closed fcartegnie closed 1 year ago

fcartegnie commented 5 years ago

2.4

NOTE: Other types of OBUs such as metadata OBUs could be present before the Sequence Header OBU.

Some metadata is not global but relates to individual frames (there may be several if the sync point includes non-shown frames).

AV1 7.5

A coded video sequence consists of one or more temporal units. A temporal unit consists of a series of OBUs starting from a temporal delimiter, optional sequence headers, optional metadata OBUs, a sequence of one or more frame headers, each followed by zero or more tile group OBUs as well as optional padding OBUs.

There's no syntax schema like with MPEG codecs, but to me that enumeration defines an OBU order. No metadata OBU can then come before the sequence header.
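To make the strict reading concrete, here is a minimal Python sketch that treats the 7.5 enumeration as a fixed order and checks a temporal unit against it. The short names (`TD`, `SH`, `MD`, `FH`, `TG`, `PAD`), the one-letter codes, and the function `follows_strict_order` are hypothetical and used only for illustration; they are not AV1 `obu_type` values. It also assumes that padding may follow each frame header's tile groups, which is one possible reading of the sentence.

```python
import re

# Illustrative one-letter codes for the OBU kinds named in AV1 section 7.5.
# These are NOT AV1 obu_type values; they only make the order easy to match.
CODES = {
    "TD": "T",   # temporal delimiter
    "SH": "S",   # sequence header
    "MD": "M",   # metadata
    "FH": "F",   # frame header
    "TG": "G",   # tile group
    "PAD": "P",  # padding
}

# Strict reading of the enumeration (an assumption about how to read 7.5):
# TD, then SH*, then MD*, then one or more (FH, TG*, PAD*) groups.
STRICT_TU = re.compile(r"TS*M*(FG*P*)+")


def follows_strict_order(obus):
    """Return True if the OBU kinds match the enumeration read as a fixed order."""
    encoded = "".join(CODES[kind] for kind in obus)
    return STRICT_TU.fullmatch(encoded) is not None


# Metadata after the sequence header is accepted under this reading.
print(follows_strict_order(["TD", "SH", "MD", "FH", "TG"]))
# Metadata before the sequence header is rejected under this reading.
print(follows_strict_order(["TD", "MD", "SH", "FH", "TG"]))
```

Under this reading the NOTE in 2.4 would describe a sequence that the checker rejects, which is the conflict raised here.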

cconcolato commented 2 years ago

The group is finally considering this issue. Sorry for the delay. We will sync with the video group to determine if the sentence you quote is a hard requirement of the OBU order.

Note that the current AV1-ISOBMFF spec says:

The configOBUs field contains zero or more OBUs. Any OBU may be present provided that the following procedures produce compliant AV1 bitstreams: From any sync sample, an AV1 bitstream is formed by first outputting the OBUs contained in the AV1CodecConfigurationBox and then by outputting all OBUs in the samples themselves, in order, starting from the sync sample.

So if we consider an example of configOBUs only containing metadata (no SH), the concatenation procedure would have to be changed.

jzern commented 1 year ago

In a following paragraph the current specification goes on to say:

One or more metadata and padding OBUs may appear in any order within an OBU sequence (unless constrained by semantics provided elsewhere in this specification). Specific metadata types may be required or recommended to be placed in specific locations, as identified in their corresponding definitions.

The normative decoder doesn't impose any ordering restrictions.
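As a rough illustration of the relaxed reading, the Python sketch below ignores metadata and padding OBUs and only checks the order of the remaining ones. The short names (`TD`, `SH`, `MD`, `FH`, `TG`, `PAD`) and the function `follows_relaxed_order` are hypothetical, and the check is an interpretation of the quoted paragraph, not a statement of what the normative decoder does.

```python
# OBU kinds that, under the relaxed reading, may appear anywhere in the sequence.
# The short names are illustrative labels, not AV1 obu_type values.
FLOATING = {"MD", "PAD"}


def follows_relaxed_order(obus):
    """Check TD, SH*, (FH, TG*)+ after dropping metadata and padding OBUs."""
    core = [kind for kind in obus if kind not in FLOATING]
    if not core or core[0] != "TD":
        return False
    i = 1
    # Skip the optional sequence headers.
    while i < len(core) and core[i] == "SH":
        i += 1
    # At least one frame header must follow.
    if i >= len(core) or core[i] != "FH":
        return False
    # From the first frame header on, only frame headers and tile groups remain.
    return all(kind in ("FH", "TG") for kind in core[i:])


# Metadata before the sequence header passes under this reading.
print(follows_relaxed_order(["TD", "MD", "SH", "FH", "TG"]))
# Metadata between two frames also passes.
print(follows_relaxed_order(["TD", "SH", "FH", "MD", "FH", "TG"]))
```

Specific metadata types may still carry their own placement rules, which this sketch does not model.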

tdaede commented 1 year ago

At least the current av1-hdr10plus draft violates that ordering: https://aomediacodec.github.io/av1-hdr10plus/

In particular, it places the metadata OBU before the first shown frame, but not before all frames in the temporal unit.

cconcolato commented 1 year ago

At least the current av1-hdr10plus draft violates that ordering

Can you elaborate? av1-hdr10plus seems compliant with both the text in 7.5 and the text that @jzern quoted.

tdaede commented 1 year ago

It violates the order given in the first sentence in 7.5 (if it's interpreted strictly), but not jzern's following text. In particular, metadata OBUs are placed after frame header OBUs in some cases, e.g. TU1 in https://aomediacodec.github.io/av1-hdr10plus/obu_tu.png

cconcolato commented 1 year ago

As discussed in the group during the call, I just wanted to clarify a use case. Consider a muxer that detects that a Metadata OBU is common to all frames in a track and decides to store it out-of-band, i.e. in the configOBUs field. Per the current spec, this is possible given:

The configOBUs field contains zero or more OBUs. Any OBU may be present provided that the following procedures produce compliant AV1 bitstreams:

  • From any sync sample, an AV1 bitstream is formed by first outputting the OBUs contained in the AV1CodecConfigurationBox and then by outputting all OBUs in the samples themselves, in order, starting from the sync sample.

Because a sync sample starts with a SH OBU, this would mean that the decoder would be fed: Metadata OBU, SH OBU, ... Hence the need to state whether this is possible.

Note that if the configOBUs also contained a SH OBU, the decoder would receive: SH OBU, Metadata OBU, SH OBU, ... In this latter case, presumably the second SH OBU is a redundant copy of the first one.
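The two cases above can be sketched in a few lines of Python that follow the quoted concatenation procedure at the level of OBU kinds. The function `form_bitstream`, the list-of-strings representation, and the sample contents are hypothetical; a real muxer or demuxer would operate on parsed OBU bytes.

```python
def form_bitstream(config_obus, samples, sync_index):
    """Output the configOBUs first, then every OBU of each sample, in order,
    starting from the sync sample at sync_index."""
    out = list(config_obus)
    for sample in samples[sync_index:]:
        out.extend(sample)
    return out


# Hypothetical track: the sync sample starts with a sequence header (SH).
samples = [["SH", "FH", "TG"], ["FH", "TG"]]

# configOBUs holding only a metadata OBU: the decoder sees MD before SH.
print(form_bitstream(["MD"], samples, 0))
# → ['MD', 'SH', 'FH', 'TG', 'FH', 'TG']

# configOBUs holding SH and MD: the sample's SH arrives as a second SH.
print(form_bitstream(["SH", "MD"], samples, 0))
# → ['SH', 'MD', 'SH', 'FH', 'TG', 'FH', 'TG']
```

Whether the first output is a compliant AV1 bitstream depends on how the OBU order in 7.5 is interpreted.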

cconcolato commented 1 year ago

Based on our understanding, we agree that the video specification could be less ambiguous, but the intent is to allow metadata anywhere (e.g. at the sequence level or at the frame level). Therefore this AV1-ISOBMFF specification is not in conflict with the AV1 video specification. Reopen the issue if you disagree.