cconcolato closed 6 years ago
After consulting with @dwsinger and @KilroyHughes, the definition of CMAF Media Profiles at this stage does NOT seem necessary, for several reasons:
I will propose a PR to update the spec.
I agree, it seems that we can start the game with labeling the files as being (truly) 'CMAF structurally compatible' (cmfc) and 'AV1 storage in ISOBMFF compatible', and see where that gets us. We can introduce more restrictive specs and their brands later.
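As a rough illustration of what this labeling means for a reader, the sketch below pulls the brand list out of a leading `ftyp` box and checks for both labels. It assumes the AV1-in-ISOBMFF brand is spelled `av01` (the thread does not name the four-character code), and `read_ftyp_brands` and `is_cmaf_av1` are hypothetical helper names, not part of any library.

```python
import struct

def read_ftyp_brands(data: bytes):
    """Return (major_brand, minor_version, compatible_brands) from a leading ftyp box."""
    if len(data) < 16:
        raise ValueError("too short to hold an ftyp box")
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp":
        raise ValueError("data does not start with an ftyp box")
    # size 0 (box runs to end of file) and size 1 (64-bit size) are not handled here
    if size < 16 or size > len(data):
        raise ValueError("unsupported or truncated ftyp box")
    major, minor = struct.unpack(">4sI", data[8:16])
    payload = data[16:size]
    # compatible brands are packed four-character codes
    brands = [payload[i:i + 4].decode("latin-1") for i in range(0, len(payload) - 3, 4)]
    return major.decode("latin-1"), minor, brands

def is_cmaf_av1(brands):
    """True when both the CMAF structural brand and the assumed AV1 brand are listed."""
    return "cmfc" in brands and "av01" in brands

# Hand-built ftyp box: major brand iso6, compatible brands iso6, cmfc, av01
sample = struct.pack(">I4s4sI", 28, b"ftyp", b"iso6", 0) + b"iso6" + b"cmfc" + b"av01"
major, minor, brands = read_ftyp_brands(sample)
print(major, brands, is_cmaf_av1(brands))
```

More restrictive brands introduced later would simply show up as additional entries in the same list.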
@dwsinger Oops. Fixed.
Because the 'codecs' subparameter for AV-1 contains most video parameters necessary to determine content/device interoperability, it isn't necessary to rely as much on Media Profile brands to determine content/device interoperability. A general brand for the AV-1 Track format plus the 'codecs' parameter is sufficient.
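To make the point concrete, here is a minimal sketch of reading those parameters out of a 'codecs' string. The field order and the defaults for omitted trailing fields are assumed from the codecs parameter string defined in the AV1-ISOBMFF binding, and may not match the draft under discussion at the time; `Av1Codecs` and `parse_av1_codecs` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Av1Codecs:
    profile: int
    level: int                        # seq_level_idx value, not a "x.y" level name
    tier: str                         # "M" (main) or "H" (high)
    bit_depth: int
    # Assumed defaults for when the optional trailing fields are omitted
    monochrome: int = 0
    chroma_subsampling: str = "110"   # 4:2:0
    color_primaries: int = 1          # BT.709
    transfer_characteristics: int = 1
    matrix_coefficients: int = 1
    full_range: int = 0

def parse_av1_codecs(value: str) -> Av1Codecs:
    """Parse 'av01.P.LLT.DD' with either none or all six of the optional fields."""
    parts = value.split(".")
    if parts[0] != "av01" or len(parts) not in (4, 10):
        raise ValueError("not a recognised AV1 codecs string: " + value)
    level_tier = parts[2]
    if len(level_tier) != 3 or level_tier[2] not in ("M", "H"):
        raise ValueError("bad level/tier field: " + level_tier)
    parsed = Av1Codecs(
        profile=int(parts[1]),
        level=int(level_tier[:2]),
        tier=level_tier[2],
        bit_depth=int(parts[3]),
    )
    if len(parts) == 10:
        parsed.monochrome = int(parts[4])
        parsed.chroma_subsampling = parts[5]
        parsed.color_primaries = int(parts[6])
        parsed.transfer_characteristics = int(parts[7])
        parsed.matrix_coefficients = int(parts[8])
        parsed.full_range = int(parts[9])
    return parsed

print(parse_av1_codecs("av01.0.04M.10.0.112.09.16.09.0"))
```

A player could compare these values against device capabilities before fetching any media, which is the role a Media Profile brand would otherwise play.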
In the case of AVC and HEVC, the 'codecs' subparameter doesn't identify important video characteristics such as color space, transfer function, encoding range, bit depth, etc. For those codecs, Media Profile brands are necessary to identify constraints on those parameters as well as encoding constraints to enable switching and splicing of bitstreams in browsers and decoders.
Media Profile brands also codify interop points that content producers and devices can agree on to reduce the number of variations that need to be encoded, decoded, and tested. Over time, AV-1 industry practice may converge on a few interop points that can then be given more specific CMAF Media Profile brands for HD, UHD, HDR, etc.
I have pushed a PR here: https://github.com/AOMediaCodec/av1-isobmff/pull/44
Some more questions:
pasp, colr and HDR metadata: Is there any reason why we would require pasp and colr in CMAF tracks and not in basic AV1-ISOBMFF file? Shouldn't we require them all the time? Regarding HDR metadata, we have a should for OBU. What about the HDR boxes (either the VP9 ones or the MIAF ones)?
cmfc?
pasp and colr boxes are only required in one edge case where there was no decoder configuration record in the sample entry containing SPS/PPS NALs for initialization. BBC encoded content with inband parameters (i.e. M2TS, 'avc3', and 'hev1'), but no parameter set in the sample entry. The video sample entry fields lack some of the info in SPS, like color and sample aspect ratio, so those boxes were required in that case to help initialize the decoder, picture buffers, etc. The media pipeline will use the SPS info and cropping parameters in each Fragment for decoding, cropping, scaling, and rendering. pasp and colr will be ignored by most decoders during decoding, but can be used by an app to configure picture buffers, display adaptation and scaling, LUT, etc. I think 'cmfc' plus a Media Profile brand for AV1 is what's needed.
It seems to me a terrible idea to require parsing OBUs and not store the profile in the DCR. If some decoders have restrictions or differing capabilities, it becomes impossible to fall back to another one without parsing data. OBU-specific parsing code is then duplicated and complexity is added.
You also don't want to have to read samples when "The configOBUs field contains zero or more"
Also, the data might not go to a decoder. The xVcC boxes carry DCR versioning in the first byte, which makes the whole struct usable as extradata.
It seems ffmpeg goes the OBU-only way for extradata, which implies the issues above and makes reparsing OBUs or samples mandatory, just like Annex B AVC/HEVC. That sounds like a regression to me.
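To illustrate the request, the sketch below reads profile, level, tier and bit depth from a fixed, versioned header placed ahead of configOBUs, without touching any OBU. The 4-byte bit layout is an assumption modeled on the av1C box in later published versions of the AV1-ISOBMFF binding; it is not taken from this thread, and `parse_av1c_header` is a hypothetical name.

```python
def parse_av1c_header(payload: bytes) -> dict:
    """Decode an assumed 4-byte fixed av1C header; trailing bytes are configOBUs."""
    if len(payload) < 4:
        raise ValueError("av1C payload too short")
    b0, b1, b2, _b3 = payload[:4]
    if not b0 & 0x80:
        raise ValueError("marker bit not set")
    high_bitdepth = (b2 >> 6) & 1
    twelve_bit = (b2 >> 5) & 1
    if high_bitdepth and twelve_bit:
        bit_depth = 12
    elif high_bitdepth:
        bit_depth = 10
    else:
        bit_depth = 8
    return {
        "version": b0 & 0x7F,              # versioning lives in the first byte
        "seq_profile": b1 >> 5,
        "seq_level_idx_0": b1 & 0x1F,
        "seq_tier_0": b2 >> 7,
        "bit_depth": bit_depth,
        "monochrome": (b2 >> 4) & 1,
        "chroma_subsampling_x": (b2 >> 3) & 1,
        "chroma_subsampling_y": (b2 >> 2) & 1,
        "chroma_sample_position": b2 & 0x03,
        "config_obus": payload[4:],        # may be empty ("zero or more")
    }

# Hand-built example: version 1, profile 0, level index 4, 10-bit 4:2:0, no OBUs
header = parse_av1c_header(bytes([0x81, 0x04, 0x4C, 0x00]))
print(header["seq_profile"], header["seq_level_idx_0"], header["bit_depth"])
```

With a header like this, a demuxer can choose or reject a decoder even when configOBUs is empty.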
pasp and colr boxes are highly useful as they are codec-independent and can be parsed and used before a codec-specific reader is instantiated. They also really don't belong in the codec; the codec operation is unaffected by their values.
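A minimal sketch of that codec-independent parsing follows: it walks the child boxes of a sample entry and decodes pasp and an 'nclx' colr with nothing but byte unpacking. The field layouts follow ISOBMFF as I understand it; `iter_boxes`, `parse_pasp` and `parse_colr` are hypothetical helper names, and the sample bytes are hand-built for illustration.

```python
import struct

def iter_boxes(data: bytes):
    """Yield (type, payload) for each plain 32-bit-size box found in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        if size < 8 or offset + size > len(data):
            break  # 64-bit sizes and truncated boxes are out of scope for this sketch
        yield box_type.decode("latin-1"), data[offset + 8:offset + size]
        offset += size

def parse_pasp(payload: bytes):
    """Pixel aspect ratio: hSpacing and vSpacing, both unsigned 32-bit."""
    return struct.unpack(">II", payload[:8])

def parse_colr(payload: bytes) -> dict:
    """Colour information; only the 'nclx' variant is decoded here."""
    colour_type = payload[:4].decode("latin-1")
    if colour_type != "nclx":
        return {"colour_type": colour_type}
    primaries, transfer, matrix, flags = struct.unpack(">HHHB", payload[4:11])
    return {
        "colour_type": colour_type,
        "colour_primaries": primaries,
        "transfer_characteristics": transfer,
        "matrix_coefficients": matrix,
        "full_range": bool(flags >> 7),   # top bit; the remaining 7 bits are reserved
    }

# Hand-built children: square pixels, BT.2020 primaries with PQ transfer
children = (
    struct.pack(">I4sII", 16, b"pasp", 1, 1)
    + struct.pack(">I4s4sHHHB", 19, b"colr", b"nclx", 9, 16, 9, 0x00)
)
for box_type, payload in iter_boxes(children):
    if box_type == "pasp":
        print("pasp", parse_pasp(payload))
    elif box_type == "colr":
        print("colr", parse_colr(payload))
```

Nothing here depends on AV1, AVC or HEVC, so an application can size picture buffers or pick a display path before a decoder exists.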
CMAF relies on the definition of CMAF Media Profiles to achieve better interop. AV1 defines 3 profiles, but the levels, about 12 of them, are not yet finalized. Defining 36 CMAF AV1 brands (3 profiles x 12 levels) does not seem like a good idea. Is there a minimum set to define, or should we defer the definition of CMAF brands to future versions?