Resolution Switching in a CMAF Track

haudiobe commented 2 years ago

Colleagues,

We still have the open question of resolution switching within a Representation for AVC, HEVC and VVC.

Facts:
1) DASH and CMAF do not prohibit a change of resolution within one Representation. However, in practice this is restricted to Fragment/Segment boundaries, i.e. typically at most once per second.
2) DVB-DASH prohibits this, and DVB MPEG-2 TS more or less restricts it to rare occurrences.
3) The AVC, HEVC and VVC specs are more or less silent, but the following is valid:
a. For HEVC and AVC, you can change resolution at an SPS (IDR frame). This is likely when you do an IDR.
b. For VVC, you can change the output resolution on basically every frame.

So we have no answer from the video standards. For CMAF, is there anything that we want to restrict?

Options:
1) No restrictions
2) Restrictions to Segment/Fragment boundaries (to implement the practical restrictions for HEVC and AVC)
3) Restrictions related to the maximum distance of resolution changes for VVC
4) Restrictions to not permit any resolution changes
5) Add signaling or profile to provide the option
6) Others

I am also unsure how to come to a decision on this matter because 1) Video/JVET is silent 2) People want to use this feature, for example Harmonic is strongly advocating. 3) DVB restricts this for good reasons.

Please preferably comment there.

haudiobe commented 2 years ago

Proposal for CMAF:
1) No restrictions, but state this explicitly. Add a note that this may be further restricted by users of CMAF.
2) We await feedback from DVB on whether they can relax the constraint. If not, we can add a restricted CMAF profile for this purpose.
3) Communicate with CMAF users for feedback.

gteniou commented 2 years ago

For AVC, HEVC, I agree that feedback from DVB will tell us if any restriction is necessary or not.
For VVC, it is still unknown if this feature will be extensively used. Unless explicitly asked by anyone, I would suggest aligning the constraint level with AVC/HEVC (usually, 1st generations of encoders rely on previous codecs' algorithms before being optimised with new tools)

Therefore ok with the above proposal.

jpiesing commented 2 years ago

I don't believe DVB has any restriction. IMHO an implicit restriction comes in MPEG DASH where Representation has one pair of @width and @height. These are optional in MPEG DASH but mandatory in DVB-DASH. Is this really an issue for the manifest referring to CMAF content rather than CMAF itself? I still maintain that changing resolutions in a CMAF track is very likely to break the surrounding system even if implicitly permitted by CMAF.

yagosf commented 2 years ago

I would prefer not to see any restrictions in CMAF with this respect.

I am not familiar with DVB-DASH, so I was looking for where resolution switching within a Representation is disallowed, to understand this better. I see that 'avc3' is supported and cannot find constraints on SPS values. Also, I do not see any issue with either @width or @height signalling in DASH. My understanding always was that this could involve upscaling based on the TrackHeaderBox.width and TrackHeaderBox.height.
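For reference, a minimal sketch of how the TrackHeaderBox display size mentioned above could be read. It assumes the caller has already located the 'tkhd' box and passes in its contents after the 8-byte size/type header; the function name is hypothetical. It relies only on width and height being the last two fields of the box, stored as 32-bit 16.16 fixed-point values.

```python
import struct
from typing import Tuple


def tkhd_display_size(tkhd_payload: bytes) -> Tuple[float, float]:
    """Return (width, height) from a TrackHeaderBox payload.

    tkhd_payload is assumed to be the box contents after the 8-byte
    size/type header. Width and height are the final two 32-bit fields
    in both version 0 and version 1 of the box, in 16.16 fixed point.
    """
    if len(tkhd_payload) < 8:
        raise ValueError("tkhd payload too short")
    width_fixed, height_fixed = struct.unpack(">II", tkhd_payload[-8:])
    return width_fixed / 65536, height_fixed / 65536


# Synthetic version 0 payload: 76 bytes of other fields (zeroed here),
# followed by width and height in 16.16 fixed point.
example_payload = bytes(76) + struct.pack(">II", 1920 << 16, 1080 << 16)
example_size = tkhd_display_size(example_payload)
```

The value obtained this way is the nominal display size, not the coded resolution, so it can stay constant while the SPS resolution changes.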

In any case, I understand that in some environments devices cannot handle output picture sizes that vary from picture to picture. So I think it is ok that derived specifications add their own constraints based on their needs.

IMHO, one last aspect is that for VVC decoders it does not matter whether resolution changes happen at IRAPs or not. If in some environments the limitation is only how often resolutions can change, it should be enough to add a time limitation, without restricting changes to IRAPs, or am I missing something? Perhaps the question should be raised of why it is important that changes only happen at IRAPs...

vdrugeon commented 2 years ago

I quite like the suggestion from Thomas to ask for feedback from CMAF users. I agree with Jon that many implementations out there seem to assume that the resolution of a video will never change within a CMAF track. While resolution changes within a track may be allowed by the specifications (MPEG-DASH, CMAF and DVB-DASH), there is a difference between what was intended when writing the specifications and how implementers understood the specification. In this particular case, it seems that there was a wide enough misunderstanding that the resolution is fixed within a DASH representation because there is only one value of @width and @height per Representation.

We should also be very careful about VVC. I agree that currently, allowing resolution changes from picture to picture would be a huge burden on many implementations. Since it is allowed if nothing is mentioned, we should at least do something about that. But adding only a time limitation may also be risky, since we are not sure how easy it is to cope with a resolution change in the middle of a segment. IMHO having both restrictions would be the safest for interoperability: at segment boundary (i.e. at an IRAP) and a time limitation. This is the direction that DVB is taking.
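A minimal sketch of what checking both restrictions together could look like, assuming a packager or validator already knows each picture's presentation time, decoded size and whether it is the first picture of a segment. The `Picture` type, the `find_violations` name and the `min_interval` parameter are hypothetical, and no particular interval value is implied by any specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Picture:
    time: float          # presentation time in seconds
    width: int
    height: int
    segment_start: bool  # True for the first picture of a segment


def find_violations(pictures: List[Picture], min_interval: float) -> List[str]:
    """List resolution changes that break either restriction."""
    violations: List[str] = []
    last_change: Optional[float] = None
    for prev, cur in zip(pictures, pictures[1:]):
        if (cur.width, cur.height) == (prev.width, prev.height):
            continue
        # Restriction 1: changes only at a segment boundary.
        if not cur.segment_start:
            violations.append(f"t={cur.time}: change inside a segment")
        # Restriction 2: minimum time between two changes.
        if last_change is not None and cur.time - last_change < min_interval:
            violations.append(f"t={cur.time}: change too soon after t={last_change}")
        last_change = cur.time
    return violations


example = [
    Picture(0.0, 1920, 1080, True),
    Picture(1.0, 1920, 1080, False),
    Picture(2.0, 1280, 720, True),   # allowed: at a segment boundary
    Picture(3.0, 960, 540, False),   # mid-segment and only 1 s after the last change
]
example_violations = find_violations(example, min_interval=2.0)
```

Dropping the first check gives the time-only variant suggested earlier in the thread.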

The safest course of action would of course be to completely forbid resolution changes within a CMAF track, but I understand if it is felt as too restrictive for VVC.

cconcolato commented 2 years ago

Can we get a clarification from implementers regarding what the problem is? Whether you switch resolution when changing Representation or within the same representation, what is the difference? Do they assume the rescaling pipeline to be constant for a segment? Or is it an issue that there is latency introduced when you change rescaling parameters and therefore only the frequency of the change matters?

mikedo commented 2 years ago

For VVC, we should be the least restrictive as possible (starting with none). For any parameter, the user ecosystem can further restrict it as needed, even temporarily as DVB has done. But if the intent is to meet the requirements of a common media profile, then we need to vet this broader than just those present this week. And, I think it is too late to alter AVC and HEVC, although perhaps a note of warning is necessary.

jpiesing commented 2 years ago

When you have content spec(s) with no player spec, then players (and the systems/architectures around them) will be developed based on real-world content and test cases. Parts of the spec that are not (yet) used in the real world & which don't have test cases will not be supported. The design of players, the components around them and the interfaces between them will make assumptions based on what is believed to be supported. Some examples of the assumptions that I believe will have been made include the following:

That a CMAF video track has the same width and height all the way through
That when the manifest includes the width and height for a CMAF video track, this is correct & matches what is in the media data (in MPEG DASH, Representation@width and Representation@height are optional, DVB-DASH makes them mandatory)
That the CMAF player is the only entity controlling video scaling
That the CMAF player can control video scaling based on the width & height in the manifest (where present) and otherwise in the initialisation segment
The system component connected to the output from the video decoder may make assumptions about the resolutions it receives. It may need to be told about resolution changes & not just automagically work everything out based on what it receives from the decoder.
Other examples will exist.

If both the media decoder and the player would need to control video scaling of the stream at the same time then this might have an impact on the interface between them. In some systems, these may come from different entities. Ensuring that switching CMAF video track within a Switching Set would work correctly where the new CMAF video track has a different resolution from what is in the manifest could be complex and require changes to the interfaces between them.

Obviously a CMAF player, a media decoder and the interface between them can be modified to make all of this work, assuming there is test content & assuming the cost of the changes is justified by the benefits. Of course architectural changes are likely to be more expensive than simple functional changes within a single component.

jpiesing commented 2 years ago

For VVC, we should be the least restrictive as possible (starting with none). For any parameter, the user ecosystem can further restrict it as needed, even temporarily as DVB has done. But if the intent is to meet the requirements of a common media profile, then we need to vet this broader than just those present this week. And, I think it is too late to alter AVC and HEVC, although perhaps a note of warning is necessary.

If my concerns are reflected in the real world, then someone who wants to support VVC-CMAF but is impacted by those issues has only bad choices if we do as you suggest.

Don't support VVC in CMAF
Support an implementation-specific subset of VVC in CMAF and proactively communicate that subset by non-technical means (probably only practical for larger organisations)
Support an implementation-specific subset of VVC in CMAF, hope nobody uses the features that aren't supported & react if there are real-world problems
Attempt to make the player detect VVC bitstreams using non-supported features by parsing the bitstream and not play streams using those features

yagosf commented 2 years ago

Hi @jpiesing

I am a bit confused about the following statements:

That a CMAF video track has the same width and height all the way through

The system component connected to the output from the video decoder may make assumptions about the resolutions it receives. It may need to be told about resolution changes & not just automagically work everything out based on what it receives from the decoder.

My understanding is that when switching happens with Tracks within a Switching Set following Single initialization CMAF switching set constraints (subclause 9.3.7 in CMAF), there is no difference for a CMAF player between processing such content and processing a CMAF track that does not have a constant width and height, since those constraints allow a CMAF player to pick a single CMAF Header and just process segments of different CMAF Tracks. So in such a case (when switching happens with a single CMAF Header), the CMAF player cannot just assume that everything is constant as suggested above, right? Or am I missing something? What information is the CMAF player missing in the single CMAF Track case compared to the Switching Set case mentioned above?

jpiesing commented 2 years ago

My understanding is that when switching happens with Tracks within a Switching Set following Single initialization CMAF switching set constraints (subclause 9.3.7 in CMAF), there is no difference for a CMAF player between processing such content and processing a CMAF track that does not have a constant width and height, since those constraints allow a CMAF player to pick a single CMAF Header and just process segments of different CMAF Tracks. So in such a case (when switching happens with a single CMAF Header), the CMAF player cannot just assume that everything is constant as suggested above, right? Or am I missing something? What information is the CMAF player missing in the single CMAF Track case compared to the Switching Set case mentioned above?

I'm not sure I fully understand your point ... The player decides to switch CMAF Track between two tracks with a different resolution at a segment boundary. The player fetches the appropriate data. The player instructs the system component handling scaling to change the scaling at the representation switch. Depending on the architecture, the player may need to instruct the system component that receives the output from the decoder to prepare for change in the resolution it receives.
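The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SwitchingPlayer`, `append_segment` and the `notify` callback are hypothetical names, and the callback stands in for whatever interface a real system uses to tell the scaler or the component behind the decoder about a change.

```python
from typing import Callable, List, Optional, Tuple

Resolution = Tuple[int, int]


class SwitchingPlayer:
    """Toy player that announces resolution changes at segment boundaries."""

    def __init__(self, notify: Callable[[Resolution, Resolution], None]) -> None:
        self._notify = notify
        self._current: Optional[Resolution] = None

    def append_segment(self, resolution: Resolution) -> None:
        # The player knows the resolution of the track it chose to fetch,
        # so it can warn downstream components before the data arrives.
        if self._current is not None and resolution != self._current:
            self._notify(self._current, resolution)
        self._current = resolution


events: List[Tuple[Resolution, Resolution]] = []
player = SwitchingPlayer(lambda old, new: events.append((old, new)))
for segment_resolution in [(1920, 1080), (1920, 1080), (1280, 720)]:
    player.append_segment(segment_resolution)
```

In this model the notification is driven by the player's own switching decision. A change inside a single track would have no such trigger unless the player inspects the media itself.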

RufaelDev commented 2 years ago

DVB-DASH has the sentence in 5.1.2 for avc tracks:

For video Representations, the width and height values in the track header box shall have the nominal display size in square pixels after decoding, H.264/AVC cropping, and rescaling.

We have seen practical implementations, e.g. VLC player, support this and scale based on the tkhd/manifest width and height and on the pasp (pixel aspect ratio).
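As a small worked example of this kind of scaling, here is a sketch with hypothetical helper names. It assumes cropping has already been applied to the coded size and that pasp carries the hSpacing:vSpacing pair.

```python
from fractions import Fraction
from typing import Tuple


def square_pixel_width(coded_width: int, h_spacing: int, v_spacing: int) -> Fraction:
    """Width in square pixels after applying the pasp ratio."""
    return Fraction(coded_width * h_spacing, v_spacing)


def scale_factors(coded_width: int, coded_height: int,
                  tkhd_width: int, tkhd_height: int) -> Tuple[Fraction, Fraction]:
    """Horizontal and vertical scaling from coded size to tkhd display size."""
    return Fraction(tkhd_width, coded_width), Fraction(tkhd_height, coded_height)


# Anamorphic 1440x1080 with pasp 4:3 fills a 1920-wide display area.
anamorphic_width = square_pixel_width(1440, 4, 3)

# A 1280x720 clip stitched into a track whose tkhd says 1920x1080.
stitched_scale = scale_factors(1280, 720, 1920, 1080)
```

Both inputs end up at the same nominal display size, which is why a constant tkhd width and height can coexist with different underlying coded resolutions.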

With regard to not changing within a segment, DVB-DASH states that all PPS/SPS should be in the first access unit of a segment for avc3, which implies that changes would only happen at segment boundaries.

We have also used the feature of different underlying resolutions to stitch MP4 clips, e.g. content with different underlying resolutions is stitched into a single track/representation with the same pasp/width/height, and this worked well in practice on different players, even in DASH. In this case we could even use multiple sample entries with different pasp and different encoded resolutions with different width/height as part of a single track/representation. This worked well on most common players too. We think this feature may not be supported everywhere, but it is indeed very similar to switching between different bit-rate representations that may be encoded in a different way, and it shows the benefits of using MP4/CMAF based container formats.

I think CMAF could consider the DVB recommendation of only allowing switching of underlying resolution at segment boundaries but fully restricting the underlying resolution would not align to what we have seen supported in practice and excludes some of the benefits we get from using mp4.

jpiesing commented 2 years ago

We think this feature may not be supported everywhere but it is indeed very similar to switching between different bit-rate representations that may be encoded in a different way

Sorry but when you look at how a player fits into an overall system, they are not "very similar". Changing resolution involves extra components and interfaces in the system where the player is running. It's no longer just the player passing an fMP4 stream to something which in turn passes it to the decoder.

yagosf commented 2 years ago

Thanks for the clarification. My point was that basically there would not be any difference in the data that is passed to the CMAF player in the two cases I mentioned above. But if I understand correctly what you are saying, the player will signal that the resolution has changed for a particular segment; so the CMAF player you are envisioning does not only use the data in the CMAF fragments or CMAF Headers to handle resolution changes but also some external data, e.g. from the DASH-MPD?

jpiesing commented 2 years ago

so the CMAF player you are envisioning does not only use the data on the CMAF fragments or CMAF Headers to handle resolution changes but some external data, e.g. from the DASH-MPD?

I expect a DASH player will give preference to information in the MPD over information in the media. If it doesn't have to look at the initialisation segment / media segments then it will probably try to avoid it. I can't comment on an HLS player.

RufaelDev commented 2 years ago

If you check the CMAF profile of DASH in the 5th edition, you will see that most values in the manifest can be directly derived from the init and media segments. While this may not always be the case, it typically is, and whether a player uses the manifest or the init/media segments to check metadata should not make a fundamental difference. That is, @sar and @par in DASH are relative and thus resolution independent: sar can be derived from pasp in the file format, while par can be derived from the tkhd width and height.
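A minimal sketch of that derivation, assuming the pasp hSpacing/vSpacing and the integer tkhd width and height have already been read from the init segment. The function names are hypothetical, and the output uses the colon-separated ratio form of the DASH attributes.

```python
from fractions import Fraction


def dash_sar(h_spacing: int, v_spacing: int) -> str:
    """Sample aspect ratio string derived from the pasp box values."""
    ratio = Fraction(h_spacing, v_spacing)
    return f"{ratio.numerator}:{ratio.denominator}"


def dash_par(tkhd_width: int, tkhd_height: int) -> str:
    """Picture aspect ratio string derived from tkhd width and height."""
    ratio = Fraction(tkhd_width, tkhd_height)
    return f"{ratio.numerator}:{ratio.denominator}"


# Two encodings of the same 16:9 picture: square pixels and anamorphic.
square_pixels = (dash_sar(1, 1), dash_par(1920, 1080))
anamorphic = (dash_sar(4, 3), dash_par(1920, 1080))
```

Because both values are ratios, neither changes when the coded resolution is scaled uniformly.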

yekuiwang commented 2 years ago

Just registered GitHub. Thanks Mike for sending the link of this issue, which triggered me registering. Now Thomas won't repeat "Hmm.... Ye-Kui does not use GitHub" :-)

Do I need to join the group like this MPEGGroup or CMAF subgroup to get notifications of new comments? Or this just works the same way as GitLab issues (meaning once you have a comment or someone @you then you will receive new comments to your email address)?

cconcolato commented 2 years ago

@yekuiwang you're correct, once you participate in an issue you get notifications. You can also monitor projects globally or even issues individually even if you don't participate.

krasimirkolarov commented 2 years ago

From the AdHoc group discussion on March 14, 2022: The suggestion is to address this topic in a potential new structural profile (as proposed by Jon Piesing in this thread). Thomas points out that such a potential profile should also address issue #29 and cover other codecs as well. We do not make normative changes, and we add a note in the current text that resolution change is permitted and that it is up to the application to change this if needed.

yreznik-brightcove commented 2 years ago

HEVC and AVC, at an SPS (IDR Frame), you can change resolutions. This is likely when you do an IDR.

This is generally incorrect. The only safe way to do so on the elementary stream level is to send end-of-stream NAL, and then effectively start a new stream with new SPS, PPS, and all applicable SEIs. Just doing this at random at IDR boundary won't work. It will break HRD, among many other things. VVC has extra tools to enable such switching, and if we are to do something for it in CMAF, I would suggest limiting this functionality to VVC-specific profiles.

krasimirkolarov commented 2 years ago

We added a note to Ed. 3 that resolution changes in CMAF tracks are permitted. If no further input contributions are received on this, we will close it at MPEG141.

yreznik-brightcove commented 2 years ago

I strongly recommend adding a footnote explaining that this should be used only if such switching is also supported by the video compression standard used to produce the elementary streams included in the tracks. To my knowledge, this can be done safely only with VVC. With all other codecs one should restart them, including sending an end-of-stream NAL, and then effectively starting a new stream. There is no guarantee that this will look seamless.

Thanks, Yuriy.

On Oct 25, 2022, at 8:22 AM, krasimirkolarov @.***> wrote:

We added a note to Ed. 3 that resolution changes in CMAF tracks are permitted. If no further input contributions are received on this, we will close it at MPEG141.


haudiobe commented 1 year ago

A NOTE was added in the third edition: "NOTE 3 The width and height field do not restrict the encoding resolution to fixed values. Resolution changes in one CMAF Track can occur, for example using multiple sample entries or using codec-internal functionalities such as adaptive resolution changes, if the codec and media profile do not explicitly prohibit this. If an application would require fixed width and height of the encoded and decoded signal, additional restrictions are expected to be documented."

Proposal is that this is checked codec by codec.

RufaelDev commented 1 year ago

Question to https://github.com/yreznik-brightcove, if I may: how does HRD then work when the resolution changes during ABR switching, as there is no end-of-stream NAL in this case? This is more a question to help me understand the issue.

krasimirkolarov commented 1 year ago

We will close the issue since it has been addressed in Ed. 3

MPEGGroup / CMAF

Resolution Switching in a CMAF Track #35