Closed RufaelDev closed 3 years ago
Wouldn't the duration-is-empty flag in tfhd fit this? This would avoid inserting blank samples, which would need removal when de-fragmenting the file.
So overall, no, I disagree, because we think this is not well supported in players, while inserting an empty TTML document or a VTTEmptyCueBox is always supported. For defragmenting, removing a WVTT empty cue should already be supported by any defragmenter; removing empty TTML may not be supported, but should be straightforward to implement if you want it. Again, keeping it in a defragmented file causes no harm either.
The minimalist conformant TTML document is:
<?xml version="1.0" encoding="UTF-8"?>
<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml"/>
ATSC will probably recommend this for sparse tracks.
@mikedo thanks, I would recommend similar in DVB-DASH (this is also in EBU Tech 3381), and I would hope that eventually such a recommendation can be included in CMAF, including how/when to do the padding.
I am afraid this will crash most players, as duration-is-empty is not what is adopted in MPEG-4 part 30 or in CMAF
I cannot find in CMAF nor in part 30 anything against usage of duration-is-empty flag
I am not sure that such gaps are allowed by CMAF; this can be debated regarding the definition of an empty sample
Same thing, I cannot find anything in CMAF regarding this
The general problem I have with this approach is that rather than using a tool that is well documented and has no impact on the source content (hence is transparent for packagers), we now insert empty samples which are all format specific, hence make the packager codec specific. We're doing it here for WebVTT and TTML, but in a few years we'll end up with thousands of sparse metadata formats (haptics, annotations, etc ...) that will follow this same approach, each new format requiring a patch of the packager (and likely defragmenter). And that worries me.
we have two issues:
a) With duration-is-empty you don't have media, i.e. you have gaps, which are not allowed in CMAF and which many players cannot handle.
b) Using this to fill a gap at the end would lead to large segments, which is not supported in DASH numbering, which allows only 50% deviation, and it will also increase your max segment duration and therefore latency.
Both approaches, for empty TTML and for WebVTT, are explicit in MPEG-4 part 30: either VTTEmptyCue or TTML without body. I think the only gap is to describe in CMAF some recommendations for padding to fulfill CMAF track/switching set/presentation requirements.
with duration-is-empty you don't have media, i.e. gaps which is not allowed in CMAF and many players (cannot handle this)
I cannot find anything in CMAF stating this is not allowed, maybe I'm missing something. For media players, what kind of behaviors do you observe ? Parsing issues, decoding issues?
at the end when using this to fill a gap would lead to large segments, which is not supported in DASH numbering that allow only 50% deviation and will also hike your max-segment duration and therefore latency
I disagree, the tfdt can still be present in the empty fragment to indicate a "current decode time" although there is nothing to decode. You can insert as many of these empty segments with updated tfdt as required for your segment duration constraints, just like you insert fake empty samples currently.
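To make the suggestion above concrete, here is a minimal sketch of how a packager might split a gap into padding fragments, each with its own tfdt and duration, so that no fragment exceeds a nominal duration. The function name and the (tfdt, duration) tuple layout are assumptions for illustration, not something defined in CMAF; all values are in track timescale ticks.

```python
from typing import List, Tuple


def padding_fragments(gap_start: int, gap_end: int, nominal: int) -> List[Tuple[int, int]]:
    """Split [gap_start, gap_end) into (tfdt, duration) pairs no longer than `nominal`.

    Hypothetical helper: it only computes timing, it does not write any boxes.
    """
    if nominal <= 0:
        raise ValueError("nominal fragment duration must be positive")
    if gap_end < gap_start:
        raise ValueError("gap_end must not precede gap_start")
    fragments = []
    tfdt = gap_start
    while tfdt < gap_end:
        duration = min(nominal, gap_end - tfdt)  # last fragment may be shorter
        fragments.append((tfdt, duration))
        tfdt += duration  # next tfdt continues where this fragment ends
    return fragments
```

The same timing applies whether the fragments carry format-specific empty samples or a duration-is-empty tfhd; only the payload differs.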
There are several aspects here:
A design question: If you think in terms of production of subtitle content, it seems awkward for the subtitle generator (upstream) to have to generate empty subtitles because of packaging constraints (downstream). Especially for live. Your subtitle generator would have to have a heartbeat production. I'd like to have others' opinions here, e.g. @nigelmegitt. For VoD, you could imagine that it's less of a problem because you can author your documents such that there is always a document covering any point in time, e.g. extending the duration of the previous sample. But that makes codec-agnostic file manipulations impossible.
A specification question: Looking for "gap" in CMAF, you find Annex F (informative), which explains what happens in case of production error. I don't think that covers what we are discussing here. The only normative statement is in 7.3.2.3:
CMAF chunks in a CMAF track shall not overlap or have gaps in decode time
One can interpret that as not missing tfdt, in the sense that the tfdt of a chunk should equal the tfdt of the previous plus its duration. "duration-is-empty" still means that there is a duration. Also, as Jean indicates, there is no explicit restriction regarding the use of duration-is-empty.
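The interpretation above (each chunk's tfdt equals the previous chunk's tfdt plus its duration) can be sketched as a simple check. The input layout, a list of (tfdt, duration) pairs in timescale ticks, is an assumption made for this example; a chunk signalled with duration-is-empty would still contribute its duration.

```python
from typing import List, Tuple


def find_discontinuities(chunks: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Return (index, expected_tfdt, actual_tfdt) for every chunk that
    overlaps or leaves a gap relative to the previous chunk."""
    problems = []
    for i in range(1, len(chunks)):
        prev_tfdt, prev_duration = chunks[i - 1]
        expected = prev_tfdt + prev_duration
        actual = chunks[i][0]
        if actual != expected:
            problems.append((i, expected, actual))
    return problems
```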
I would suggest we create example content with duration-is-empty and have people test their implementation and report. We can then decide to restrict it (e.g. to a new structural brand such as cmf3) or to explicitly allow it or even to recommend it.
If DVB and ATSC and MPEG-4 part-30 and EBU 3381 recommend using VTTEmptyCue and/or TTML without body as empty sample, it would be safer to recommend that in CMAF as well instead of introducing another approach. This would only increase incompatibility which should not be the goal of CMAF.
duration-is-empty is not referred to in part-30 or in CMAF; having no samples with decode time (regardless of tfdt) implies a discontinuity or gap. CMAF writes down what is allowed (the rest is not allowed) and this is not part of it.
For content creation it may be the CMAF packager's job to do the padding, or the job of the live encoder producing DASH/CMAF, not the subtitle generator, so I disagree with your statement on design @cconcolato .
MPEG-4 Part 30 says:
For sections of the track timeline that have no associated subtitles or timed text content, ‘empty’ samples may be used, as defined for each format, or the duration of the preceding sample extended. Samples with a size of zero are not used.
Note the use of "may" not "should".
This would only increase incompatibility which should not be the goal of CMAF.
Of course, CMAF is about improving interoperability. If indeed other SDOs are frozen on a solution, we should let them use it, but that does not mean we cannot evolve CMAF into a more efficient solution.
CMAF writes down what is allowed (the rest is not allowed) and this is not part of it.
That's not correct. CMAF puts restrictions on ISOBMFF. When it does not put restriction on something, it does not mention it.
For example, in the same ISOBMFF section 8.8.7.1 where duration-is-empty is defined, default-sample-duration-present is defined. It is not mentioned in CMAF. Are you saying it is not allowed?
For content creation it may be the CMAF packager's job to do the padding, or the job of the live encoder producing DASH/CMAF, not the subtitle generator, so I disagree with your statement on design @cconcolato .
Then the packager is not codec-agnostic, right? It has to be TTML-aware or VTT-aware or at least have a mapping between codec and an 'empty' sample definition. I was just hinting that this design is not scalable.
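The "mapping between codec and an 'empty' sample definition" mentioned above could look roughly like the sketch below. It assumes the sample entry four-character codes wvtt (WebVTT) and stpp (XML subtitles) as keys, builds the VTTEmptyCueBox as a bare 8-byte box of type vtte, and uses the body-less TTML document quoted elsewhere in this thread; the table and function names are hypothetical.

```python
import struct

# Body-less TTML document (the EBU Tech 3381 form quoted in this thread).
EMPTY_TTML = b'<tt xml:lang="" xmlns="http://www.w3.org/ns/ttml"/>'


def vtt_empty_cue_box() -> bytes:
    """A box with no payload: 32-bit size (8) followed by the type 'vtte'."""
    return struct.pack(">I4s", 8, b"vtte")


# Hypothetical lookup table: one entry per format the packager knows about.
EMPTY_SAMPLE_BY_SAMPLE_ENTRY = {
    "wvtt": vtt_empty_cue_box(),
    "stpp": EMPTY_TTML,
}


def empty_sample(sample_entry: str) -> bytes:
    """Return the format-specific 'empty' sample payload, or fail for unknown formats."""
    try:
        return EMPTY_SAMPLE_BY_SAMPLE_ENTRY[sample_entry]
    except KeyError:
        raise ValueError(f"no empty sample defined for {sample_entry!r}") from None
```

Every new sparse format needs a new table entry, which is the scalability concern raised here; the ValueError branch is what a packager hits for a format it has not been patched for.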
My point is that part-30 does not mention duration-is-empty, and neither does CMAF. The "may" is used because in a non-fragmented format, which is also supported in MPEG-4 part 30, you do not need this. So yes, using VTTEmptyCue or empty TTML is really optional, but it is currently the only method defined and used to implement the CMAF track model for subtitles.
Sure CMAF could define or evolve to something better, but typically technological advance should be in the technology standards first e.g. MPEG-4 part 30 and only after that be considered in CMAF.
Restricting ISOBMFF, in my opinion, implies writing what is allowed; it is a matter of wording, so I still believe I am correct, as CMAF restricts both the flags and boxes that can be used and duration-is-empty
Yes packagers are always codec agnostic, that is a fact, just as there are ISOBMFF bindings for AVC/HEVC/VVC/AV1/MPEG-H audio, you name it. Each has its own binding to the file format. So I don't really understand your point about this not being scalable.
Thanks for bringing me in here @cconcolato .
In terms of the design question I consider a subtitle encoder to be responsible for generating data that effectively encodes a continuous stream of subtitle presentation, in the same way that an audio encoder generates encoded data that, when decoded, produces a continuous stream of audio samples. Clearly the encoded data are time-division-multiplexed, according to the packaging requirements, as set by whoever is configuring the encoding and packaging chain.
So from that perspective, it is reasonable to expect the subtitle encoder to generate encoded subtitle samples which, when decoded, mean "for the duration of this subtitle sample, present nothing". If I saw a subtitle encoder simply stop producing output for a while, I would think it is broken, not that there are no subtitles to present.
When we implemented the EBU-TT Live Interoperability Toolkit (LIT) we designed the Resequencer component, in its "output a new subtitle document every n seconds" mode, so that it would output documents containing no content for periods when it had received no subtitles. Feeding those documents to the EBU-TT-D encoder then generates empty documents. I mention this because it was my assumption that the subtitle encoder would generate empty documents, at that time.
From a packaging perspective, if the input temporarily disappears, it may not be straightforward to update the manifest to remove the subtitle components and then add them back in again, and the impact on players may not be desirable either. So it probably would make sense for packager implementers and/or operators to make a call on whether they want to supply default "empty" subtitle documents or let the client device get a 404 when fetching the non-existent subtitles. And that in turn might depend on the player's behaviour on getting those 404s.
In terms of the precise format of an empty TTML / IMSC / EBU-TT-D document, this is something where the different profiles of TTML differ slightly in what is permitted, and the encoders I am aware of also differ. EBU-TT-D is the only profile that requires that the body element contains at least one div element and the div element contains at least one p element, which has the consequence that the only conformant EBU-TT-D document with no subtitles is one without a body element present at all. Then, additionally, as Cyril points out, EBU Tech3381 also specifies a specific empty document that shall be accepted, which also omits the head element. I believe that would be IMSC conformant too (but I haven't checked recently). It is:
<tt xml:lang="" xmlns="http://www.w3.org/ns/ttml"/>
Note that unlike @mikedo 's suggestion in https://github.com/MPEGGroup/CMAF/issues/9#issuecomment-649592099 this excludes the XML header, which is not formally required, because it is optional in XML 1.0, which is the basis used for encoding all current versions of TTML, EBU-TT-D and IMSC. (thank you to @tairt for pointing this out to me some time ago!)
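As a rough illustration of the "no body element at all" rule, the sketch below parses a document with Python's standard XML parser and reports whether the root tt element lacks a body child. It checks only that one structural property, not EBU-TT-D or IMSC conformance; the function name and the third test document (an empty div inside body) are made up for this example.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"


def has_no_body(document: bytes) -> bool:
    """True if the root is tt in the TTML namespace and has no body child."""
    root = ET.fromstring(document)
    if root.tag != f"{{{TTML_NS}}}tt":
        raise ValueError("root element is not tt in the TTML namespace")
    return root.find(f"{{{TTML_NS}}}body") is None


# The two empty documents quoted in this thread, with and without the XML header.
WITH_HEADER = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml"/>'
)
WITHOUT_HEADER = b'<tt xml:lang="" xmlns="http://www.w3.org/ns/ttml"/>'
# Hypothetical counter-example: body present but containing only an empty div.
EMPTY_DIV = (
    b'<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml">'
    b"<body><div/></body></tt>"
)
```

Under this check the first two documents count as body-less and the third does not, with or without the optional XML header.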
One encoder supplier whose EBU-TT-D output I have had the opportunity to review in detail currently creates empty subtitle documents that are not actually conformant EBU-TT-D: they contain an empty div element instead of omitting the body. This is IMSC Text Profile conformant though.
When faced with the fact that this is not conformant, naturally, the supplier wanted to know the real world impact on players, and naturally I was unable to provide an all-encompassing answer; it may well be that many players would simply continue without any user impact at all.
Therefore I raised with EBU the possibility that a future version of EBU-TT-D relaxes this constraint, but I would be very interested to know if, from a CMAF perspective, it would be preferable instead to push in the other direction, by proposing that all empty CMAF TTML subtitle segments omit the body element.
@RufaelDev and I had an offline discussion. Our summary is: The discussion reveals 2 separate aspects relevant for CMAF:
The suggestion would be to put these questions into a Defect Report/Tuc and welcome contributions. Maybe liaise with other SDOs to get feedback.
Maybe a survey of current practices would best be done by an industry forum rather than MPEG? MPEG could proactively document how to best do a sparse timed text track for encoder and player vendors to strive to sooner than later?
@mikedo I agree, my suggestion was to include industry fora (CTA, DASH-IF) and SDOs (DVB, ATSC and maybe EBU). I think MPEG should indeed be proactive, to at least gain understanding of how CMAF users would solve this today and, if possible, document a best practice.
One other point: it is not only sparse subtitle tracks. We could also ask for feedback on padding audio/video tracks to achieve approximately the same length. We see people running into this problem of tracks that are not of the same length when trying to use CMAF, so I do think the issue is important for CMAF, as it is hampering adoption or making it more painful than necessary.
Yes, seems like the solution should be general to any kind of track. Although unusual, it could also be used for black video and muted audio padding, even if the coded data is nominally present.
Apologies if I've missed this somewhere and it already exists, but it might be helpful to be able to publish/signal a 'null' segment in the same way as an init segment is signalled now.
init + seg + seg + seg + seg + [null] + seg + [null] + [null] + ...
etc.
Then whatever encoded version of a null segment is appropriate for the media type could be created once and referenced whenever it is needed. For TTML it would be that empty document, for other types it would be some other kind of resource.
Just thinking out loud. Forgive me if this is already covered.
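A toy sketch of the idea above, assuming a timeline in which a missing entry (None) stands for "use the shared null segment". The function name and the URLs are purely illustrative; nothing like this is defined in DASH or CMAF today.

```python
from typing import List, Optional


def expand_timeline(entries: List[Optional[str]], null_segment_url: str) -> List[str]:
    """Replace every None entry with the URL of the single shared null segment."""
    return [null_segment_url if entry is None else entry for entry in entries]
```

For example, expand_timeline(["seg1.m4s", None, "seg3.m4s"], "null.m4s") yields a list in which the second request goes to the one pre-built null resource.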
Can't one say in the MPD "nothing happens for this duration, please move along"? Downloading a resource which then explicitly says "fooled you! there's nothing here!" seems silly.
Can't one say in the MPD "nothing happens for this duration, please move along"?
Perhaps you can, if the meaning of "nothing happens" is completely clear for the media type concerned. Unfortunately it is not. A scheme that defines "nothing" explicitly so it can be referenced later would help tidy that up.
In the case of subtitles, say, one presentation style I have seen shows a dark rectangular area where the text would be all the time when the subtitles are enabled, even if there is no text. That area is presumably defined in the subtitle documents. If no text is present for an entire segment, how would you signal to continue showing the dark area?
(disclaimer: BBC doesn't typically use this style)
Perhaps you can, if the meaning of "nothing happens" is completely clear for the media type concerned. Unfortunately it is not. A scheme that defines "nothing" explicitly so it can be referenced later would help tidy that up.
agreed, it has to be defined or obvious for each media type. Sound, well, it's silence. Video, nothing paints (not even "we regret the loss of picture", as the BBC used to say when the studio failed). For captions it seems fairly obvious?
Video, nothing paints
Already I can think of at least 3 schemes that would mean "nothing paints" and I don't know which one is right!
For captions it seems fairly obvious?
Does it? I think otherwise, as per https://github.com/MPEGGroup/CMAF/issues/9#issuecomment-654072901
I agree video is the hardest case. In the case of captions, I think it's "as if the captions were not there or not enabled", so no, you don't get the black rectangle.
For video, it no longer obscures what's below. If there is nothing below (it's not an overlay but the bottom document in the rendering stack), we're staring into the void, it's an application-specific fill (like "we regret the loss of picture").
just a few points to consider for the live/low latency streaming cases:
These are some things to take into account for the case of live (low latency streaming). But also in VoD, where you use a segment index, you will need to fill gaps/missing/empty segments to make your index segment work on byte ranges and time ranges, so such segments are also applicable there.
Note that CMAF tracks padding can be done already and we are ok in practice, but there is a risk that people pad differently, so the question was if some explicit recommendation is needed. Also my thinking was that it would help adoption of the spec if this was a bit more clear as tracks of unequal length give problems. As for timed text/subtitle the problem occurs most frequently, that was the main case. If not done in CMAF itself this issue might be better discussed and processed in an industry forum. My intention with this issue was not to be introducing new client/player behaviour, but only to recommend a best practice with CMAF as is.
gaps/missing segments in DASH/CMAF can exist but the typical behaviour is to skip all A/V representations, while this is not the intended behaviour for subtitles
@RufaelDev are you sure it's not the intended behaviour? Just wondering if this is documented anywhere: it seems weird to have predefined levels of importance for different types of representation, rather than making it content or application specific.
In such a case duration-is-empty would not allow finding the next time as the duration is missing, hence the timeline extension without mpd update would not work for example.
I don't understand.
duration-is-empty: this indicates that the duration provided in either default-sample-duration, or by the default-sample-duration in the TrackExtendsBox, is empty, i.e. that there are no samples for this time interval.
So you get a duration. I am not sure honestly that this flag helps much; the two useful cases are that the MPD tells you not to bother to fetch (saves a fetch); or that the file fetched tells you exactly what to do (e.g. paint a caption region with no text, as Nigel suggests). Once you've fetched something, you may as well be clear.
I understand that if you're using algorithmic segment-URL generation, you always need a segment, and so the MPD telling you that there is nothing there is not possible, as you're not fetching new MPDs.
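For readers who want to see what the flag discussed above looks like on the wire, here is a minimal sketch that serialises a tfhd box with duration-is-empty (0x010000) set and the empty duration carried in default-sample-duration (flag 0x000008). Setting default-base-is-moof (0x020000) as well is an assumption made for this example, as is the function name; the box is not wrapped in a traf or moof.

```python
import struct

# tf_flags values from the ISOBMFF TrackFragmentHeaderBox definition.
DEFAULT_SAMPLE_DURATION_PRESENT = 0x000008
DURATION_IS_EMPTY = 0x010000
DEFAULT_BASE_IS_MOOF = 0x020000  # assumption: set here for movie-fragment-relative addressing


def build_empty_tfhd(track_id: int, duration: int) -> bytes:
    """Serialise a version-0 tfhd that declares `duration` ticks with no samples."""
    flags = DEFAULT_SAMPLE_DURATION_PRESENT | DURATION_IS_EMPTY | DEFAULT_BASE_IS_MOOF
    version_and_flags = (0 << 24) | flags  # 8-bit version, 24-bit flags
    payload = struct.pack(">III", version_and_flags, track_id, duration)
    return struct.pack(">I4s", 8 + len(payload), b"tfhd") + payload
```

The resulting box is 20 bytes: an 8-byte box header, 4 bytes of version and flags, the track ID, and the default sample duration that gives the length of the empty interval.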
gaps/missing segments in DASH/CMAF can exist but the typical behaviour is to skip all A/V representations, while this is not the intended behaviour for subtitles
@RufaelDev are you sure it's not the intended behaviour? Just wondering if this is documented anywhere: it seems weird to have predefined levels of importance for different types of representation, rather than making it content or application specific.
An example is the end of sect. 6.6.8 of CMAF. By skipping I meant A/V/T (not only A/V; note this is also in the CMAF spec text), sorry for the misunderstanding. A player may skip all A/V/T for a part that has a discontinuity (e.g. in DASH a new period may be used).
my point is that for sparse subtitles all such behavior intended for gaps/discontinuities seems rather undesirable.
In such a case duration-is-empty would not allow finding the next time as the duration is missing, hence the timeline extension without mpd update would not work for example.
I don't understand.
duration-is-empty: this indicates that the duration provided in either default-sample-duration, or by the default-sample-duration in the TrackExtendsBox, is empty, i.e. that there are no samples for this time interval.
So you get a duration. I am not sure honestly that this flag helps much; the two useful cases are that the MPD tells you not to bother to fetch (saves a fetch); or that the file fetched tells you exactly what to do (e.g. paint a caption region with no text, as Nigel suggests). Once you've fetched something, you may as well be clear. I understand that if you're using algorithmic segment-URL generation, you always need a segment, and so the MPD telling you that there is nothing there is not possible, as you're not fetching new MPDs.
Yes indeed and for this functionality one needs the fragment duration, not the (default) sample duration or zero (given there are no samples one would not know how to calculate the fragment duration).
In low latency the MPD is not always updated (e.g. numbering or timeline extension in DASH), so it cannot always tell what (not) to fetch, and yes, in the ideal world the segment would tell me exactly what to do :-). That is why I think a segment with VTTEmptyCue in samples, or TTML without body in samples, may be more helpful than a segment with a duration-is-empty flag: in the first case I know what to do with that information, namely render no subtitles for the duration of the fragment, while in the second I am still not sure.
Last regarding comment #9, there is not a well established way to say in the MPD "nothing happens for this duration" for a representation or adaptationset.
m55342 http://wg11.sc29.org/doc_end_user/documents/132_OnLine/wg11/m55342-v3-m55342_v2.zip studies and highlights some of the text around gaps/continuity and handling that.
This issue is related to MPEG internal issue http://mpegx.int-evry.fr/software/MPEG/Systems/ApplicationFormat/CMAF/-/issues/30
The group discussed this issue as part of the discussion on contribution m55778 and decided to close this issue.
CMAF presentations composed of audio, video and timed text should have tracks defining them, and tracks are composed of CMAF fragments or segments.
In some practical cases, subtitles or timed text are not available (e.g. at the end of a presentation). To comply with the CMAF presentation model, it would be nice if CMAF could include a recommendation for using empty subtitle or timed text samples that have a timespan but do not contain text. This is supported in MPEG-4 part 30 but it is not explicit or required in CMAF. My recommendation is to define a default method for padding fragments when no timed text or subtitle is available. This way it will be more explicit how media presentations with timed text or subtitles comply with the CMAF presentation model.
My recommendation would be to recommend fragments with a sample carrying a VTTEmptyCueBox or a sample containing a valid TTML document.
It would be great if section 11 could make a suggestion of how CMAF tracks with partially no subtitle can be supported by padding and fragments, perhaps with an example.
Again, I think the padding can be done in different ways, but making this an explicit recommendation would be helpful. Too many times we see a subtitle track that is much shorter than the audio video or has a gap.