MPEGGroup / FileFormat

MPEG file format discussions

Use of edit lists and timed text tracks #40

Open cconcolato opened 2 years ago

cconcolato commented 2 years ago

I want to explore the use of edit lists with TTML in MP4 (and WebVTT in MP4).

Reminder on edit lists

As defined in 14496-12, a track has:

The edit list of a track can provide a very complex transformation of the composition timeline. For example, it can indicate that for a presentation time interval, no samples are played (empty edit); or that for a presentation time interval, a sample is paused; or that for a presentation time interval, a composition time interval is played at a given speed. Edit lists can even select only part of a sample for the presentation timeline, e.g. for a sample that starts at composition time 5s and ends at composition time 7s, only the interval from 6s to 6.5s is played.
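As an illustration of the mapping described above, here is a minimal sketch of how a player might resolve a presentation time to a composition time through an edit list. The `Edit` class and `presentation_to_composition` function are hypothetical names, and all times are plain seconds for readability (a real file expresses them in timescale units). The conventions that a media time of -1 marks an empty edit and a media rate of 0 marks a dwell follow 14496-12.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edit:
    """One edit list entry, with times in seconds (an assumption of this sketch)."""
    segment_duration: float  # length of the edit on the presentation timeline
    media_time: float        # start on the composition timeline, -1 for an empty edit
    media_rate: float = 1.0  # 1.0 = normal speed, 0.0 = dwell (pause on media_time)


def presentation_to_composition(edits: List[Edit], t: float) -> Optional[float]:
    """Return the composition time played at presentation time t, or None if nothing plays."""
    start = 0.0
    for edit in edits:
        end = start + edit.segment_duration
        if start <= t < end:
            if edit.media_time < 0:
                return None  # empty edit: no sample is played
            # Offset into the edit, scaled by the playback rate (0 keeps media_time fixed).
            return edit.media_time + (t - start) * edit.media_rate
        start = end
    return None  # t is past the end of the edit list


# A 4s empty edit followed by 20s of media played from composition time 0.
edits = [Edit(segment_duration=4, media_time=-1), Edit(segment_duration=20, media_time=0)]
print(presentation_to_composition(edits, 2))   # None
print(presentation_to_composition(edits, 15))  # 11.0
```

The partial-sample case from the paragraph above corresponds to a single `Edit(segment_duration=0.5, media_time=6.0)`: only composition times 6s to 6.5s are ever reached.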

Simple (typical) use cases of edit lists are:

Advanced use cases for edit lists are editing operations in non-linear authoring tools (cut, insert, reorder, ...).

CMAF restricts the use of edit lists to the typical cases above.

Timed text and edit lists

As an example, if a timed text file is authored against a video file and later on that video file is modified to add a bumper (i.e. an ident, an intro, a title card, ...), effectively shifting the dialogs in the audio and the video forward in time, it could be interesting to adjust the timing of the timed text track. This may be done by inserting an empty edit with an edit list.

WebVTT

For WebVTT tracks, given that the WebVTT cue time is derived from the sample time, shifting the sample time with an edit list actually affects the cue time as expected. The following cue:

00:11.000 --> 00:13.000
We are in New York City

gets stored in a sample with CTS = 11s, duration 2s, with the payload:

We are in New York City

If the sample is moved to 15s (by adding a 4s empty edit), the effect is the same as if the initial cue had been:

00:15.000 --> 00:17.000
We are in New York City

Note that additional care (i.e. use of the ctim box) needs to be taken if the cue payload contained a WebVTT Cue Timestamp.
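To make the WebVTT behavior concrete, here is a small sketch that rebuilds the cue text from a sample's time, duration and payload. The helper names `format_vtt_time` and `cue_from_sample` are hypothetical, times are in seconds, and the sketch ignores cue settings and the ctim case mentioned in the note above.

```python
def format_vtt_time(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, omitting the hours field when it is zero."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    prefix = f"{hours:02d}:" if hours else ""
    return f"{prefix}{minutes:02d}:{secs:02d}.{ms:03d}"


def cue_from_sample(sample_time: float, duration: float, payload: str) -> str:
    """The cue timing comes only from the sample; the payload carries no times."""
    start = format_vtt_time(sample_time)
    end = format_vtt_time(sample_time + duration)
    return f"{start} --> {end}\n{payload}"


# Without the edit list the sample sits at 11s; with a 4s empty edit it plays at 15s.
print(cue_from_sample(11, 2, "We are in New York City"))
print(cue_from_sample(11 + 4, 2, "We are in New York City"))
```

Because the payload is untouched in both calls, shifting the sample is enough to shift the cue.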

TTML

For a TTML track, times in the TTML document are relative to the start of the track, but the question is: is that the start of the composition timeline or the start of the presentation timeline? Let's assume that the following document:

<tt
...
<body>
<div>
<p begin="11s" end="13s">We are in New York City</p>
</div>
</body>
</tt>

is stored in a sample (CTS = 11s, duration = 2s).

i. If TTML document times are interpreted as offsets from the start of the composition timeline, the behavior would be the same as for the WebVTT case. When applying the same edit list and playing presentation time 15s, the player would know that it is actually playing composition time 11s, which would match the time values in the document.

ii. If TTML document times are interpreted as offsets from the start of the presentation timeline, then when applying the same edit list, at presentation time 15s, the TTML parser is fed the same document and seeked to time 15s, where there is no active element. Nothing plays. To make it work, when adding the edit list, one has to adjust the TTML document in the sample to be:

...
<body>
<div>
<p begin="15s" end="17s">We are in New York City</p>
</div>
</body>
</tt>
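The difference between interpretations i and ii can be sketched as a choice of which time the TTML document is evaluated at. The function `active_at` and its `interpretation` argument are hypothetical, and the sketch assumes the edit list is a single empty edit followed by the whole track at normal speed, as in the example above.

```python
def active_at(presentation_time: float, begin: float, end: float,
              empty_edit: float, interpretation: str) -> bool:
    """Is a <p begin=... end=...> element active at the given presentation time?"""
    if interpretation == "composition":
        # Option i: document times are compared against the composition time.
        document_time = presentation_time - empty_edit
    elif interpretation == "presentation":
        # Option ii: document times are compared against the presentation time.
        document_time = presentation_time
    else:
        raise ValueError(f"unknown interpretation: {interpretation!r}")
    return begin <= document_time < end


# Original document (begin=11s, end=13s) with a 4s empty edit, at presentation time 15s.
print(active_at(15, 11, 13, 4, "composition"))   # True
print(active_at(15, 11, 13, 4, "presentation"))  # False
# Rewritten document (begin=15s, end=17s) under option ii.
print(active_at(15, 15, 17, 4, "presentation"))  # True
```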

Current spec text

ISO/IEC 14496-30 2nd edition, Section 4.2 says:

The rendering of the sample happens at the composition time, taking into account edit lists if any

This means, as usual, that the presentation of a timed text track behaves like a video or audio track and is driven by the presentation time, from which a composition time and a sample number is derived.

It then says:

The subclauses defining the storage of specific formats in the ISOBMFF specify how internal timing values relate to the track time or to the sample decode or composition time (see subclauses 5.3 and 6.3). For instance, start or end times may be relative to the start of the sample, or the start of the track.

Section 5.3 (TTML) says:

The top-level internal timing values in the timed text samples based on TTML express times on the track presentation timeline – that is, the track media time as optionally modified by the edit list. For example, the begin and end attributes of the element, if used are relative to the start of the track, not relative to the start of the sample.

So clearly edit lists are meant to apply to TTML, but nothing warns about the issue described above. Note that the text from the second edition has other flaws and ambiguities, and is rephrased in Amendment 1 to the second edition.

Recommendation

My recommendation would then be to update the TTML section and add something along the lines of:

Edit lists on TTML tracks should be used with care due to the fact that times in the TTML document in a sample are not relative to the sample time. Authoring tools adding edit lists to TTML tracks are expected to update the times in the TTML documents in the track.
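As a rough idea of what "updating the times in the TTML documents" could look like in an authoring tool, here is a sketch that shifts the outermost timed elements of a document by a fixed offset. The function `shift_top_level_times` is hypothetical. The sketch assumes parallel time containers and time expressions of the form `11s` only; it does not handle clock times, frames, ticks, or `timeContainer="seq"`. Only the outermost timed elements are shifted because nested times are relative to their parent.

```python
import re
import xml.etree.ElementTree as ET

_SECONDS = re.compile(r"^(\d+(?:\.\d+)?)s$")


def _shift(value: str, offset: float) -> str:
    """Add offset (seconds) to a time expression such as '11s' or '2.5s'."""
    match = _SECONDS.match(value)
    if match is None:
        raise ValueError(f"unsupported time expression: {value!r}")
    total = float(match.group(1)) + offset
    # Millisecond precision, without trailing zeros: 15.0 becomes '15s'.
    return f"{total:.3f}".rstrip("0").rstrip(".") + "s"


def shift_top_level_times(element: ET.Element, offset: float) -> None:
    """Shift the outermost timed elements below (and including) element, in place."""
    if any(name in element.attrib for name in ("begin", "end", "dur")):
        # A missing begin defaults to 0; making it explicit also shifts the children.
        element.set("begin", _shift(element.get("begin", "0s"), offset))
        if "end" in element.attrib:
            element.set("end", _shift(element.get("end"), offset))
        return  # descendants are timed relative to this element
    for child in element:
        shift_top_level_times(child, offset)


doc = ET.fromstring(
    '<tt xmlns="http://www.w3.org/ns/ttml"><body><div>'
    '<p begin="11s" end="13s">We are in New York City</p>'
    '</div></body></tt>'
)
shift_top_level_times(doc, 4)
p = doc.find(".//{http://www.w3.org/ns/ttml}p")
print(p.get("begin"), p.get("end"))  # 15s 17s
```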

nigelmegitt commented 2 years ago

I've had a think and a discussion about this and would add the following points:

Overall, the simplest option for typical use (e.g. for a use case like CMAF) is probably to state that edit lists cannot be used with TTML, so that the composition timeline and the presentation timeline are effectively identical.

I agree with what I think is the intent behind your proposal @cconcolato , which is to define more clearly what it means to apply an edit list to a TTML sample, and what the consequences might be, without changing the current sense that they can be applied. Then it is down to profiles and applications to define whether edit lists are permitted in a particular context.

sdp198 commented 2 years ago

@cconcolato You give two possibilities for TTML timing: i) TTML times correlate to composition times, and ii) TTML times correlate to presentation times.

I think your recommendation is OK, but it needs to state explicitly that option ii) is the correct interpretation.

Previously (before all these discussions around this amendment) I had expected it to be option i), with the drawback that most clients were unlikely to support edit lists with TTML at all.

cconcolato commented 2 weeks ago

I think we should start an amendment to 14496-30 with this item.

jpiesing commented 1 week ago

If this is taken forwards then I recommend (request) that test content be produced (or at least extremely detailed instructions written for it) that distinguishes between 1) a correct implementation of what is specified, 2) implementations not supporting edit lists on TTML at all, 3) the most obvious way or ways in which an implementation might support edit lists with TTML but incompatibly with what is specified. A description of what would be seen in each case would also be really helpful. Obviously some test content that behaves visibly differently in each of these may need to be very artificial.