Video and audio segment end times (MLB)

Submitter: Scott Labrozzi Email: scott.labrozzi@mlb.com A question on recommendations:

Video and audio segment end times will not be the same (i.e. a/v AUs are essentially different durations) thus the last segments of audio and video in period[i] will not end at the same time. Assuming period-continuos, are there any recommendations around period[i+1]’s @start, the use of PTO and, for a SegmentTimeline, the @t value for the first segment of media in period[i+1]. For example some options could be:

1) Set the @start of period[i+1] to the end time of the media with the largest end time from period[i]. This will abut the end time of one media set from period[i] to the start of period[i+1] and require a PTO in period[i+1] for the other media. The first segment @t value for all media in period[i+1] can be 0.

2) Set the @start of period[i+1] to the end time of the media with the earliest end time from period[i]. This will abut the end time of one media set from period[i] to the start of period[i+1] and require a > 0 @t value for the first segment for all other media sets.

3) Round up the @start of period[i+1] to say an integer second from the end time of the media with the largest end time from period[i]. No media will abut from their end times in period[i] to the start of period[i+1] and thus a require a PTO in period[i+1] for all media sets.

Furthermore, in the case of non-period-continuous, where for example one is splicing in, adding to the MPD, a VOD ad for period[i+1]. It is very likely that the audio and video for this VOD ad start at time 0 relative to one another. In other words, the timestamp of the first video and first audio are the same. We now have the case where audio and video from period[i] do not end at the same time but audio and video from period[i+1] do start at the same time.

4) Is there a recommendation (or requirement) in terms of how this is handled?

4.1) For example, I could imagine using approach 1) above BUT where the PTO is 0 for all media in period[i+1] is 0. This would leave a small timeline gap for media that does not abut to period[i+1].

4.1.1) in this case is there sufficient language to make it understood at the client there can be timeline gaps between the end of one period and the start of some media of the next period such that the client needs to do XYZ where XYZ is say fill with silence, repeat last video frame?

4.2) I don’t recall what / if any support is allowed regarding overlapping segments across periods. If overlapping media segments were possible, then one could use option 2) above except that the @t for all media in period[i+1] would be 0. This would mean though, as stated, that some media segments (ones not abutting to the start of period[i+1]) would spill / straddle from period[i] to period[i+1] and overlap in time the first media segments of some media in period[i+1].

My recommendation is that DASH IF restrict video Periods to start exactly on a Segment boundary. I hope the specs are already clear on this, but haven't checked lately.

It is impractical for interoperability to require all decoders to silently and instantly decode some portion of a Coded Video Sequence to present an intermediate picture on time. Some decoders can start presentation mid-Segment on the first Period, but not subsequent Periods where synchronization to presentation time is lost and Segments in a buffer are simply concatenated, not overlap spliced.

Periods duration or the start time of the following Period truncates Segments with @timescale accuracy. I recommend that Representations extend at least to the end of the Period for decoder compatibility, which will usually result in truncation of an audio Segment, and possibly a video Segment. Decoders can stop on an arbitrary sample pretty reliably; starting on an arbitrary sample is easy for audio, but hard for video.

If the first video Segment is aligned to Period start time, then audio Segments often won't be, as you say. For best decoder compatibility, I recommend including the audio Segment that overlaps the Period start time (determined by previous Period@duration or this Period@start). The @PTO will determine the first waveform sample that should be presented. Experience with real decoders in browsers has shown that an audio gap often results in players stopping (in the range of a few hundred ms to less than 2 seconds).

Players can respond to @PTO within an audio Segment at the start of a Period with different degrees of accuracy.

The most accurate will seek to the audio frame (sample) that overlaps Period start, silently decode it, and fade up audio at the Period start time and corresponding track decode time (note: square waves must be avoided, but most decoders handle this). Predictive codecs, such as AAC, have an output delay, so earlier data must be decoded anyway to enable output of the intended audio at the right time (called "priming"). Faster than realtime audio decoding is generally much easier than video decoding.
Sample accurate synchronization is also acceptable (usually around 20ms), in which case the player can begin decoding the first complete audio sample following the decode time coincident with Period start time (taking priming into account for presentation synchronization).
Segment accurate synchronization is not acceptable. Starting on the first complete audio Segment could result in a gap of several seconds, which can result in decoding failure or unacceptable user experience.

In traditional terminology, DASH IF should require "splice conditioning" of video, but not audio. A simple example is continuous Periods, where each video Adaptation Set must have their first Segments aligned to the Period start time, but audio Segment alignment is unconstrained. Each Period should contain audio Segments that overlap Period start time and end time. SegmentTimeline can indicate that explicitly with the @t earliest presentation time of the first Segment in the list, which would be less that @PTO. $Number$ time calculation of the first overlapping Segment using average Segment@duration may not be accurate, so a start number should be explicitly stated in the Segment URL template if SegmentTimeline isn't used.

Splice conditioning is typically required for ad insertion and live content, where the start of an avail may not be the start of a Segment, and the end of an ad pod may not match the end of a live Segment. In broadcast and on live servers, video sequences can be dynamically re-encoded to start Coded Video Sequences and Segments at the in/out points (splice conditioning).

Note that multiplexed M2TS is optimized for splicing, and uses a discontinuity flag that resets the presentation time in a way that is roughly analogous to a Period in DASH. The new DTS and PTS are "jammed" based on the assumption that the next frame in the video stream follows the previous sample without a presentation time gap, so presentation time is set to the new PTS to remain continuous. The timestamp gap is ignored and the presentation duration changed to make sample presentation continuous.

That is not how ISO Media and DASH work. ISOBMFF and DASH place media Segments on either a movie timeline or an MPD timeline, and presentation time and duration is fixed. Edit lists and @PTO can move samples around on the composition/presentation timeline. Media must be placed on the presentation timeline explicitly, not derived from the sequence of samples "first come, first served" like M2TS. This is what enables "late binding" of independent ISOM media tracks into a synchronized presentation. For instance, if missing video samples in an ISOBMFF file weren't replaced with sample durations to maintain the media timeline, but instead samples were just "jammed" like M2TS, subsequent video and audio in independent tracks would be out of sync. This makes splice conditioning harder in ISOBMFF because it may affect Segment packaging, Period duration, etc.

Client-side ad insertion in a single video element is problematic because encoding and Segmentation of the live stream can't be splice conditioned for each player and the number of ad frames they will insert. The best approximation is to match avail and ad durations as closely as possible, and return to the first live Segment that overlaps the end of the ad Period, truncating the ad Period as necessary. This is contrary to DASH syntax if the ad Period has @duration predetermined, and easier of ad avails are signaled through 'emsg' using SCTE-35 or VAST, which does not change the live presentation timeline tied to the live stream.

A separate video element allows ads to be downloaded as simple files or a different media format, displayed over, around, or in the live video, and the live program to be resumed on any picture and Segment without splice conditioning or clipping of ads to the nearest live Segment, without previously knowing the duration of the break (e.g. a sports timeout started and stopped by "the big red button").

Dash-Industry-Forum / DASH-IF-IOP

Video and audio segment end times (MLB) #78