Resync for fixed offsets

haudiobe commented 4 years ago

Will,

as discussed in the call, some comments inline:

Thomas

From: Law, Will wilaw@akamai.com Sent: Monday, 24 February 2020 18:24 To: Thomas Stockhammer tsto@qti.qualcomm.com Cc: iop@dashif.org Subject: Resync - simplified approach

Hi Thomas

Thanks for putting this deck together.

It seems that the implementation of Resync (especially for live services, which coincidentally is where it is most useful) is all about providing information for the player to parse the media object to find a valid entry point. [TS] This is the last resort if none of the other cases hold. It is discussed in detail in the client implementation.

This is unfortunate because up to this point much work has been done in DASH to allow a player to play a stream without having to parse any media objects.

[TS] This is not true, as we need to parse segments anyway for event messages or SEI messages for 608/708. I understand that this is not the best option, but it is one that potentially provides benefits. We tested submitting unstructured byte ranges into MSE source buffers, but it did not work.

This is especially useful for MSE/EME players, where the object requests and ABR logic are decoupled from the box and codec parsing. It is also a strength of the LL-HLS approach that it explicitly describes each resync point, at the expense of playlist polling. [TS] Understood

It would be a useful extension to Resync if some means were made available to find a resync point only by reading a manifest and without having to poll the manifest at a frequency equal to the chunk duration (as is done with LL-HLS). One solution would be to indicate a fixed time and byte offset for each resync point in a segment. This could look something like this [TS] As said, this is supported by setting dImin and dImax to the same value. On the timing, it says max for now, and we may need a flag to fix it as constant. Good comment for MPEG.

The above snippet describes a low latency stream with 4s segment duration, in which each segment contains 8 resync points, each spaced 500ms apart. The beauty of this approach is that the player does not need to parse the media segments in order to switch bitrates at a resync point. The obvious caveat is that the encoder needs to make byte-predictable chunk boundaries. The only way to do this would be to encode with perfect CBR, or else to have tight-capped VBR and then pad out each chunk. The padding sounds inefficient; however, an encoder optimized for this may not present too significant a bit increase compared to the non-resynced stream, and the simplicity for players may be well worth that increase, especially for ULL streams.
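To make the fixed-offset idea concrete, here is a minimal Python sketch of how a player could derive every resync point from three manifest-level numbers, without touching the media. The 4s segment and 500ms spacing come from the description above; the chunk size of 125,000 bytes (2 Mbit/s for 500ms) and all function and field names are assumptions for illustration only, and any per-segment header bytes are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResyncPoint:
    index: int
    time_offset_ms: int  # offset from the start of the segment
    byte_offset: int     # offset from the first byte of the segment


def fixed_resync_points(segment_ms: int, interval_ms: int, chunk_bytes: int):
    """List the resync points of one segment when every chunk has the
    same duration and the same (padded) size in bytes."""
    if segment_ms % interval_ms != 0:
        raise ValueError("segment duration must be a multiple of the interval")
    count = segment_ms // interval_ms
    return [ResyncPoint(i, i * interval_ms, i * chunk_bytes) for i in range(count)]


def padding_overhead(actual_chunk_sizes, chunk_bytes: int) -> float:
    """Fraction of extra bytes added by padding each chunk up to chunk_bytes."""
    if max(actual_chunk_sizes) > chunk_bytes:
        raise ValueError("a chunk exceeds the fixed size; the VBR cap is too loose")
    payload = sum(actual_chunk_sizes)
    return (chunk_bytes * len(actual_chunk_sizes) - payload) / payload


# Hypothetical numbers: 4 s segments, 500 ms chunks, 125,000 bytes per chunk.
points = fixed_resync_points(4000, 500, 125_000)
```

With these assumed numbers, `points` has 8 entries and the fourth one sits at 1500ms and byte 375,000. `padding_overhead` gives a rough feel for the cost of the capped-VBR-plus-padding option mentioned above.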

Has thought been given within MPEG to such a scheme? If not, are there any supporters of such an approach? [TS] I added this thread to the MPEG LS information.

Cheers Will

TobbeEdgeware commented 4 years ago

I think Will's idea of providing the resync info in the MPD is a good one. However, putting requirements on the byte positions of sync points inside segments seems very hard both to achieve and to agree on, in particular if we want content that can be the same for both HLS and DASH. I also think that adding extra resync boxes inside the segments is a lot of work and will take time to get propagated into all encoders/packagers.

Therefore I started thinking about a simple solution, which I drop here:

In my view, a virtue of the LL-HLS approach is that it allows for short segments (parts) at the live edge and longer segments further back in history. The need for frequent playlist updates is of course a major drawback. Since DASH already has a way of providing segments without MPD requests, I think it would make sense to extend it to provide a template for individually addressable parts similar to LL-HLS.

For example, we could add SegmentTemplate attributes like part="$Representation/$Number$_$PART.m4s" nrParts="12" resyncParts="0,3,6,9" segmentWindowasParts="3" with the interpretation that only the last 3 segments can be fetched as individual parts, and parts 0, 3, 6, 9 are resync points. This structure could then describe the whole session in an analogous way to Will's proposal.
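As a rough illustration of how a client might resolve such a part template, the Python sketch below expands a template and checks whether a segment is still inside the part window. It assumes a normalised template with balanced placeholders, `$RepresentationID$/$Number$_$Part$.m4s`; `$Part$` is a hypothetical identifier that does not exist in DASH today, and the function names are made up for this example.

```python
PART_TEMPLATE = "$RepresentationID$/$Number$_$Part$.m4s"  # assumed, normalised form


def part_url(template: str, representation_id: str, number: int, part: int) -> str:
    """Expand the (hypothetical) part template for one part of one segment."""
    return (
        template.replace("$RepresentationID$", representation_id)
        .replace("$Number$", str(number))
        .replace("$Part$", str(part))
    )


def parts_available(latest_number: int, number: int, window: int) -> bool:
    """True if `number` is one of the last `window` segments, i.e. it can
    still be fetched as individual parts under the proposed window attribute."""
    return latest_number - window < number <= latest_number


url = part_url(PART_TEMPLATE, "video1", 100, 6)
```

Here `url` becomes `video1/100_6.m4s`. With a window of 3 and segment 100 at the live edge, segments 98 to 100 are addressable as parts, while segment 97 is only reachable as a full segment.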

This would allow for serving the same segments and individually addressable parts to both HLS and DASH with the same URLs. There would be two different mechanisms to get segments as they are produced:

1) HTTP chunked-transfer encoding
2) fetching individual parts

An optimal client could then use 2 instead of 1 just when joining a session. The CDN would cache the parts in the same way for DASH as for LL-HLS.
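One way to picture this join behaviour is as a small request plan: start at the most recent resync part of the segment in progress, fetch the remaining parts individually, then fall back to full segments over chunked transfer at the next segment boundary. The Python sketch below is only an illustration under that reading; the function name and the tuple layout are assumptions.

```python
def join_plan(segment_number: int, parts_produced: int, resync_parts, nr_parts: int):
    """Plan the requests for joining a live session mid-segment.

    Returns a list of (kind, segment_number, part_index) tuples, where kind
    is "part" (individually addressed part) or "segment" (full segment
    delivered with chunked-transfer encoding, part_index is None).
    """
    # Resync parts that already exist in the segment being produced.
    candidates = [p for p in resync_parts if p < parts_produced]
    if not candidates:
        # Nothing produced yet: simply take the whole segment as it is built.
        return [("segment", segment_number, None)]
    start = max(candidates)
    # Parts from the latest resync point to the end of this segment;
    # parts not yet produced are requested as they become available.
    plan = [("part", segment_number, p) for p in range(start, nr_parts)]
    # From the next boundary onwards, switch to full segments.
    plan.append(("segment", segment_number + 1, None))
    return plan


plan = join_plan(100, 8, [0, 3, 6, 9], 12)
```

With 8 of 12 parts produced and resync points at parts 0, 3, 6 and 9, the plan starts at part 6 of segment 100, continues through part 11, and then requests segment 101 as a whole.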

haudiobe commented 4 years ago

Assign this to Live TF for more discussion.

wilaw commented 4 years ago

@TobbeEdgeware's proposal is interesting and I support it. The duality of converged support with LL-HLS is attractive. It does have a number of consequences, however:

Parts on the order of a few frames, as used with DASH-LL today, would not be practical. We would need to use part durations closer to what LL-HLS is using today, in the 500-1000ms range. I guess this is an expected interval for switchable Resync points, which are the ones we would care about most.
Egress from the encoder to the origin will double, as it is producing full segments and their duplicates as parts/chunks. One solution would be to express the parts as byte ranges in the building segment. It's not clear to me how we could do that with a SegmentTemplate, unless we go with a fixed byte offset, which was the original subject of this issue, but which Torbjorn has indicated would be difficult to achieve in practice. Maybe MPEG has some good ideas here.

mstattma commented 4 years ago

Couldn't a smart origin/packager parse incoming segments for chunk boundaries with appropriate SAP points, and translate "part indices" to byte offsets? The part-related attributes in SegmentTemplate would then be an additional "deep link" addressing option for a segment otherwise reachable via the address conveyed in the media="" template.

That would enable a client to switch from downloading parts at an "intra-segment" resync point to pulling full segments at the next segment boundary (if the protocol supports this), and would avoid duplicating origin ingest.
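A minimal sketch of the origin-side translation could look like the Python below. It walks the top-level ISO BMFF boxes of a received segment and treats every `moof` as the start of a new part, which is a simplifying assumption: it does not verify the SAP type of each chunk and ignores boxes such as `styp`, `prft` or `emsg` that may precede a `moof`. The function names are hypothetical.

```python
import struct


def top_level_boxes(data: bytes):
    """Yield (type, offset, size) for each complete top-level ISO BMFF box."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # 64-bit "largesize" follows the box type
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box extends to the end of the data
            size = len(data) - offset
        if size < header or offset + size > len(data):
            break  # truncated or malformed box: stop at the last complete one
        yield box_type.decode("ascii", "replace"), offset, size
        offset += size


def chunk_byte_ranges(data: bytes):
    """Map part index -> inclusive (first_byte, last_byte), one part per moof."""
    boxes = list(top_level_boxes(data))
    if not boxes:
        return []
    end = boxes[-1][1] + boxes[-1][2]
    starts = [off for box_type, off, _ in boxes if box_type == "moof"]
    stops = starts[1:] + [end]
    return [(first, stop - 1) for first, stop in zip(starts, stops)]
```

A request for part `i` could then be served from the building segment as an HTTP byte range, for example `Range: bytes=12-55` if `chunk_byte_ranges(data)[i]` is `(12, 55)`.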

TobbeEdgeware commented 4 years ago

@mstattma The byte-range information needs to be conveyed either to the client or to the intelligent caching server that would do the translation. It seems to me that this would necessitate a lot of extra MPD requests, getting closer to the LL-HLS scheme.

@wilaw I agree that there will be double output from the encoder/packager. The computational load for that should be small since it is the same content sent twice, but the uplink CDN live bandwidth will double, and there will be a storage cost on the CDN for storing the short-lived parts in addition to the full segments. On the other hand, live low-delay content is a minor part of all content, and it is the massive CDN downlink rather than the uplink that is the bottleneck. Furthermore, over time, if this part/full segment structure becomes more widespread, one could optimise the CDN to assemble the full segments from the parts so that the upstream fetch of the full segment is not needed.

In my view, making DASH compatible with LL-HLS by adding structure and information to the MPD is much more likely to succeed than adding new boxes or encoding restrictions beyond what is needed for LL-HLS/fMP4 and traditional CMAF chunks.

Dash-Industry-Forum / MPEG

Resync for fixed offsets #12