Low latency: DASH target latency and HLS PART-HOLD-BACK are not equivalent

In section 5.2.2.2, the current text states that HLS PART-HOLD-BACK can be set to the value of the (DASH) Latency@target attribute of the ServiceDescription as seconds if present and vice versa. However, these are not equivalent.

In DASH, the Latency@target is specified as a delay relative to the Producer Reference Time. The Producer Reference Time is an arbitrary reference wall clock time chosen by the content provider and will not be the same as the availability time of CMAF chunks on the server. Also, in DASH, the latency target is just a target and the client is not expected to use it blindly. There are min and max values too which guide client behaviour.

By contrast, HLS PART-HOLD-BACK is essentially a delay relative to the appearance of chunks in the media playlist and doesn't have corresponding min and max values.

Whilst it would be possible to write an equation to derive a "chunk availability to presentation target" latency figure from a DASH MPD, (a) it's quite complex, involving ProducerReferenceTime@wallClockTime, ProducerReferenceTime@presentationTime, presentationTimeOffset, timescale, PeriodStart, availabilityStartTime and availabilityTimeOffset, and (b) it still doesn't give you a value you can definitely use as an HLS PART-HOLD-BACK parameter. The conversion would also be affected by the time taken to update HLS manifests with new chunks, and might also need to be informed by knowledge of the capabilities and behaviour of HLS clients, which are almost certainly different from those of DASH clients.

The best resolution to this issue is probably to replace the current two bullet points that say that these DASH and HLS concepts are interchangeable (one DASH->HLS one and one HLS->DASH one) with text that says that they're not and highlight the differences. Then it unfortunately falls to users of the spec to determine appropriate DASH latency targets and HLS PART-HOLD-BACK values.

To provide complete guidance on how to set these values would likely be too much work to add to this version of the document.

@poolec - the Producer Reference Time in DASH is not arbitrary. It has a @type attribute which defines which point in the workflow timeline it is intended to reference:

'encoder' [default] provides a reference when the media time was input to an encoder following the exact definition in ISO/IEC 14496-12, subclause 8.16.5 for flags set to 0.
'captured' provides a reference when the media time was captured following the exact definition in ISO/IEC 14496-12, subclause 8.16.5 for both flag 8 and flag 16 being set.
'application' provides a reference of the media time related to wall-clock time based on an application defined relation. In this case following subclause 8.16.5 of ISO/IEC 14496-12 flag 16 shall be set and flag 8 shall be unset.

That being said, I actually agree with you that there is no easy correspondence between PART-HOLD-BACK, which excludes any encode or packaging time, and Latency@target, which includes it.

We could however make a general suggestion which might prove more useful than omitting any guidance:

If you have an LL-DASH manifest and you are creating an LL-HLS playlist, we just decided in the last meeting that you can't really do this as you have no visibility into the PART duration, which is critical in defining the LL-HLS playlist.
If you have an LL-HLS playlist and are creating an LL-DASH manifest, set a Latency@target value which is greater than the PART-HOLD-BACK. How much greater? Well, the encode time typically does not exceed the segment duration. Therefore the Latency@target value could be set somewhere in the range from PART-HOLD-BACK to PART-HOLD-BACK + DURATION.
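
To make the suggested range concrete, here is a minimal sketch of the arithmetic. The function name and the example numbers are illustrative only, and it assumes (as above) that encode time does not exceed one segment duration.

```python
def latency_target_range_ms(part_hold_back_s: float, segment_duration_s: float) -> tuple[int, int]:
    """Return (lower, upper) candidate bounds for Latency@target, in milliseconds.

    lower = PART-HOLD-BACK
    upper = PART-HOLD-BACK + segment duration (assumed ceiling on encode time)
    """
    if part_hold_back_s <= 0 or segment_duration_s <= 0:
        raise ValueError("durations must be positive")
    lower = round(part_hold_back_s * 1000)
    upper = round((part_hold_back_s + segment_duration_s) * 1000)
    return lower, upper


# Hypothetical stream: PART-HOLD-BACK of 1.5 s, 4 s segments.
print(latency_target_range_ms(1.5, 4.0))  # (1500, 5500)
```

Where in that range to land still depends on information the playlist does not carry, such as the actual encode and packaging delay.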

Hi Will,

Yes, Producer Reference Time does have defined semantics and always conveys a wall-clock time, but only two of the three types have a precise meaning: the application one just "provides a reference of the media time related to wall-clock time based on an application defined relation". Also, whilst there might be upper bounds on encoding delay, the bounds on delay from capture might be much larger due to contribution links.

I think in converting from LL-DASH to LL-HLS, it would be easier to determine an appropriate PART-HOLD-BACK value independently rather than by trying to derive a value from the DASH ServiceDescription and ProducerReferenceTime signalling.

In the other direction, it may be slightly easier to automate because you can generate both the ProducerReferenceTime and the ServiceDescription latency target. You could perhaps set an 'application' ProducerReferenceTime that marks the time a chunk appears in the HLS manifest and then set a latency target relative to that. We may not want to recommend that approach, however.

I think your general point is a good one that a good conversion from one to the other is likely to require input of some additional information.

Further discussion occurred in an email thread. The net result is that we are removing the direct conversion statements for PART-HOLD-BACK and Latency@target. Instead, we suggest that they are set based on a low latency target if known and otherwise omitted, and we have added a detailed note on how each value bounds the other.

DASH to HLS text:

The PART-HOLD-BACK attribute is set based on the service's desired low latency target, if known, otherwise the attribute is omitted.
Note: The value of the Latency@target attribute of the ServiceDescription element can be considered an upper bound for the PART-HOLD-BACK attribute, but it may be larger than the intended latency. This difference is due to the PART-HOLD-BACK attribute being relative to the end of the Media Playlist and the Latency@target attribute being relative to the wall clock time associated with the presentation.
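
As a rough illustration of the "set if known, otherwise omit" rule, the sketch below emits an EXT-X-SERVER-CONTROL line only when a target is supplied. The function and parameter names are hypothetical. Using the target directly as the hold-back value and clamping it to twice the Part Target Duration (the HLS minimum for PART-HOLD-BACK) are assumptions of this sketch, not something the proposed text prescribes.

```python
from typing import Optional


def server_control_tag(low_latency_target_s: Optional[float], part_target_s: float) -> Optional[str]:
    """Build an EXT-X-SERVER-CONTROL line, or return None when no target is known."""
    if low_latency_target_s is None:
        return None  # attribute omitted, as in the proposed text
    # HLS requires PART-HOLD-BACK >= 2 x Part Target Duration (3 x is recommended).
    minimum = 2 * part_target_s
    hold_back = max(low_latency_target_s, minimum)
    return f"#EXT-X-SERVER-CONTROL:PART-HOLD-BACK={hold_back:.3f}"


print(server_control_tag(1.5, 0.5))   # #EXT-X-SERVER-CONTROL:PART-HOLD-BACK=1.500
print(server_control_tag(None, 0.5))  # None
```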

HLS to DASH text:

If known, the desired low latency target is used to:
o Generate a ServiceDescription element with a Latency element where Latency@target is set to the desired target converted to milliseconds.
o Generate a ProducerReferenceTime element with @type, @wallClockTime, and @presentationTime attributes set to capture the known relation of stream time to wall clock time.
Note: The value of the PART-HOLD-BACK attribute of the EXT-X-SERVER-CONTROL tag can be considered a lower bound for the Latency@target attribute, but it may be smaller than the intended latency. This difference is due to the PART-HOLD-BACK attribute being relative to the end of the Media Playlist and the Latency@target attribute being relative to the wall clock time associated with the presentation.
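
A minimal sketch of what generating those two elements might look like, using Python's standard ElementTree. The element and attribute names come from the text above. The function name, the `id` values and the example timestamps are placeholders, and where the elements are placed in the MPD is left out.

```python
import xml.etree.ElementTree as ET


def build_latency_signalling(target_s: float, wall_clock_iso: str,
                             presentation_time: int, prt_type: str = "application"):
    """Return (ServiceDescription, ProducerReferenceTime) elements for a known target."""
    service_description = ET.Element("ServiceDescription", {"id": "0"})  # placeholder id
    # Latency@target is expressed in milliseconds.
    ET.SubElement(service_description, "Latency", {"target": str(round(target_s * 1000))})

    producer_reference_time = ET.Element("ProducerReferenceTime", {
        "id": "0",  # placeholder id
        "type": prt_type,
        "wallClockTime": wall_clock_iso,
        "presentationTime": str(presentation_time),  # in the Representation's timescale
    })
    return service_description, producer_reference_time


# Hypothetical values: 3 s target, media time 0 mapped to a known wall clock instant.
sd, prt = build_latency_signalling(3.0, "2024-01-01T00:00:00Z", 0)
print(ET.tostring(sd, encoding="unicode"))
print(ET.tostring(prt, encoding="unicode"))
```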

Further commentary on an email sidebar:

"...trying to bring a full correction and conversion semantics is too much to take on right now. [This] is also the same problem we ran into with the actual low latency description and pivoted to highlight that the conversion cannot happen. It makes sense to do the same here..."

"...[but] we need to propose some actual words...something to the effect of
- DASH ServiceDescription latency target and HLS PART-HOLD-BACK both inform the client about the intended presentation latency but they are not equivalent concepts because they have different reference points and semantics
- Conversion between the two is not straightforward and requires additional knowledge that the MPD and/or Playlist does not provide
- It is recommended that when converting from DASH to HLS or vice versa, that these parameters are independently set based on the desired presentation delay for the stream and the expected capabilities of the target client devices

That's not particularly satisfactory as it's kind of dodging the issue but I think it might be as much as we could say in this version."

"One question - could we provide slightly more information by speaking about bounds? Could we say that if converting from HLS to DASH, that the DASH latency target is bounded on the lower side by the HLS PART-HOLD-BACK? If the HLS clients are holding back 3 parts each of 500ms, then the DASH latency has to be at least 1500ms or higher?

Converting from DASH to HLS could we say that the PART-HOLD-BACK is bounded on the upper side by the signaled DASH latency?"

[The following was also sent to WAVE participant members, included here for continuity]

Changes implemented in sections 5.2.2.2 and 5.2.2.3
• Remove explicit PART-HOLD-BACK <=> Latency@target conversion
• Add text on inclusion of manifest signals if low latency target is known to the conversion
• Add note on the upper/lower bounding effects the attributes have on each other and their time reference difference that makes them not exactly equal
• For 5.2.2.3 only: Add a note on generating a Producer Reference Time element since that was omitted previously but is critical to the conversion

I have two questions about the ntp_timestamp semantics.

8.16.5.3 Semantics [...] ntp_timestamp indicates a UTC time in NTP format associated to media_time as follows:

if flags is set to 0, the UTC time is the time at which the frame belonging to the reference track in the following movie fragment and whose presentation time is media_time was input to the encoder.

if flags is set to 1, the UTC time is the time at which the frame belonging to the reference track in the following movie fragment and whose presentation time is media_time was output from the encoder.

if flags is set to 2, the UTC time is the time at which the following MovieFragmentBox was finalized. media_time is set to the presentation of the earliest frame of the reference track in presentation order of the movie fragment.

if flags is set to 4, the UTC time is the time at which the following MovieFragmentBox was written to file. media_time is set to the presentation of the earliest frame of the reference track in presentation order of the movie fragment.

if flags is set to 8, the association between the media_time and UTC time is arbitrary but consistent between multiple occurrences of this box in the same track

if flags is set to 24 (i.e. the two bits corresponding to value 8 and 16 are set), the UTC time has a consistent, small (ideally zero), offset from the real-time of the experience depicted in the media at media_time

(From ISO/IEC 14496-12:2020)
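
For reference, the flag values quoted above can be tabulated to check which combinations that text covers. This is only a lookup over the quoted 2020 wording (descriptions abbreviated); the dictionary and function names are illustrative.

```python
# Meanings of the prft 'flags' values, abbreviated from the semantics quoted above.
PRFT_FLAG_MEANINGS = {
    0: "frame at media_time was input to the encoder",
    1: "frame at media_time was output from the encoder",
    2: "following MovieFragmentBox was finalized",
    4: "following MovieFragmentBox was written to file",
    8: "arbitrary but consistent association with media_time",
    24: "consistent, small offset from real-time (bits 8 and 16 set)",
}


def describe_prft_flags(flags: int) -> str:
    """Return the quoted meaning for a flags value, or note that it is not listed."""
    return PRFT_FLAG_MEANINGS.get(flags, "not listed in the quoted semantics")


print(describe_prft_flags(24))
print(describe_prft_flags(16))  # bit 16 set, bit 8 unset
```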

The 'application' sense (i.e. 16 set, 8 unset) is not mentioned. Am I looking at an old version of the spec, or is this an obvious corollary of the last two bullets? (If so, please help me understand :thinking:)
What would be the correct flag to signal the (ingest) time of the origin? I'm assuming that the encoder (flags=0) would typically indicate either the time of receiving the first frame or the time of sending the chunk to an origin (presentation time of the last frame). The origin would receive the chunk and immediately persist it to storage. Inserting a prft with flags=4 seems to make sense. Or should this be flags=16?

Assuming synchronized UTC wall clocks, the delta between the encoder and origin prft would reflect network latency (plus at most one chunk duration). Note that the chunk received and persisted by the origin will of course not be available to the player immediately, because the origin may need to wait until the content is available for all tracks before advertising it in the client manifest. Would this additional latency also be included by either PART-HOLD-BACK or Latency@target?
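
To illustrate the delta I have in mind, here is a small sketch that compares two prft ntp_timestamp values. It assumes the usual 64-bit NTP layout (upper 32 bits are seconds since 1900-01-01, lower 32 bits are the fraction) and that both timestamps refer to the same media_time. The function names and sample values are made up.

```python
NTP_UNIX_EPOCH_OFFSET = 2_208_988_800  # seconds between 1900-01-01 and 1970-01-01


def ntp_to_unix_seconds(ntp_timestamp: int) -> float:
    """Convert a 64-bit NTP timestamp to seconds since the Unix epoch."""
    seconds = ntp_timestamp >> 32
    fraction = ntp_timestamp & 0xFFFFFFFF
    return seconds - NTP_UNIX_EPOCH_OFFSET + fraction / 2**32


def prft_delta_seconds(encoder_ntp: int, origin_ntp: int) -> float:
    """Delta between two prft timestamps; integer subtraction avoids float rounding."""
    return (origin_ntp - encoder_ntp) / 2**32


# Hypothetical pair: the origin stamps the chunk half a second after the encoder.
encoder_ntp = (NTP_UNIX_EPOCH_OFFSET + 1_700_000_000) << 32
origin_ntp = encoder_ntp + (1 << 31)
print(ntp_to_unix_seconds(encoder_ntp))             # 1700000000.0
print(prft_delta_seconds(encoder_ntp, origin_ntp))  # 0.5
```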

cta-wave / dash-hls

Low latency: DASH target latency and HLS PART-HOLD-BACK are not equivalent #30