Open poolec opened 3 years ago
@poolec - the Producer Reference Time in DASH is not arbitrary. It has a @type attribute which defines where in the workflow timelines it is intended to reference:
That being said, I actually agree with you that there is not an easy correspondence between PART-HOLD-BACK, which excludes any encode or packaging time, and latency@target, which includes it.
We could however make a general suggestion which might prove more useful than omitting any guidance:
Hi Will,
Yes, Producer Reference Time does have defined semantics and always conveys a wall-clock time, but only two of the three types have a precise meaning: the application one just "provides a reference of the media time related to wall-clock time based on an application defined relation". Also, whilst there might be upper bounds on encoding delay, the bounds on delay from capture might be much larger due to contribution links.
I think in converting from LL-DASH to LL-HLS, it would be easier to determine an appropriate PART-HOLD-BACK value independently rather than by trying to derive a value from the DASH ServiceDescription and ProducerReferenceTime signalling.
In the other direction, it may be slightly easier to automate because you can generate both the ProducerReferenceTime and the ServiceDescription latency target. You could perhaps set an 'application' ProducerReferenceTime that marks the time a chunk appears in the HLS manifest and then set a latency target relative to that. We may not want to recommend that approach, however.
I think your general point is a good one that a good conversion from one to the other is likely to require input of some additional information.
Further discussion occurred in an email thread. The net result is that we are removing the direct conversion statements for PART-HOLD-BACK and Latency@target. Instead, we suggest that they be set based on a low latency target if known and otherwise omitted, and we have provided a detailed note on how the two values bound each other.
DASH to HLS text:
The PART-HOLD-BACK attribute is set based on the service desired low latency target, if known, otherwise the attribute is omitted.
Note: The value of the Latency@target attribute of the ServiceDescription element can be considered an upper bound for the PART-HOLD-BACK attribute, but it may be larger than the intended latency. This difference is due to the PART-HOLD-BACK attribute being relative to the end of the Media Playlist and the Latency@target attribute being relative to the wall clock time associated with the presentation.
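As a rough illustration of the "set if known, otherwise omit" rule in the DASH to HLS text, the sketch below formats an EXT-X-SERVER-CONTROL tag. The function names and the three-decimal formatting are assumptions made for this example and are not taken from the spec text.

```python
from typing import Iterable, Optional


def part_hold_back_attribute(desired_latency_s: Optional[float]) -> Optional[str]:
    """Return the PART-HOLD-BACK attribute, or None when no target is known."""
    if desired_latency_s is None:
        return None  # no known low latency target: omit the attribute
    return f"PART-HOLD-BACK={desired_latency_s:.3f}"


def server_control_tag(desired_latency_s: Optional[float],
                       other_attributes: Iterable[str] = ()) -> Optional[str]:
    """Build the tag from any other attributes plus PART-HOLD-BACK if known."""
    attrs = list(other_attributes)
    part_hold_back = part_hold_back_attribute(desired_latency_s)
    if part_hold_back is not None:
        attrs.append(part_hold_back)
    if not attrs:
        return None  # nothing to signal, so no tag at all
    return "#EXT-X-SERVER-CONTROL:" + ",".join(attrs)


print(server_control_tag(1.5, ["CAN-BLOCK-RELOAD=YES"]))
print(server_control_tag(None, ["CAN-BLOCK-RELOAD=YES"]))
```

The desired latency is passed in from outside rather than read from the MPD, which matches the point made in the thread that the value has to be chosen independently.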
HLS to DASH text:
- If known, the desired low latency target is used to:
  o Generate a ServiceDescription element with a Latency element where Latency@target is set to the desired target converted to milliseconds.
  o Generate a ProducerReferenceTime element with @type, @wallClockTime, and @presentationTime attributes set to capture the known relation of stream time to wall clock time.
Note: The value of the PART-HOLD-BACK attribute of the EXT-X-SERVER-CONTROL tag can be considered a lower bound for the Latency@target attribute, but it may be smaller than the intended latency. This difference is due to the PART-HOLD-BACK attribute being relative to the end of the Media Playlist and the Latency@target attribute being relative to the wall clock time associated with the presentation.
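The two generation steps in the HLS to DASH text could be sketched roughly as follows. This is only an illustration: the function name, the id values, the default type of "encoder" and the ISO 8601 wall clock format are assumptions for the example, the presentation time is assumed to already be in the Representation's timescale, and where the elements sit in the MPD is left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_latency_signalling(target_s, wall_clock, presentation_time,
                             prt_type="encoder"):
    """Return (ServiceDescription, ProducerReferenceTime) elements."""
    # Latency@target carries the desired target converted to milliseconds.
    service_description = ET.Element("ServiceDescription", {"id": "0"})
    ET.SubElement(service_description, "Latency",
                  {"target": str(round(target_s * 1000))})

    # ProducerReferenceTime ties a media presentation time to a wall clock time.
    producer_reference_time = ET.Element("ProducerReferenceTime", {
        "id": "0",
        "type": prt_type,
        "wallClockTime": wall_clock.astimezone(timezone.utc)
                                   .strftime("%Y-%m-%dT%H:%M:%SZ"),
        "presentationTime": str(presentation_time),
    })
    return service_description, producer_reference_time


sd, prt = build_latency_signalling(
    1.5, datetime(2024, 1, 1, tzinfo=timezone.utc), 0)
print(ET.tostring(sd, encoding="unicode"))
print(ET.tostring(prt, encoding="unicode"))
```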
Further commentary on an email sidebar:
" ...trying to bring a full correction and conversion semantics is too much to take on right now. [This] is also the same
"problem we ran into with the actual low latency description and pivoted to highlight that the conversion cannot
"happen. It makes sense to do the same here... "
............................................................................................................
"...[but] we need to propose some actual words...something to the effect of
"- DASH ServiceDescription latency target and HLS PART-HOLD-BACK both inform the
" client about the intended presentation latency but they are not equivalent concepts
" because they have different reference points and semantics
"
"- Conversion between the two is not straightforward and requires additional knowledge
"that the MPD and/or Playlist does not provide
"- It is recommended that when converting from DASH to HLS or vice versa, that these
"parameters are independently set based on the desired presentation delay for the stream
"and the expected capabilities of the target client devices"
"
"That's not particularly satisfactory as it's kind of dodging the issue but I think it might be as
"much as we could say in this version."
............................................................................................................
"One question - could we provide slightly more information by speaking about bounds? Could we say that
"if converting from HLS to DASH, that the DASH latency target is bounded on the lower side by
"the HLS PART-HOLD-BACK? If the HLS clients are holding back 3 parts each of 500ms, then the DASH
"latency has to be at least 1500ms or higher?"
"
"Converting from DASH to HLS could we say that the PART-HOLD-BACK is bounded on the upper side by
"the signaled DASH latency?"
............................................................................................................
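The bounds raised in the sidebar above can be written out as simple arithmetic. The sketch below uses hypothetical helper names and assumes all values are in milliseconds. It only checks that a pair of values is consistent with the bounds; it does not derive one value from the other.

```python
def latency_target_lower_bound_ms(parts_held_back: int, part_duration_ms: int) -> int:
    """Lower bound on the DASH latency target implied by the HLS hold-back."""
    return parts_held_back * part_duration_ms


def bounds_consistent(part_hold_back_ms: int, latency_target_ms: int) -> bool:
    """True when PART-HOLD-BACK does not exceed the DASH latency target."""
    return part_hold_back_ms <= latency_target_ms


# The example from the sidebar: 3 parts of 500 ms each.
lower_bound = latency_target_lower_bound_ms(3, 500)
print(lower_bound)                           # 1500
print(bounds_consistent(lower_bound, 3000))  # True
print(bounds_consistent(lower_bound, 1000))  # False
```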
[The following was also sent to WAVE participant members, included here for continuity]
Changes implemented in sections 5.2.2.2 and 5.2.2.3
• Remove explicit PART-HOLD-BACK <=> Latency@target conversion
• Add text on inclusion of manifest signals if low latency target is known to the conversion
• Add note on the upper/lower bounding effects the attributes have on each other and their time reference difference that makes them not exactly equal
• For 5.2.2.3 only: Add a note on generating a Producer Reference Time element since that was omitted previously but is critical to the conversion
I have two questions about the ntp_timestamp semantics.
8.16.5.3 Semantics [...]
ntp_timestamp indicates a UTC time in NTP format associated to media_time as follows:
- if flags is set to 0, the UTC time is the time at which the frame belonging to the reference track in the following movie fragment and whose presentation time is media_time was input to the encoder.
- if flags is set to 1, the UTC time is the time at which the frame belonging to the reference track in the following movie fragment and whose presentation time is media_time was output from the encoder.
- if flags is set to 2, the UTC time is the time at which the following MovieFragmentBox was finalized. media_time is set to the presentation of the earliest frame of the reference track in presentation order of the movie fragment.
- if flags is set to 4, the UTC time is the time at which the following MovieFragmentBox was written to file. media_time is set to the presentation of the earliest frame of the reference track in presentation order of the movie fragment.
- if flags is set to 8, the association between the media_time and UTC time is arbitrary but consistent between multiple occurrences of this box in the same track
- if flags is set to 24 (i.e. the two bits corresponding to value 8 and 16 are set), the UTC time has a consistent, small (ideally zero), offset from the real-time of the experience depicted in the media at media_time
(From ISO/IEC 14496-12:2020)
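For readers who want to experiment with these values, here is a minimal sketch that maps the flags values listed in the quoted semantics to short labels and converts a 64-bit NTP timestamp to a UTC datetime. The labels and function names are made up for this example. The conversion assumes the usual 32.32 fixed-point NTP layout counted from 1900-01-01 and ignores era rollover.

```python
from datetime import datetime, timedelta, timezone

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2_208_988_800

# Short labels for the flags values in the quoted semantics (labels are ours).
PRFT_FLAG_MEANINGS = {
    0: "frame input to encoder",
    1: "frame output from encoder",
    2: "MovieFragmentBox finalized",
    4: "MovieFragmentBox written to file",
    8: "arbitrary but consistent association",
    24: "small consistent offset from real time",
}


def describe_prft_flags(flags: int) -> str:
    """Label a flags value, or say that the quoted text does not list it."""
    return PRFT_FLAG_MEANINGS.get(flags, "not listed in the quoted semantics")


def ntp_to_datetime(ntp_timestamp: int) -> datetime:
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to UTC."""
    seconds = ntp_timestamp >> 32
    fraction = ntp_timestamp & 0xFFFFFFFF
    base = datetime.fromtimestamp(seconds - NTP_UNIX_OFFSET, tz=timezone.utc)
    return base + timedelta(seconds=fraction / 2**32)


print(describe_prft_flags(4))
print(describe_prft_flags(16))
print(ntp_to_datetime(((NTP_UNIX_OFFSET + 1) << 32) | (1 << 31)))
```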
flags=0) would typically either indicate the presentation time of receiving the first frame or sending the chunk to an origin (presentation time of last frame). The origin would receive the chunk and immediately persist it to storage. Inserting a prft with flags=4 seems to make sense. Or should this be flags=16? Assuming synchronized UTC wall clocks, the delta between the encoder and origin prft would reflect network latency (plus at most one chunk duration).
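The delta described here could be computed along the following lines. This is a sketch under stated assumptions: both prft boxes carry 64-bit NTP timestamps in 32.32 fixed-point form, the two clocks are synchronized, and the function name is hypothetical. It does not account for the two boxes referencing different media_time values, which is where the "at most one chunk duration" term comes from.

```python
def prft_delta_seconds(encoder_ntp: int, origin_ntp: int) -> float:
    """Seconds between two 64-bit NTP timestamps (origin minus encoder)."""
    # Both values are 32.32 fixed point, so their difference scales by 2**32.
    return (origin_ntp - encoder_ntp) / 2**32


# Example: the origin timestamp is a quarter of a second after the encoder's.
encoder_ntp = 100 << 32
origin_ntp = (100 << 32) + (1 << 30)
print(prft_delta_seconds(encoder_ntp, origin_ntp))  # 0.25
```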
Note that of course the chunk received and persisted by the origin will not be available immediately to the player because the origin may need to wait till the content is available for all tracks before advertising it in the client manifest. Would this additional latency also be included by either PART-HOLD-BACK or Latency@target?
In section 5.2.2.2, the current text states that HLS PART-HOLD-BACK can be set to the value of the (DASH) Latency@target attribute of the ServiceDescription as seconds if present and vice versa. However, these are not equivalent.
In DASH, the Latency@target is specified as a delay relative to the Producer Reference Time. The Producer Reference Time is an arbitrary reference wall clock time chosen by the content provider and will not be the same as the availability time of CMAF chunks on the server. Also, in DASH, the latency target is just a target and the client is not expected to use it blindly. There are min and max values too which guide client behaviour.
By contrast, HLS PART-HOLD-BACK is essentially a delay relative to the appearance of chunks in the media playlist and doesn't have corresponding min and max values.
Whilst it would be possible to write an equation to derive a "chunk availability to presentation target" latency figure from a DASH MPD, it's (a) quite complex, involving use of ProducerReferenceTime@wallClockTime, ProducerReferenceTime@presentationTime, presentationTimeOffset, timescale, PeriodStart, availabilityStartTime and availabilityTimeOffset, and (b) that still doesn't give you a value you can definitely use as an HLS PART-HOLD-BACK parameter because the conversion would also be affected by the time taken to update HLS manifests with new chunks, and might also need to be informed by knowledge of the capabilities and behaviour of HLS clients, which is almost certainly different to the behaviour of DASH clients.
The best resolution to this issue is probably to replace the current two bullet points that say that these DASH and HLS concepts are interchangeable (one DASH->HLS one and one HLS->DASH one) with text that says that they're not and highlight the differences. Then it unfortunately falls to users of the spec to determine appropriate DASH latency targets and HLS PART-HOLD-BACK values.
To provide complete guidance on how to set these values would likely be too much work to add to this version of the document.