Input from Kevin Streeter:
My understanding is that minBufferTime has nothing to do with network behavior, or the amount of playback buffer that a client should allocate to get smooth playback under a given network condition. Instead, the value describes how much buffer a client should have under ideal network conditions. As such, minBufferTime is not describing the burstiness or jitter in the network; it is describing the burstiness or jitter in the content encoding. Like the @bandwidth value, it is a property of the content. Using the "leaky bucket" model, it is the size of the bucket that makes @bandwidth true, given the way the content is encoded.
Is this interpretation correct? If so, part of the problem may be that the definition of @bandwidth in the DASH spec is not really clear in this area. I think many people (myself included) have assumed that @bandwidth represented an instantaneous peak value. Interpreting @bandwidth in this way is simple, but also inefficient for highly VBR encoded content. The leaky-bucket description is potentially more efficient, and it makes sense that it would be used, but the spec never comes straight out and describes that.
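To make the leaky-bucket reading concrete, here is a rough sketch (my own illustration, not text from the spec) that computes, at segment granularity, the smallest bucket that would make a declared @bandwidth "true" for a given encoding. The segment sizes and times are made-up example values, and a real conformance check would operate at access-unit level.

```python
def min_buffer_time(segment_bits, segment_ept, bandwidth):
    """Smallest MBT (seconds) such that, if delivery starts at any segment i
    at a constant rate of `bandwidth` bits/s, every segment k >= i has fully
    arrived no later than EPT(k) + MBT.  Conservative: it assumes a segment
    must be complete before its earliest presentation time is decoded."""
    mbt = 0.0
    n = len(segment_bits)
    for i in range(n):                      # delivery may start at any segment
        arrival = 0.0                       # seconds since delivery start
        for k in range(i, n):
            arrival += segment_bits[k] / bandwidth
            # media time elapsed since the first delivered segment
            media_elapsed = segment_ept[k] - segment_ept[i]
            mbt = max(mbt, arrival - media_elapsed)
    return mbt

# Example: 2-second segments, highly VBR (sizes in bits), declared @bandwidth of 2 Mbps
sizes = [6_000_000, 2_000_000, 1_000_000, 7_000_000]
epts = [0.0, 2.0, 4.0, 6.0]
print(min_buffer_time(sizes, epts, 2_000_000))  # bucket size, expressed in seconds
```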
Input from Kilroy:
[KS] it is describing the burstiness or jitter in the content encoding. Like the @bandwidth value, it is a property of the content. Using the "leaky bucket" model, it is the size of the bucket that makes @bandwidth true, given the way the content is encoded.

I think that is how it is defined now, using the model of a perfect network with no latency or jitter but with a transfer rate exactly equal to @bandwidth. Maybe we should just leave the network out of it and say that VBR encoded content read in realtime will never underflow or overflow a buffer of size @minBuffer. That makes it the buffer size used to measure the max bitrate of the content sitting in a production studio with no Internet in sight. Bandwidth is a network concept and is confusing everyone. I would call it the @VBRbuffer used to measure the maximum bitrate of the Representation. An MPD hasn't a clue what the network bandwidth will be, so that's a red herring.

I think that definition is equivalent to saying that constant bitrate delivery at @bandwidth and immediate realtime decoding (at a constant time rate) will never underflow or overflow a buffer of size @minBuffer. Obviously, if the decoder waits to start decoding after the first bits arrive, overflow could occur in the network "push" or rate-controlled "pull" model. For CBR, the buffer could be the AVC VBV buffer size of the encoder if delivery was steady and you never switched bitrates. In the constant bitrate model, you'd only have to buffer the bitrate variations within a CVS, and something like audio might only have to buffer a sync frame, if that. It isn't very useful information for adaptive streaming when reduced to that. That's a nice measurable number and all, but not particularly useful.

The maximum Segment duration * @bandwidth is going to be a more useful starting point for a practical buffer size. You need to be able to take delivery of the largest possible Segment while making a bitrate switch, assuming you can't switch mid-segment using 'sidx' subsegment tricks where you cancel delivery of a partial Segment to request a new one that will splice to the Subsegment in the buffer.

The real challenge that will determine how far a player has to lag the live edge is adaptive switching, and the time delay from SegmentRequestTime to the start of SegmentArrivalTime, which may be hard to predict. A player making a switch has to buffer enough to feed the decoder while waiting for the next Segment request to be encoded, processed, and at least partially delivered. Max Segment duration says how long the encoder might take before the next Segment becomes available, so you need to delay requests by that amount to avoid 404s. But delivery latency could be several seconds if there are packaging, CDN, etc. delays, so the buffer has to store at least that much time. A CDN-cached Segment might arrive in 100 ms (low bitrate Segment on a fast network, low ping time route), but the next Segment/bitrate could take several seconds if not packaged and cached already. The player has to buffer that jitter to keep the decoder happy.

LatencyFromLiveVideo = decoder buffer delay + max sample duration + packaging time + encode on demand lag (if any) + origin server processing + CDN delay + network latency + delivery time for the max sample duration at the network delivery rate + request lag + VBR peak buffer + MPD sampling error (SegmentTimeline) or clock error.
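Summing the terms in that budget is trivial, but a sketch makes the point that the VBR peak buffer is only one of many contributions to how far a player must lag the live edge. All component names and numbers below are illustrative assumptions, not normative values.

```python
# Illustrative latency budget; every entry here is a made-up example value.
latency_components_s = {
    "decoder_buffer_delay":      0.5,
    "max_sample_duration":       2.0,   # e.g. one max-duration Segment
    "packaging_time":            1.0,
    "encode_on_demand_lag":      0.0,   # if any
    "origin_processing":         0.2,
    "cdn_delay":                 0.5,
    "network_latency":           0.1,
    "delivery_time_max_segment": 1.5,   # max Segment size / delivery rate
    "request_lag":               0.1,
    "vbr_peak_buffer":           1.0,   # @minBufferTime as discussed above
    "mpd_or_clock_error":        0.5,   # SegmentTimeline sampling / clock skew
}

latency_from_live = sum(latency_components_s.values())
print(f"LatencyFromLiveVideo ~ {latency_from_live:.1f} s")
```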
I think the VBR peak buffer (@minBuffer as defined) drops out of the equation as soon as the player buffer gets big enough to hold the largest possible Segment plus the various switch-time-to-arrival-time delays, and averages out the spikes over the nominal @bandwidth that would drain the player buffer faster on decode. Latency and jitter are at least as important for determining buffer requirements as network bandwidth. When requests need to be scheduled depends on latency. If you are running ten seconds behind the live edge, you can pipeline requests to compensate for latency (10 - maxSegmentDuration, which might be four 2-second Segments), or build the buffer on startup; but it is problematic to have Segments in flight that might be the wrong bitrate by the time they arrive.

Practical buffers should have some "error margin" to allow time lag or damping/hysteresis in the adaptive algorithm to prevent constant bitrate switching. That response-time safety buffer depends on the adaptive algorithm. Wherever possible, an adaptive algorithm should differentiate between the latency of Segment delivery and the rate of Segment delivery (i.e. bandwidth). If a player only looks at round-trip time, it won't know the difference between a 100 Mbps last mile with 10 seconds of latency and a 1 Mbps last mile that takes the same RTT to deliver the same Segment. In the first case, a player measuring Segment delivery rate could calculate the latency and buffer more, then request e.g. 20 Mbps Segments and schedule requests expecting latency.
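As a hedged sketch of that last point, measuring time-to-first-byte and transfer time separately lets a player tell those two links apart where a single round-trip figure cannot. The timing fields below are assumed to be collected by the player; nothing here is defined by DASH itself.

```python
from dataclasses import dataclass

@dataclass
class SegmentDownload:
    request_time: float     # wall clock (s) when the request was issued
    first_byte_time: float  # wall clock (s) when the first byte arrived
    last_byte_time: float   # wall clock (s) when the last byte arrived
    bytes: int

def estimate_link(downloads):
    """Return (latency_s, throughput_bps) averaged over recent downloads."""
    latency = sum(d.first_byte_time - d.request_time for d in downloads) / len(downloads)
    transfer_time = sum(d.last_byte_time - d.first_byte_time for d in downloads)
    transferred_bits = sum(d.bytes * 8 for d in downloads)
    throughput = transferred_bits / transfer_time if transfer_time > 0 else float("inf")
    return latency, throughput

# The two links contrasted above look identical on RTT alone, but not here:
fast_far = SegmentDownload(0.0, 10.0, 10.2, 2_500_000)   # ~100 Mbps, 10 s latency
slow_near = SegmentDownload(0.0, 0.05, 20.0, 2_500_000)  # ~1 Mbps, low latency
for d in (fast_far, slow_near):
    lat, bw = estimate_link([d])
    print(f"latency={lat:.2f}s  throughput={bw / 1e6:.1f} Mbps")
```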
Scheduling media playout at start-up when accessing the live service: https://github.com/Dash-Industry-Forum/Live/issues/4
The text in section 3.4.4 needs to be updated and more explanation needs to be provided on how playout happens in the client.
A DASH client should start playout from:
• the time indicated by the MPD Anchor, if one is present;
• the live edge, if there is no MPD Anchor and MPD@type="dynamic".

In a straightforward implementation, a client would download the latest available segment and would render the earliest presentation time EPT(k) of the segment at PSwc[i] + (EPT(k) - o[r,i]) + PD. Typically, this results in a start-up delay between the download of the segment and its playout of between PD - SDURATION and PD. As PD may be quite large, for example in order to provision for downloading under varying bitrate conditions, it may be more suitable to download a segment that is not the latest one, but one that can be downloaded in time such that playout of the first sample can happen at time PSwc[i] + (EPT(k) - o[r,i]) + PD. If the download completes only after this time, the actual rendering may start not with the sample with the earliest presentation time, but with the sample whose presentation time PT most closely satisfies PSwc[i] + (PT - o[r,i]) + PD equal to NOW. The client may also decide to loosen the tight synchronization to the suggested presentation delay.
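For illustration, a minimal sketch of that scheduling rule, using the symbols from the text (PSwc[i], EPT(k), o[r,i], PD); the download-time estimate and the helper names are assumptions, not part of the IOP text.

```python
def scheduled_playout_time(ps_wc, ept_k, offset_ri, pd):
    """Wall-clock time at which EPT(k) should be rendered:
       PSwc[i] + (EPT(k) - o[r,i]) + PD."""
    return ps_wc + (ept_k - offset_ri) + pd

def pick_start_segment(now, ps_wc, offset_ri, pd, segments, est_download_s):
    """Walk back from the latest available segment and return the newest one
    whose scheduled playout time is still at least est_download_s in the
    future, so the first sample can be rendered on schedule.
    `segments` is a list of (segment_number, EPT) pairs, newest last."""
    for number, ept in reversed(segments):
        if scheduled_playout_time(ps_wc, ept, offset_ri, pd) - now >= est_download_s:
            return number
    return segments[0][0]  # fall back to the oldest listed segment
```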
Note that the value of the minimum buffer time does not provide any instruction to the client on how long to buffer the media. Rather, the minimum buffer time describes, for each Stream Access Point (and therefore, in the case of DASH-IF, for each start of a Media Segment), the following property of the stream: if the Representation (starting at any segment) is delivered over a constant bitrate channel with a bitrate equal to the value of the @bandwidth attribute, then each presentation time PT is available at the client at the latest at time PT + MBT.
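A small sketch of that property, under the simplifying (and hypothetical) assumption that the byte position needed to decode a given presentation time is known:

```python
def satisfies_mbt(presentation_times, bytes_needed, bandwidth, ept_start, mbt):
    """True if, for every PT, the data needed to present PT has arrived by
    PT + MBT when delivery starts at the EPT of the first delivered segment
    and proceeds at a constant `bandwidth` bits/s.
    `bytes_needed(pt)` is a hypothetical callback giving the byte position,
    from the start of the delivered Representation, needed to decode pt."""
    for pt in presentation_times:
        arrival = ept_start + bytes_needed(pt) * 8 / bandwidth
        if arrival > pt + mbt:
            return False
    return True
```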