Open bbert opened 1 year ago
2023/10/06 TF meeting
- Before clarifying whether it must be SHALL or SHOULD, I’d like to consider the use case in which the DCSM could be refreshed before the TTL interval expires. The recommended value for TTL is 300 seconds, and in some cases it would be valuable to force the players to refresh the manifest without waiting for the TTL interval. As an example, a new CDN server could be allocated that we would like to prioritize as soon as possible. Instead of globally reducing the TTL to a very low value and overloading the steering server, one could decide to force the players to update the DCSM when necessary.
I am not really convinced we need such a method to break the TTL (besides its complexity, see below). With a 300-second TTL, the players' reloads are evenly distributed within this 5-minute window, so they come to the steering server one by one. In this particular use case, that guarantees a graceful redirection of players to the new CDN, whereas forcing an update could generate a request storm on the new CDN.
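To put rough numbers on this argument, here is a minimal sketch. The audience size and the 5-second burst window are purely assumed figures, and the function name is hypothetical; the only value taken from the discussion is the 300-second TTL. It assumes reloads are spread uniformly over the window.

```python
def requests_per_second(players: int, window_seconds: float) -> float:
    """Average request rate when all players reload once within the window."""
    return players / window_seconds


players = 600_000  # assumed audience size, not taken from the discussion
ttl = 300          # recommended TTL, in seconds

# Normal operation: reloads are spread over the whole TTL window.
steady_rate = requests_per_second(players, ttl)
print(steady_rate)  # 2000.0

# Forced refresh: assume every player reacts within about 5 seconds.
burst_rate = requests_per_second(players, 5)
print(burst_rate)  # 120000.0
```

Under these assumptions the forced refresh multiplies the instantaneous load by 60, first on the steering server and then on whichever CDN the players are redirected to.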
This would obviously require an external control means to steer the players, and I don’t know how, or whether, this should be part of the content steering specification.
I would expect the steering server to be a stateless service, which does not store any information about the players. Furthermore, the steering server is not expected to know which players are still watching the session.
As a control mechanism, one potential solution is for example to standardize a CMSD response header key to force the players to reload the DCSM.
A CMSD message is issued by the CDN edge server. It cannot be the trigger to reload the DCSM since the decision to force the reload would come from the steering server... unless the steering server could ask the CDNs to send a CMSD message on its behalf.
> As an example, a new CDN server could be allocated that we would like to prioritize as soon as possible. Instead of globally reducing the TTL to a very low value and overloading the steering server, one could decide to force the players to update the DCSM when necessary.
I remember this video from Apple WWDC22. They explain Pathway Cloning (starting from 8:16 with the background story), which is used to introduce a new CDN to the system. The idea still relies on the DCSM update at each TTL. They add a `PATHWAY-CLONES` field to the DCSM.
I am trying to picture the edge cases in which we want the new CDN to join the system before the TTL expires (maybe preferably without any delay). But for such cases, the player has the second (and so on) pathway on the `PATHWAY-PRIORITY` list as a backup.
Thanks @gwendalsimon and @burak-kara for your comments.
@burak-kara yes, I know about pathway cloning, but the use case was precisely to update the DCSM in order to get new pathways before the TTL delay expires.
@gwendalsimon I agree with you that the steering server should preferably be stateless, and that it is difficult to ask the CDNs to send CMSD messages.
Let's tackle this issue in another way. In fact, the use case would be to enable a player to learn about new pathways when it encounters issues with the currently available pathways.
A potential solution is to complete the client behaviour specification by adding the possibility for the client to refresh the DCSM when it encounters playback problems and has already switched through all of the available pathways.
By the way, I think we should explain more precisely in the client steering behaviour what is meant by "If the client encounters playback problems". When should a client make a BaseURL or Location switch?
`duress` field from CDN)
Or is it completely open to player implementation? @dsilhavy do you have any opinion on that?
I encourage everyone to review the latest specification here: https://members.dashif.org/wg/Interoperability/document/4810
We should check what the IOP says. Do we reload the MPD in case of repeated segment 404s? IOP and MPEG-DASH recommend reloading the MPD. That may resolve the issue for Bertrand.
@dsilhavy please let us know how you have implemented this. Then we fix the spec, then we check whether Bertrand's issue still exists, and then we fix the spec even more.
> We should check what the IOP says. Do we reload the MPD in case of repeated segment 404s? IOP and MPEG-DASH recommend reloading the MPD. That may resolve the issue for Bertrand.

And if the MPD uses the same base URL as the segments, the player would not be able to refresh the MPD. Please consider the use case where the player streams the content (MPD + segments) from a CDN and needs to be redirected to a newly created CDN/pathway to avoid playback failure.
This is what dash.js does today:
- `retryAttempts` and `retryInterval` can be configured via the player settings. By default we try three times with a waiting time between 500 and 1000 ms (depending on the type of object that is requested).
- Once `retryAttempts` reaches 0 we blacklist the `BaseURL` and move to the next available `BaseURL`.
- If all `BaseURL` elements are blacklisted, playback is terminated.

As of today, we are not refreshing the manifest in case of repeated segment 404s. We are also not refreshing the DCSM.
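The retry and blacklist behaviour described above can be sketched roughly as follows. This is not dash.js code: `download_segment`, `fetch`, and the example CDN URLs are hypothetical, the sketch assumes three attempts per BaseURL, and it leaves out the wait of `retryInterval` between attempts.

```python
from typing import Callable, Optional, Sequence


def download_segment(
    base_urls: Sequence[str],
    path: str,
    fetch: Callable[[str], Optional[bytes]],
    blacklist: set,
    retry_attempts: int = 3,
) -> Optional[bytes]:
    """Try each non-blacklisted BaseURL in order; blacklist it when retries run out."""
    for base_url in base_urls:
        if base_url in blacklist:
            continue
        for _ in range(retry_attempts):
            # A real player would wait retryInterval between attempts.
            data = fetch(base_url + path)
            if data is not None:
                return data
        # Retries exhausted: blacklist this BaseURL and move to the next one.
        blacklist.add(base_url)
    # Every BaseURL is blacklisted: the player would terminate playback here.
    return None


def fake_fetch(url: str) -> Optional[bytes]:
    """Stand-in for an HTTP request: only the second CDN answers."""
    return b"segment" if url.startswith("https://cdn-b.example/") else None


blacklist: set = set()
data = download_segment(
    ["https://cdn-a.example/", "https://cdn-b.example/"],
    "seg1.m4s",
    fake_fetch,
    blacklist,
)
```

Note that nothing in this loop reloads the MPD or the DCSM, which matches the gap discussed in this thread.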
It would also be great if we could collect the relevant parts of the specifications that dash.js shall implement to improve the current behavior.
Live TF 2024/03/01
Accepted that the spec details need to be collected.
IOP WG 2024/10/29
We suggest to update client behaviour
Please comment, we will update the spec.
As @dsilhavy was explaining what dash.js does, let me try to generalise the list.
The player receives a 404 response on a segment download and can perform the following actions:
`BaseURL` and try again (IOP / DASH)

Generally I would be okay with adding a Content Steering update to the list. It's a reasonable thing to do. My question would be whether we want to formalise the client behaviour beyond just allowing this option as well. If we just add this to the content steering spec, it might be unclear to an implementer in which order the client is to go through the list above.
In the IOP Guidelines we basically quote the MPEG spec and say in 4.8.2.1:
> Similarly, if the DASH access client receives an HTTP client error (i.e. messages with 4xx error code) for the request of a Media Segment, the requested Media Segment may not be available anymore or may not be available yet. In both these case the client should check if the precision of the time synchronization to a globally accurate time standard or to the time offered in the MPD is sufficiently accurate. If the clock is believed accurate, or the error re-occurs after any correction, the client should check for an update of the MPD. If multiple BaseURL elements are available, the client may also check for alternative instances of the same content that are hosted on a different server.
This is in itself already ambiguous, since it is not clear whether the client should prioritise multiple `BaseURL` entries over retry behaviour or manifest updates. That said, I would propose the following order:
The implementation may decide to do steps 4. (and 5.) in parallel to ongoing segment download retries and not synchronously.
We should do the Content Steering update before the Manifest Update. @bbert mentioned above already that in case Manifest+Segments are coming from the same CDN and there is an issue, the player will not be able to do the Manifest update.
I added 2. and 6. to the list because this is something that I think is reasonable behaviour and there are popular implementations out there (ExoPlayer is one of them) that implement this as well.
What I am not sure of is whether we should first try alternative BaseURLs or first (synchronously) update Content Steering. In the end I think it is a matter of the time available to the client. If the client has enough buffer, it can easily first get an update from the content steering server. If the client is very close to running out of buffer, it might be better to use an alternative BaseURL. I would also assume here that the list of alternative BaseURLs is already sorted based on the last pathway priority response from the steering server. In this case, going to the next in the list is probably a reasonable and fast choice?
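The buffer-based trade-off described here could look roughly like the sketch below. The function name, the action labels, and the 10-second threshold are all assumptions made for illustration; nothing in the specification defines them.

```python
from typing import List


def next_recovery_action(
    buffer_seconds: float,
    alternative_base_urls: List[str],
    min_buffer_for_steering: float = 10.0,  # assumed threshold, not from any spec
) -> str:
    """Pick a recovery step after repeated segment errors, based on buffer level."""
    if buffer_seconds >= min_buffer_for_steering:
        # Enough buffer to afford a round trip to the steering server.
        return "refresh-steering"
    if alternative_base_urls:
        # Close to stalling: take the next BaseURL, assumed to be already
        # sorted by the last pathway priority response.
        return "switch-base-url"
    # No alternative left, so a steering update is the only option.
    return "refresh-steering"
```

For example, `next_recovery_action(30.0, ["https://cdn-b.example/"])` returns `"refresh-steering"`, while the same call with a 2-second buffer returns `"switch-base-url"`.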
@bbert you also asked here whether we should further clarify when the client should do a BaseURL or Location switch. Personally I think this should only happen in the error case, mostly because that keeps it simple and because many of the other properties may depend on the client and the client's network rather than on something upstream.
I have some considerations on content steering specification about TTL.
In the semantics (table 6.3.1) the spec says: “Specifies how many seconds the client shall wait before reloading the DCSM.” => SHALL
In section 7, bullet 7: “the client should parse it and retrieve the VERSION, TTL….” => SHOULD (it shall be SHALL)
In section 7, bullet 8: “The client sets a timer to re-request the STEERING-SERVER-URL after TTL seconds.” => SHALL or SHOULD is missing
In section 7, bullet 13: “It should still reload the RELOAD-URI after the specified TTL interval in case new service locations are added.” => SHOULD
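To make the TTL handling in these bullets concrete, here is a minimal sketch of a client reading a DCSM and computing when to reload it. The field names come from this discussion; the helper names, the example URL, and the fallback to 300 seconds when TTL is absent are assumptions for illustration, not normative behaviour.

```python
import json

DEFAULT_TTL_SECONDS = 300  # recommended TTL; using it as a fallback is an assumption


def parse_dcsm(body: str) -> dict:
    """Extract the fields the client needs from a DCSM JSON document."""
    dcsm = json.loads(body)
    return {
        "version": dcsm["VERSION"],
        "ttl": int(dcsm.get("TTL", DEFAULT_TTL_SECONDS)),
        "reload_uri": dcsm.get("RELOAD-URI"),
        "pathway_priority": dcsm.get("PATHWAY-PRIORITY", []),
    }


def next_reload_time(received_at: float, ttl: int) -> float:
    """Time (in seconds) at which the client re-requests the steering server."""
    return received_at + ttl


example = (
    '{"VERSION": 1, "TTL": 120, '
    '"RELOAD-URI": "https://steering.example/dcsm", '
    '"PATHWAY-PRIORITY": ["cdn-b", "cdn-a"]}'
)
parsed = parse_dcsm(example)
reload_at = next_reload_time(1000.0, parsed["ttl"])
```

Whether the timer in `next_reload_time` is mandatory is exactly the SHALL versus SHOULD question raised in the bullets above.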