Be able to report interval-based metrics to a 3rd party location via query-arg or header.

cta-wave / common-media-client-data

A repository to collect discussion and feedback on the Common Media Client Data proposal.


Be able to report interval-based metrics to a 3rd party location via query-arg or header. #117

Closed wilaw closed 1 month ago

wilaw commented 11 months ago

Today, to ingest CMCD data, a distributor must consume CDN logs and extract the data. There is some utility in having the player report a subset of metrics at an interval to a 3rd party collector. This collector may be a QoE system, a content steering server, or some other collector of real-time information. The sending will be triggered by a combination of events and interval reporting.

The player would need to be supplied with two additional properties:

The URL from which to request the beacon
The interval to use for interval-based reporting.

The following keys would then be sent, triggered by either an event or the interval:

Key	Description	Trigger
sid	session ID	Interval
bl	buffer length	Interval
bt	buffer target	Interval
vst	video start time	After start-up
vbr	video bitrate	Interval
vtb	video top bitrate	Interval
abr	audio bitrate	Interval
atb	audio top bitrate	Interval
bs	rebuffer	At completion of the rebuffer
mtp	measured throughput	Interval
sf	streaming format	Interval
st	stream type	Interval
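As an illustration of the query-arg option, here is a minimal sketch of how a player might serialize an interval report into a single `CMCD` query argument on the beacon URL. The collector URL, the `build_beacon_url` helper, and the simplified quoting rule (only `sid` is quoted as a string) are assumptions made for this example; they are not taken from the proposal.

```python
from urllib.parse import quote, unquote

# Assumption for this sketch: only these keys carry quoted string values.
STRING_KEYS = {"sid"}


def build_beacon_url(collector_url: str, metrics: dict) -> str:
    """Serialize metrics as one URL-encoded CMCD query argument."""
    parts = []
    for key in sorted(metrics):  # alphabetical key order keeps the output stable
        value = metrics[key]
        if key in STRING_KEYS:
            parts.append(f'{key}="{value}"')
        else:
            parts.append(f"{key}={value}")
    payload = ",".join(parts)
    separator = "&" if "?" in collector_url else "?"
    return f"{collector_url}{separator}CMCD={quote(payload, safe='')}"


# Hypothetical collector endpoint and sample values.
url = build_beacon_url(
    "https://collector.example.com/beacon",
    {"sid": "abc-123", "bl": 12000, "bt": 30000, "mtp": 25400, "sf": "d", "st": "v"},
)
decoded = unquote(url.split("CMCD=", 1)[1])
print(decoded)
```

The same payload string could instead be placed in a request header; only the transport changes.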

Instead of sending videoBitrate, videoTopBitrate, audioBitrate, audioTopBitrate, bufferLength and bufferTarget, we could send just the ratios for compactness: videoBitratePercentage, audioBitratePercentage, bufferPercentage
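A rough sketch of how the three ratios might be derived from the underlying values. The `percentage` helper, the rounding to whole percent, and the sample numbers are assumptions for illustration only.

```python
def percentage(current: float, maximum: float) -> int:
    """Return current as a whole percentage of maximum (0 if maximum is not positive)."""
    if maximum <= 0:
        return 0
    return round(100 * current / maximum)


# Sample values (kbps for bitrates, ms for buffer), chosen only for illustration.
video_bitrate_percentage = percentage(4500, 6000)    # vbr relative to vtb
audio_bitrate_percentage = percentage(128, 256)      # abr relative to atb
buffer_percentage = percentage(12000, 30000)         # bl relative to bt

print(video_bitrate_percentage, audio_bitrate_percentage, buffer_percentage)
```

The trade-off is that the absolute values can no longer be recovered from the ratios alone.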

gwendalsimon commented 11 months ago

> Today, to ingest CMCD data, a distributor must consume CDN logs and extract the data. There is some utility in having the player report a subset of metrics at an interval to a 3rd party collector. This collector may be a QoE system, a content steering server, or some other collector of real-time information. The sending will be triggered by a combination of events and interval reporting.

Excellent idea.

In CMCDv1, it is possible for the client to transmit CMCD data in JSON files (I have not followed the CMCDv1 effort, so I have never really understood the motivation for it). Do you want to extend this JSON transmission mode with the destination (by default it would be the regular server) and the interval (by default it would be every request)? Or do you want to create a separate CMCD transmission? If the latter, would this transmission to an external "CMCD Collector" duplicate the CMCD transmission (i.e., the client transmits both its regular CMCD report to the CDN server and its beaconed CMCD report to the CMCD collector) or substitute for it (i.e., the client transmits only the beaconed CMCD report)?

> The player would need to be supplied with two additional properties
>
> The URL from which to request the beacon
>
> The interval to use for interval-based reporting.

Ideally, these properties would be in the manifest files, wouldn't they? In the case of content steering, is there any "content steering v2" working group in which we could study the integration of such a "CMCD collector" into the spec? Note that the aforementioned duplicated or substituted modes could be part of this spec.

> The following keys would then be sent, triggered by either an event or the interval:
>
> Key	Description	Trigger
> sid	session ID	Interval

In CMCD, the client emits one report per request. In the case of an interval of, say, 10 requests, the client should transmit a report that is representative of the 10 reports. In the case of sid, the value is easy to determine since sid is constant.

> bl	buffer length	Interval

In the case of bl, the value fluctuates during the whole interval. Do you want to let the player decide whether it reports the max, the min, the median, the mean, or some percentile?

> bt	buffer target	Interval
> vst	video start time	After start-up
> vbr	video bitrate	Interval

Here, the player emitted 10 discrete values. Should it send all 10 values? Or, again, the single result of some function that takes the 10 values as input (by default the average)?
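To make the question concrete, here is a small sketch of the candidate summary functions applied to ten buffer-length samples from one interval. The `summarize` helper and the sample values are hypothetical; which statistic a player should report (if any) is exactly what is being asked above.

```python
import statistics


def summarize(samples: list) -> dict:
    """Collapse the per-request samples of one interval into candidate single values."""
    ordered = sorted(samples)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
    }


# Ten hypothetical buffer-length samples (ms), one per request in the interval.
bl_samples = [8000, 9500, 12000, 11000, 4000, 2500, 6000, 9000, 10000, 12000]
summary = summarize(bl_samples)
print(summary)
```

Each choice tells a different story about the same interval: the minimum exposes the near-starvation moment, while the mean and median hide it.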

sebastian-siepe commented 11 months ago

@wilaw : thanks for the summary.

As discussed in our last call, there is also substantial value in adding error information in case of a fatal error that causes playback to fail or the player to crash. This can be either a human-readable error message or a distinctive error code.

This information is needed to detect playback failures such as broken media streams, codec incompatibilities, DRM problems, etc.

To fully understand this error information, there is also the need to add the name of the player in use. This is necessary to avoid any possible misinterpretation of the provided error information (e.g. if an error code is used in multiple player environments).

To capture this information, we propose to add two new properties: error information and player name

Key	Description	Trigger	Explanation
...	...	...	...
ei	error information	In the event of a fatal error	Either a human readable `error message` (example: `ei=MANIFEST_PARSING_ERROR`) or a distinctive `error code` (example: [`ei=4001`](https://developer.android.com/reference/kotlin/androidx/media3/common/PlaybackException#ERROR_CODE_DECODER_INIT_FAILED()))
pn	player name	Interval	Player Name (examples: `pn=ExoPlayer`, `pn=hls.js`, `pn=AAMP`)

wilaw commented 9 months ago

Here is a diagram I put together to illustrate this feature.

[Image: Screenshot 2024-02-01 at 4 54 13 PM]

nicolaslevy commented 9 months ago

Considering this functionality and a context of multiple CDNs (content steering), I believe we should consider adding a key to indicate the CDN used in the interval. (related issue: https://github.com/cta-wave/common-media-client-data/issues/114)

Also consider adding a way to allow custom data.

wilaw commented 7 months ago

Meeting:

Glenn - this gets us into the world of Conviva, don't want to build out that.
Will - we don't want the richness of what Conviva offers. This is a small, but still useful subset.
Gwendal - concern that we step on toes of the analytics systems. Current CMCD piggybacks on requests. Would like client to use red or blue, but not both. Beaconing would be to same server. In favor of occasional beaconing.
Alex - being able to report to another CDN is useful. Reduces latency for getting metrics. Brightcove content steering at edge is a use-case. With analytics providers, have to integrate with each vendor. CMCD would be built in.
Sebastian - useful to have an end-point that is not the CDN.
Paul - would fallback suffice?
Will - no, because CMCD has to be attached to media object requests.
Nicolas - agrees with this. Empowers content steering. I think Conviva is a product, (with support, analysis) this is just an interface.
Glenn - could be fallback for analytics vendors.
David - I propose we move forward with this with focus on content steering, would be valuable for that. Paul asked
Alex - there is a dash.js PR to add CMCD to steering servers already. Proprietary products should not preclude a standards interface. If we can do a minimal set that are useful for steering and monitoring, will be good for the industry.
Summary by Paul - there is value here to do something. Will proceed. Should have downward pressure on the complexity.

dhassoun commented 7 months ago

Wanted to note that a potentially very valuable use case we can focus on is providing key data to a content steering service without the service needing to access and process all the CDN logs. This would give the feature some focus up front. It also provides more accurate and complete data than the basic throughput that such services currently receive. I think enabling the option to configure which properties are sent, and the interval at which they are sent, could be very useful to a steering service as well as to other use cases.

PCaponetti commented 7 months ago

Paul: Talk about the potential for just error reporting?
Will and Glenn: overlap with #113 and #87, Possibly re-opening this?
Paul: Maybe we need to get clear on the mission before we pick an implementation? My thought is that we should hone in on the gap of lost connectivity, which is really important signal that is lost with CMCD v1 due to the data only being able to ride the requests that are already going through the CDN
Gwendal: For the steering use case, we could piggyback the data on other requests to content steering? Could link with session ID?
Will: requests might differ per player implementation if we don't standardize
Alex: We probably do need something simple, and it would be ideal for it to be realtime (not requiring batch analysis). It could be content steering server, or we could give a different endpoint to the player.
Glenn: Steering server could wear two hats, but we should allow it to not have to
Will: Agreed, we shouldn't infringe on the content steering space with requirements for CMCD
Nicolas: Should add some key to say which CDN being used, and add allowance for custom data
Will: Yes, possibly add a CDN identifier key. Is the CDN a string that is user defined?
Gwendal: How about the URL being sent?
Will: That's quite verbose and doesn't allow for aggregation, what about host? Or just a field called CDN that can be populated however the player likes?
Ian: Would we use the same name used in content steering?
Will: Makes sense, but only if content steering is being used.
Nicolas: How would we get the endpoint to the player?
Paul: CMSD?
Will: Let's keep CMCD and CMSD decoupled.
Sebastian: Looking into keys for error states as well. Comment left on issue about this. Take inventory of types of errors and report enum-style.
Will: The mapping of player errors to our enum is not an easy task
Alex: Agreed, the mapping would be tough, but maybe a small set of errors but also a free-form field?
Paul: if we don't aggregate to a small known set, we can't do high level analysis at the CDN or origin cross-player
David, Jordan: Scared to say how hard it would be to get errors mapped into a small set, but agreed it will not be easy and never perfect/complete
Will: What about player name field? That could be fingerprinting (conversation with many people seem to agree that this might not be a good idea)
Ali: Possibly allow player to decide consistent or inconsistent hash so they can decide if they are alright with being fingerprinted or not
Piers: Could follow the same approach user-agent took by moving to hints?
Alex: Maybe complete encryption?
Paul: Do we care about what the player is? Doesn't that move the burden of mapping error string to enum to the proxy where we are less able to do that mapping?
Jordan: Agreed, the mapping is hard at the player but is harder at the proxy and risks privacy.
David: +1 to what Paul said as long as we can get a very simple set of errors to make the mapping easy. (conversation around types of errors. detailed below)

wilaw commented 7 months ago

With regard to reporting an error. Possible generic errors - Are they fatal to playback? Yes, they should be fatal. They should also be easy to report and be readily available to the player.

Network error (!200)
Video Decode error
Audio decode error
Invalid playlist/manifest
Stale playlist
Internal errors
Unsupported media version

Should we open a different issue? yes. Chairs to open.
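One way a player might represent this short list is as a small enumeration, sketched below. The `FatalError` name, the string values, and the literal reading of "!200" (any status other than 200 counts as a network error) are assumptions for illustration; no enum has been agreed in this thread.

```python
from enum import Enum
from typing import Optional


class FatalError(Enum):
    """Hypothetical enum mirroring the generic fatal errors listed above."""
    NETWORK = "network"
    VIDEO_DECODE = "video-decode"
    AUDIO_DECODE = "audio-decode"
    INVALID_MANIFEST = "invalid-manifest"
    STALE_PLAYLIST = "stale-playlist"
    INTERNAL = "internal"
    UNSUPPORTED_MEDIA_VERSION = "unsupported-media-version"


def network_error_for_status(status: int) -> Optional[FatalError]:
    """Literal reading of '!200': any status other than 200 is a network error."""
    return None if status == 200 else FatalError.NETWORK


print(network_error_for_status(200), network_error_for_status(503))
```

A real player would still need a mapping from its own internal error codes to these buckets, and that mapping is the part the group expects to be hard.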

nicolaslevy commented 6 months ago

I want to suggest splitting this issue into at least two parts. I believe one is related to defining beaconing, with the minimum set of keys already existing in CMCD v1 and those already defined in CMCD v2.

Regarding the addition of new keys like network error or the CDN identifier, I think it's best to have another issue so we can make incremental progress in the spec.

PCaponetti commented 6 months ago

> I want to suggest splitting this issue into at least two parts. I believe one is related to defining beaconing, with the minimum set of keys already existing in CMCD v1 and those already defined in CMCD v2.
>
> Regarding the addition of new keys like network error or the CDN identifier, I think it's best to have another issue so we can make incremental progress in the spec.

I fully agree.

We had noted that we would split this issue, but I'm holding off until we can hone in on what it is we would like to achieve.

My thoughts are we could go a couple ways with this:

If we want to enable system level (potentially cross-CDNs) QoE and health, we define a mechanism to send all QoE and health signal to an out of band location.
If we want to close the gaps in signal with solely in-band reporting (mostly lost connectivity), we define a mechanism to report out of band only when there is signal with keys that are focused around that signal
If we want to be flexible and put the onus on the user what to do with out of band reporting, we can define a mechanism to post batches of CMCD information (with custom keys potentially, and missing bits like original URL) to an out of band location.

I'm sure there are more ways to go with this, but I think we need to agree on a use case and let that be our guide.

PCaponetti commented 6 months ago

Jordan: Also mentioned using content steering as the out of band place to do the reporting
Paul: looking to the lost connectivity scenario as a valuable and easy to implement mechanism.
Sebastian: Opt for out of band interval based QoE and health, only on error is not enough. Questions on player config and how hard that might be
David: +1 to interval based, and +1 to not coupling CMCD and CMSD (adoption is low on CMSD). Also, we may be able to leave the spec largely intact and just report it elsewhere
Paul: are we going for intervals or averages? Is holding state going to be an issue for the interval?
Paul: what if we just add some optional averages to the spec, and extend the json format to allow batch reporting to wherever the player wants?
Sebastian: +1 to sending everything (optionally), but could be a lot of data
Paul: potentially adding something in the spec (max interval?) to make sure there isn't an undue burden on either player or proxy
Jordan: from a privacy perspective, it would be better to send interval aggregates

Letting this marinate another 2 weeks.

PCaponetti commented 6 months ago

Will: aggregates are hard. we would have to come up with timeframes, player would have to hold state and do calculation
Piers: maybe tracing versus all reports?
Alex Giladi: Averaging loses P99 which is very important
Will: the original intent was a snapshot in time, not verbose or sample. We don't need to rebuild conviva, a higher-level understanding of QoE is the target
Alex: add live stream latency to set of keys
Will: hearing consensus that we will leave any aggregations down to analysis, not this spec
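The point about averaging hiding the tail can be shown with a small worked example. The sample data and the nearest-rank percentile helper below are made up for illustration; they are not part of any proposal in this thread.

```python
def percentile_nearest_rank(samples, pct: int):
    """Nearest-rank percentile: the value at rank ceil(pct * n / 100) in sorted order."""
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # integer ceiling division
    return ordered[max(rank, 1) - 1]


# 100 hypothetical latency samples (ms): 98 healthy ones and 2 bad outliers.
latency_samples = [100] * 98 + [5000] * 2

mean_latency = sum(latency_samples) / len(latency_samples)
p99_latency = percentile_nearest_rank(latency_samples, 99)
print(mean_latency, p99_latency)
```

Here the mean (198 ms) looks only mildly elevated, while the P99 (5000 ms) shows that some requests were very slow. A collector that receives raw snapshots can compute either; one that receives only player-side averages cannot recover the tail.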

Will: I'll take the action to put a strawman in the document

nicoweilelemental commented 5 months ago

> In regards to reporting an error. Possible generic errors - Are they fatal to playback? Yes, they should be fatal. Should also be easy to report and be readily available to the player.
>
> Network error (!200)
>
> Video Decode error
>
> Audio decode error
>
> Invalid playlist/manifest
>
> Stale playlist
>
> Internal errors
>
> Unsupported media version
>
> Should we open a different issue? yes. Chairs to open.

The SVTA QoE group recently started to work on a new "Standardized Player Error Codes" nomenclature project, maybe that would be a good topic to sync on.

PCaponetti commented 5 months ago

Talking about 3 proposed data transmission modes.

Nick: For mode 2 where we report to a different place than where the media is coming from, shouldn't that be a post?
Will: Yep, changed to POST
Glenn: Should modes be mixed into a hybrid?
Will: It's important to have a mode that matches v1
Chris: Might want PUT for mode 2 because it is repeatable
Gwendal: Could we tie mode 2 and 3 together?
Will: the cadence is different and it's different enough that it should be independent
Sebastian: Should we define how big the batch can be?
Will: we shouldn't decide the batch size
Sebastian: constrain mode 2/3 to put or post, query args would be too big
David: I had wanted query args, but agree PUT or POST
Paul: modes seem unnecessarily heavyweight. why not just a set of optional keys and suggested usages? Possibly getting more value out of mode 3 by using it as a heartbeat?
(group): decided as a group that the cost of having modes is not necessarily too high, and the value is there. it could also be easy for players to only support specific modes of v2 as well. Support for mode 1 should be required for all cmcd v2 client implementations.
Gwendal: could you use multiple modes inside of a session?
Will/David: using multiple could be valuable, we shouldn't stipulate this constraint.
Chris L: Depending on when we send the data for mode 2, it can have more information like TTLB and/or error if we wait to send it until after the request completes.
(group): general agreement
Alex G: for mode 3, what is the definition of state change?
Will: went over the new state variable. when that changes, we report it.
Chris: do we really need HTTP?
Will: I don't think we can stipulate that we need to do HTTPS only, but we can encourage HTTPS
Nicolas: from the receiver side how to differentiate mode 2 from 3? does it matter?
Jordan S: question about what's allowable. like in mode 3 why isn't it possible to send request scoped things in mode 3? can we batch in mode 3?
(group): talked more about whether or not we should differentiate between mode 2 and 3 again, but 2 doesn't get intervals and state changes and keeping those specific to mode 3 has value as mode 3 then becomes a specific mode for "lightweight monitoring".

Tightened up new key definitions for url and timestamp.
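For the out-of-band POST discussed above, a batched report might be assembled roughly as follows. This is only a sketch: the collector URL, the JSON array layout, and the key names `ts` and `url` are placeholders for the timestamp and url keys mentioned here, not the names or format defined in the draft. The request is constructed but not sent.

```python
import json
import urllib.request


def build_batch_request(collector_url: str, reports: list) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a JSON batch of reports."""
    body = json.dumps(reports).encode("utf-8")
    return urllib.request.Request(
        collector_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical reports: "ts" and "url" are placeholder key names for this sketch.
reports = [
    {"ts": 1700000000000, "url": "https://cdn.example.com/video/seg_41.m4s", "bl": 12000, "mtp": 25400},
    {"ts": 1700000004000, "url": "https://cdn.example.com/video/seg_42.m4s", "bl": 9800, "mtp": 24100},
]
request = build_batch_request("https://collector.example.com/cmcd", reports)
print(request.get_method(), len(json.loads(request.data)))
```

Because each entry carries its own timestamp and original URL, the collector can reconstruct per-request context even though the reports arrive together.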

PCaponetti commented 5 months ago

Will: Do we do time or state change based?
Glenn: you would want both
Will: current language states both, with optional recurrence override, we can move the interval to optional
Will: take a look at the states and events
Paul: what about bitrate switch event?
Will: can't put that into state, because set of states are mutually exclusive
Paul: having the event could be a good flag, and specifically knowing if it is user initiated or player initiated, could be valuable, but too niche, retracting
Piers: notion of timestamps and state changes in mode 1 or 2?
Will: resolution is tight enough with just the recurring segment requests, maybe not go into it right now.
Glenn: Do we need a state to represent either client-side ad insertion or interstitial?
Will: This can be a state of playing advertising
Nicholas: Feeling that we don't need it, can be inferred
Glenn: Don't want a state to be misleading, like ads playing but showing that we are in paused state, noting Apple doesn't implement content-id
Rob: There's complexity here with different ad play mechanisms, maybe CMCD isn't the tool to understand how ad plays affect end users
Will: We want the cmcd data to reflect the end user experience no matter if there are separate players
Rob: Vote for states not get into what is playing
Will: Possibly adding a bool for interstitial playback (for ads or bumpers)?
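The "both time and state change" behaviour discussed in these notes could be sketched like this. The `ReportTrigger` class, the state names, the 30-second interval, and the choice to treat a non-positive interval as "state changes only" are all assumptions made for the example rather than language from the draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReportTrigger:
    """Decide whether to send a report: on a state change, or when the interval elapses."""
    interval_s: float  # assumption: a value <= 0 disables interval-based reports
    last_report_s: Optional[float] = None
    last_state: Optional[str] = None

    def should_report(self, now_s: float, state: str) -> Optional[str]:
        reason = None
        if state != self.last_state:
            reason = "state-change"
        elif (
            self.interval_s > 0
            and self.last_report_s is not None
            and now_s - self.last_report_s >= self.interval_s
        ):
            reason = "interval"
        if reason is not None:
            # Every report, whatever its cause, restarts the interval timer.
            self.last_report_s = now_s
            self.last_state = state
        return reason


trigger = ReportTrigger(interval_s=30)
timeline = [(0, "starting"), (10, "starting"), (12, "playing"), (41, "playing"),
            (42, "playing"), (50, "rebuffering")]
reasons = [trigger.should_report(t, s) for t, s in timeline]
print(reasons)
```

Because the states are mutually exclusive, a single string comparison is enough to detect a change. Events such as a bitrate switch would need a separate mechanism, as noted in the discussion.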

PCaponetti commented 4 months ago

With regard to ad plays, there is enough complexity and heterogeneity in players on how ad plays happen, and it shouldn't be assumed that we can get specific CMCD behavior across all players with regard to ad plays. It could even be ad plays are in separate players. That being said, there is a lot of value in understanding a session across ad plays.

Regarding the interstitial boolean, I made a change such that instead of sending it while an ad is playing, it is sent on requests for segments of ad content. That way, if we get requests to increase the buffer on the main content while an ad is playing, we do not muddy the waters, and we can also send a player state for the primary content requests, like p (paused), for prioritization.

Still some open questions here:

for "interstitial" content (ads) that are stitched into the main video, what do we have the player do? nothing special?
can we gain an understanding of issues with offsets (@Rob W: need your help on this one to detail out) where resumption isn't fully successful/seamless? Can this be handled with play state moving to play after an ad and immediately showing issues from other keys (buffer starvation?)?

wilaw commented 1 month ago

Paul: would like to close this issue out today and if necessary, spawn new issues.
Paul: do we need to specify that the player not report SI mode?
Will: no.
Paul: default 30s interval. High for short form video. 15s would be better.
Sebastian: half would be OK.
Will: we should scale with duration of content. Or remove.
Sebastian and Piers: default is good.
Rob: should default be 0?
Will: "Short form content may wish to use a shorter interval". Accepted
Paul: why zero statement?
Will: it defines what happens when you divide by zero.
Paul: fatal error codes from SVTA? Do we need space for error code? I think it would be valuable.
Seb: error codes are useful.
Chris: SVTA just defines error codes. It's buckets.
Paul made new issue. #141.
Paul: how ID CDN in SI mode. Could have mode3 specific key?
Will: would hostname be sufficient?
Chris: CNAMING would disrupt that. Giving them a beaconing place would be useful. Give them a string.
Will: could call it CDN ID.
Paul: Added a new key called 'CDN ID'. Value is String. A String defining the current delivery network from which the player is retrieving content.
Chris: minimum must parse would be better than specifying length.
Paul: we should have a separate issue on max length versus min parse.
Will: we need a new issue on how to batch.
Paul: added language to SU definition about batching. Also revised Response Mode definition.
Piers: we should define what batching means.
Paul: edited JSON reporting mode definition.

Group consensus to close #117 as fixed.

PCaponetti commented 1 month ago

💯