SRGSSR / pillarbox-documentation

Technical cross-platform documentation for Pillarbox
https://www.pillarbox.ch/
MIT License

API sync QoS #88

Closed: defagos closed this issue 3 months ago

defagos commented 4 months ago
amtins commented 3 months ago

We discussed that the asset and metadata timings we send represent QoE timings (which can be 0 with perfect preloading). But what about DRM and token timings? Are QoE values meaningful there? If not (and if we are really interested in such values), shouldn't we group QoS values and QoE values separately in our JSON?

This is something that @bgaudaen wanted to have. From my point of view, if it's easy to send, why not; if it requires a ridiculous amount of work, I would be more than inclined not to.

total for the global QoE timing is probably misleading (it suggests a sum, which might not always be the case). Can we find a better name?

In which cases is this not the case?

Shouldn't we rename buffer_duration to buffer, available_buffer or buffer_health?

What does this field represent? The buffer's duration, so buffer_duration is self-explanatory. buffer_health doesn't mean anything: what kind of value is expected? buffer_health: "is good", buffer_health: "is meh", buffer_health: "feels a bit sick"? So a strong no from me. As for available_buffer, semantically a name starting with a noun is better.

Should we group stall metrics into a stall JSON object?

According to comment issuecomment-2165008531, it's already the case. Maybe I'm missing something? My current implementation follows this guideline. To be honest, I would prefer sending the stall as an event: this would avoid keeping track of this information during the playback session, and I would find it easier to process on a backend. Let's simulate a session with the ID XYZ:

START:XYZ:....
HEARTBEAT:XYZ:....
HEARTBEAT:XYZ:....
HEARTBEAT:XYZ:....
STALL:XYZ:420 // 420 is the duration in ms
HEARTBEAT:XYZ:....
STALL:XYZ:3569
... // etc

If I want the sum of the stall durations across all the events on earth where the event name is stall and the session ID is XYZ, it's easy. In other words: SELECT SUM(duration) AS stall_duration FROM all_the_events_on_earth WHERE event = 'stall' AND session_id = 'XYZ'

purfect

That said, I'm fine with the stall object.
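As a rough illustration of why stall events are easy to aggregate, here is a minimal Python sketch that sums stall durations per session from log lines shaped like the simulated session above. It assumes the illustrative EVENT:SESSION:PAYLOAD format, with a STALL payload holding the stall duration in milliseconds; the function name is hypothetical and this is not how any Pillarbox backend is implemented.

```python
from collections import defaultdict


def stall_duration_by_session(lines):
    """Sum stall durations (ms) per session from EVENT:SESSION:PAYLOAD lines.

    Assumes the illustrative colon-separated format of the simulated
    session, where a STALL payload is the stall duration in milliseconds.
    """
    totals = defaultdict(int)
    for line in lines:
        # Drop any trailing "// comment" and surrounding whitespace.
        line = line.split("//", 1)[0].strip()
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip elided or malformed lines such as "..."
        event, session_id, payload = parts
        if event == "STALL":
            totals[session_id] += int(payload)
    return dict(totals)


log = [
    "START:XYZ:....",
    "HEARTBEAT:XYZ:....",
    "STALL:XYZ:420 // 420 is the duration in ms",
    "HEARTBEAT:XYZ:....",
    "STALL:XYZ:3569",
    "... // etc",
]
print(stall_duration_by_session(log))  # {'XYZ': 3989}
```

This is the in-memory equivalent of the SQL query: filter on the event name, group by session and sum the durations.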

defagos commented 3 months ago

This is something that @bgaudaen wanted to have. From my point of view, if it's easy to send, why not; if it requires a ridiculous amount of work, I would be more than inclined not to.

On Apple platforms, measuring how much time is required to load a resource once its URL is known is simply not possible in all cases. If preloading is involved, we measure a time of 0; we cannot know how much time the whole preloading process actually required.

In which cases is this not the case?

Depending on how things are measured, some overhead (e.g. time required to run code locally) might be missing from the individual steps. But I agree with you: most of the time we should be able to measure things so that the total is indeed the sum.

What does this field represent?

As suggested by @jboix, buffer_duration might be interpreted as the time during which the player is buffering, hence the idea of a more expressive name.

According to the comment https://github.com/SRGSSR/pillarbox-documentation/issues/70#issuecomment-2165008531, it's already the case.

You're right; I'm not sure where I read about the old format.

To be honest, I would prefer sending the stall as an event

Possible, but I guess we would need two events. When the player stalls, you don't know how long the stall will last; it might even stall and then crash. So you probably need a stall event and a resume-from-stall event, with the timestamp difference between each consecutive pair giving the duration of that particular stall.
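The pairing described above can be sketched in Python as follows. The "STALL" / "RESUME" event names are hypothetical, timestamps are assumed to be in milliseconds and to belong to a single session, and a stall that is never resumed (e.g. the player crashed) is reported as None because its duration is unknown.

```python
def stall_durations(events):
    """Return the duration (ms) of each stall from (event_name, timestamp) pairs.

    Assumes hypothetical "STALL" / "RESUME" event names for one session.
    A stall without a matching resume is reported as None.
    """
    durations = []
    stall_start = None
    # Order by timestamp so the result does not depend on arrival order.
    for name, timestamp in sorted(events, key=lambda e: e[1]):
        if name == "STALL" and stall_start is None:
            stall_start = timestamp  # a repeated STALL while stalled is ignored
        elif name == "RESUME" and stall_start is not None:
            durations.append(timestamp - stall_start)
            stall_start = None
    if stall_start is not None:
        durations.append(None)  # never resumed: duration unknown
    return durations


events = [
    ("STALL", 1000), ("RESUME", 1420),
    ("HEARTBEAT", 2000),
    ("STALL", 5000), ("RESUME", 8569),
    ("STALL", 9000),  # e.g. the player crashed during this stall
]
print(stall_durations(events))  # [420, 3569, None]
```

The trade-off compared with a single STALL event carrying a duration is that the backend, not the client, has to match the pairs.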

defagos commented 3 months ago

But what about DRM and token timings?

We discussed the following update to our metric event JSON:

{
    "session_id": "1",
    "event_name": "START",
    "timestamp": 1717665997932,
    "data": {
        "qoe_timings": {
            "metadata": 1200,
            "asset": 1500,
            "total": 2856
        },
        "qos_timings": {
            "metadata": 1200,
            "asset": 1500,
            "drm": 896,
            "token": 456
        },
        "screen": {
            "width": 1280,
            "height": 720
        },
        "os": {
            "name": "android / iOS / macOS / windows / linux ",
            "version": "11.23"
        },
        "player": {
            "name": "pillarbox / video.js / letterbox",
            "platform": "android / apple / web",
            "version": "2.0.1"
        },
        "browser": {
            "name": "safari / chrome / firefox",
            "version": "11.0"
        },
        "media": {
            "id": "urn:rts:video:1234",
            "metadata_url": "https://il.srgssr.ch/composition/?urn=urn:rts:video:1234",
            "asset_url": "https://akamai.com/quality_content.m3u8",
            "origin": "ch.srgssr.srf-meteo / www.rts.ch/info/article/1234"
        },
        "device": {
            "id": "MAC / IDFV",
            "model": "Samsung Galaxy S24 / iPhone15,7",
            "type": "TV / Car / Phone / Tablet / Desktop / Headset"
        }
    }
}
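A minimal Python sketch of how a client might assemble the START event with QoE and QoS timings kept in separate groups, mirroring the JSON example above. The build_start_event helper and its parameters are hypothetical illustrations, not part of any Pillarbox SDK; the timestamp is assumed to be epoch milliseconds, matching the 13-digit value in the example.

```python
import json
import time


def build_start_event(session_id, qoe_timings, qos_timings, **context):
    """Build a START metric event with QoE and QoS timings kept separate.

    Hypothetical helper mirroring the example payload; extra keyword
    arguments (screen, os, player, ...) are copied into "data" as-is.
    """
    return {
        "session_id": session_id,
        "event_name": "START",
        "timestamp": int(time.time() * 1000),  # epoch milliseconds
        "data": {
            # What the user experienced (can be 0 with perfect preloading).
            "qoe_timings": dict(qoe_timings),
            # Service-level timings, including DRM and token requests.
            "qos_timings": dict(qos_timings),
            **context,
        },
    }


event = build_start_event(
    "1",
    qoe_timings={"metadata": 1200, "asset": 1500, "total": 2856},
    qos_timings={"metadata": 1200, "asset": 1500, "drm": 896, "token": 456},
    screen={"width": 1280, "height": 720},
)
print(json.dumps(event, indent=4))
```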

total for the global QoE timing is probably misleading (it suggests a sum, which might not always be the case). Can we find a better name?

sum would be more misleading, but total is fine.
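To make the distinction concrete with the illustrative values of the example payload: metadata + asset = 1200 + 1500 = 2700 ms, while total = 2856 ms, leaving 156 ms not attributed to either step (the kind of local overhead mentioned earlier in the thread). The sketch below assumes total is measured independently, e.g. from load request to playback start, rather than computed by adding the steps; the function name is hypothetical.

```python
def unattributed_time(timings):
    """Return the part of the QoE total not covered by the individual steps.

    Assumes "total" is measured on its own rather than derived by
    summing the other entries.
    """
    steps = sum(value for key, value in timings.items() if key != "total")
    return timings["total"] - steps


qoe_timings = {"metadata": 1200, "asset": 1500, "total": 2856}
print(unattributed_time(qoe_timings))  # 156
```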

Should we group stall metrics into a stall JSON object?

Already done.

Shouldn't we rename buffer_duration to buffer, available_buffer or buffer_health (YouTube name)? (@jboix)

buffered_duration is more explicit.
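Under the reading discussed in this thread, the field measures how much media is available ahead of the playhead, not how long the player spent buffering. A minimal Python sketch, assuming buffered ranges are (start, end) tuples in seconds (similar to a media element's buffered ranges) and that only the range containing the playhead counts, since media after a gap cannot be played without further loading. The function is a hypothetical illustration.

```python
def buffered_duration(buffered_ranges, current_time):
    """Return how much media (seconds) is buffered ahead of the playhead.

    Assumes buffered_ranges is a list of (start, end) tuples in seconds;
    only the range containing current_time is taken into account.
    """
    for start, end in buffered_ranges:
        if start <= current_time < end:
            return end - current_time
    return 0.0  # playhead outside any buffered range: nothing playable ahead


print(buffered_duration([(0.0, 12.5), (30.0, 40.0)], current_time=10.0))  # 2.5
```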

Maybe we need to introduce 3 types of timings for resource loading

Too early; we are missing information:

How could we manage versioning if the spec is changed?

Do we require events to be received sequentially (e.g. START before STOP)?

No need for sequential events: the server will consolidate events based on session identifier and timestamp, regardless of the order in which they are received.
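The consolidation rule can be sketched in Python as grouping by session_id and sorting each group by timestamp, so arrival order no longer matters. This is a minimal illustration, assuming events are dicts carrying the session_id and timestamp keys of the metric event JSON; it is not the actual server implementation.

```python
from collections import defaultdict


def consolidate(events):
    """Group events by session and order each session by timestamp.

    Arrival order is ignored; only session_id and timestamp are used.
    """
    sessions = defaultdict(list)
    for event in events:
        sessions[event["session_id"]].append(event)
    for session_events in sessions.values():
        session_events.sort(key=lambda e: e["timestamp"])
    return dict(sessions)


received = [
    {"session_id": "1", "event_name": "STOP", "timestamp": 1717666000000},
    {"session_id": "2", "event_name": "START", "timestamp": 1717665990000},
    {"session_id": "1", "event_name": "START", "timestamp": 1717665997932},
]
ordered = consolidate(received)
print([e["event_name"] for e in ordered["1"]])  # ['START', 'STOP']
```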

Asset URL with / without token

The asset_url should not contain the token. Other URLs (e.g. segment URLs) might, if they are sent, most notably in errors.
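If the token is passed as a query parameter, a client could strip it before filling asset_url. A minimal Python sketch using the standard urllib.parse module; the parameter name "token" is a hypothetical default (pass whatever names your token service actually uses), and tokens embedded in the path are not handled. Note that urlencode may re-encode the remaining parameters slightly differently from the original string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_token(url, token_params=frozenset({"token"})):
    """Return url without its token query parameters.

    The default parameter name "token" is hypothetical; only query
    parameters are removed, not tokens embedded in the path.
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in token_params
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


print(strip_token("https://akamai.com/quality_content.m3u8?token=abc&start=10"))
# https://akamai.com/quality_content.m3u8?start=10
```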

Failure when sending data

At the client's discretion, but really not needed; it depends on the effort required. For the moment we decided: