WICG / datacue

A TextTrackCue based interface for arbitrary timed metadata, synchronized with audio or video media playback
https://wicg.github.io/datacue/

Identification of timed metadata cue schemas #12

Open chrisn opened 5 years ago

chrisn commented 5 years ago

Following #11, a question arises as to how a web application should identify the schema of timed metadata cues. This is needed to allow the application to subscribe to receive events related to cues of a particular schema.

There are two parts of the HTML spec related to this:

TextTrack ids

For in-band tracks, the TextTrack's id is described as:

For in-band tracks, the track's identifier is specified by the media resource. If the media resource is in a format that supports media fragment syntax, the identifier returned for a particular track must be the same identifier that would enable the track if used as the name of a track in the track dimension of such a fragment.

The Media Fragments URI 1.0 (basic) spec does not describe how track URIs are constructed; this is deferred to the draft Protocol for Media Fragments 1.0 Resolution in HTTP (referred to as "Media Fragments URI 1.0 (advanced)"), where track media fragment URIs are only mentioned in informative text in the context of RTSP.

So, one option for identifying in-band timed metadata tracks could be to use the TextTrack.id field, and define a suitable media fragment URI format. A <track> element where the id field is not set by the page author seems strange, though.
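To make the TextTrack.id option concrete, here is a minimal sketch of how application code might pick out metadata tracks by identifier. The helper name `findMetadataTracksById`, the `TrackLike` type, and the sample ids are all hypothetical; no identifier format has been agreed, so the id strings below are placeholders only.

```typescript
// Minimal structural type for the TextTrack fields used below.
interface TrackLike {
  kind: string;
  id: string;
}

// Hypothetical helper: collect the metadata tracks whose id equals wantedId.
// `tracks` is array-like so that a TextTrackList-shaped object can be passed.
function findMetadataTracksById<T extends TrackLike>(
  tracks: ArrayLike<T>,
  wantedId: string
): T[] {
  const matches: T[] = [];
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (track.kind === "metadata" && track.id === wantedId) {
      matches.push(track);
    }
  }
  return matches;
}

// Illustrative data only; all ids are made up.
const sampleTracks: TrackLike[] = [
  { kind: "subtitles", id: "en" },
  { kind: "metadata", id: "example-event-track" },
  { kind: "metadata", id: "another-track" },
];

const selectedTracks = findMetadataTracksById(sampleTracks, "example-event-track");
console.log(selectedTracks.length);
```

In a browser, the same function could presumably be given `video.textTracks`, since it only relies on `length`, indexed access, `kind` and `id`.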

inBandMetadataTrackDispatchType

TextTrack objects also have an inBandMetadataTrackDispatchType attribute:

This is a string extracted from the media resource specifically for in-band metadata tracks to enable such tracks to be dispatched to different scripts in the document.

For example, a traditional TV station broadcast streamed on the Web and augmented with Web-specific interactive features could include text tracks with metadata for ad targeting, trivia game data during game shows, player states during sports games, recipe information during food programs, and so forth. As each program starts and ends, new tracks might be added or removed from the stream, and as each one is added, the user agent could bind them to dedicated script modules using the value of this attribute.

Other than for in-band metadata text tracks, the in-band metadata track dispatch type is the empty string. How this value is populated for different media formats is described in steps to expose a media-resource-specific text track.

Examples of how this value is set are given in HTML for different media formats (Ogg, WebM, MPEG-2, MPEG-4). This could be extended for in-band cues such as DASH emsg events.

The stated purpose of this field seems to achieve what we want, i.e., to provide a way to dispatch metadata tracks to application code and to identify the cue schema. (I note that the "different scripts" terminology used here doesn't seem quite right, though.)
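As a rough illustration of the dispatch idea, the sketch below keeps a registry of handlers keyed by dispatch type and routes each metadata track to the matching handler. `TrackDispatcher`, `MetadataTrackLike`, and the dispatch type strings are hypothetical names chosen for the example; the real strings depend on the media format and the user agent.

```typescript
// Minimal structural type for the TextTrack fields used below.
interface MetadataTrackLike {
  kind: string;
  inBandMetadataTrackDispatchType: string;
}

type TrackHandler<T> = (track: T) => void;

// Hypothetical registry mapping a dispatch type string to application code.
class TrackDispatcher<T extends MetadataTrackLike> {
  private handlers = new Map<string, TrackHandler<T>>();

  register(dispatchType: string, handler: TrackHandler<T>): void {
    this.handlers.set(dispatchType, handler);
  }

  // Returns true if a handler was registered for the track and was invoked.
  dispatch(track: T): boolean {
    if (track.kind !== "metadata") return false;
    const handler = this.handlers.get(track.inBandMetadataTrackDispatchType);
    if (handler === undefined) return false;
    handler(track);
    return true;
  }
}

// Illustrative use; "example.ads" and "example.trivia" are made-up values.
const dispatcher = new TrackDispatcher<MetadataTrackLike>();
const handled: string[] = [];
dispatcher.register("example.ads", () => { handled.push("ads"); });
dispatcher.register("example.trivia", () => { handled.push("trivia"); });

const adsDispatched = dispatcher.dispatch({
  kind: "metadata",
  inBandMetadataTrackDispatchType: "example.ads",
});
const unknownDispatched = dispatcher.dispatch({
  kind: "metadata",
  inBandMetadataTrackDispatchType: "example.unknown",
});
console.log(handled, adsDispatched, unknownDispatched);
```

In a page, `dispatch` could be called from an `addtrack` listener on `video.textTracks`, passing the event's `track`.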

Terminology

A note on terminology:

Much of the above description of TextTrack.id and inBandMetadataTrackDispatchType could apply to UA-generated cues, not only in-band cues, for those browsers that feature native DASH or HLS players.

Questions

Do these two mechanisms achieve the same goal? Or if not, how do they differ?

What browser support currently exists for both, for in-band timed metadata tracks?

Should we use TextTrack.id with a suitable media fragment URI to identify in-band timed metadata cues? For example, the URI could say "give me the DASH emsg events of a given scheme_id and value in this media stream", or "give me the ID3 cues from this audio stream".

Or is inBandMetadataTrackDispatchType preferred? If so, what should the format of this string be, for metadata cue formats not currently supported in HTML?
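To help discuss the TextTrack.id question above, here is one possible shape for a track identifier that selects DASH emsg events by scheme_id and value, carried in the `track` dimension of a media fragment. The `emsg/<scheme_id_uri>/<value>` layout, the `EmsgSelector` type, and both function names are purely hypothetical; nothing here is defined by the Media Fragments or HTML specs.

```typescript
interface EmsgSelector {
  schemeIdUri: string;
  value: string;
}

// HYPOTHETICAL track name layout: "emsg/<scheme_id_uri>/<value>".
// Each part is percent-encoded so a "/" inside a part cannot be mistaken
// for the separator; the whole name is then encoded again for the fragment.
function toTrackFragment(sel: EmsgSelector): string {
  const name = [
    "emsg",
    encodeURIComponent(sel.schemeIdUri),
    encodeURIComponent(sel.value),
  ].join("/");
  return "track=" + encodeURIComponent(name);
}

// Reads the first "track=" pair of a fragment and returns the selector,
// or null if there is no such pair or it does not use the layout above.
// Malformed percent-encoding makes decodeURIComponent throw a URIError.
function fromTrackFragment(fragment: string): EmsgSelector | null {
  for (const pair of fragment.replace(/^#/, "").split("&")) {
    if (pair.indexOf("track=") !== 0) continue;
    const parts = decodeURIComponent(pair.slice("track=".length)).split("/");
    if (parts.length !== 3 || parts[0] !== "emsg") return null;
    return {
      schemeIdUri: decodeURIComponent(parts[1]),
      value: decodeURIComponent(parts[2]),
    };
  }
  return null;
}

// Illustrative round trip; the scheme URI is made up.
const emsgFragment = toTrackFragment({ schemeIdUri: "urn:example:scheme", value: "1" });
const emsgParsed = fromTrackFragment("#" + emsgFragment);
console.log(emsgFragment, emsgParsed);
```

The double encoding is a design choice for this sketch only: it keeps the separator unambiguous even though scheme URIs commonly contain ":" and "/".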

I suspect I'm going down a path already visited by a previous group. Pointers to relevant discussions there are welcome!

Proposal (TBD, input needed)

We want to allow web applications to signal to the UA the timed metadata cue schemes that they want to receive.

If we do this using the <track> element, we should add an inBandMetadataTrackDispatchType attribute to this element to allow selection of the appropriate cues in the media content.

If we do this by providing APIs for application script to use, we should either:

chrisn commented 4 years ago

On the DataCue API call yesterday (minutes), we discussed that the API should preferably use a single metadata TextTrack for all cue schemes, rather than one TextTrack per cue scheme, so the scheme (type) of each cue would be exposed through the DataCue.type attribute.
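A minimal sketch of what this could look like for application code: with a single metadata track, the application groups (or filters) cues by their `type` itself. `DataCueLike` is a stand-in type assumed for this example, carrying a `type` and a `value` alongside the usual cue timing fields; `groupCuesByType` and the type strings are hypothetical.

```typescript
// Assumed cue shape for this sketch: `type` identifies the cue scheme.
interface DataCueLike {
  type: string;
  startTime: number;
  endTime: number;
  value: unknown;
}

// Hypothetical helper: bucket the cues of one metadata track by scheme.
function groupCuesByType<C extends DataCueLike>(
  cues: ArrayLike<C>
): Map<string, C[]> {
  const groups = new Map<string, C[]>();
  for (let i = 0; i < cues.length; i++) {
    const cue = cues[i];
    const group = groups.get(cue.type);
    if (group) {
      group.push(cue);
    } else {
      groups.set(cue.type, [cue]);
    }
  }
  return groups;
}

// Illustrative cues; the type strings and values are made up.
const sampleCues: DataCueLike[] = [
  { type: "example.ads", startTime: 0, endTime: 5, value: { id: "ad-1" } },
  { type: "example.scores", startTime: 2, endTime: 4, value: { home: 1, away: 0 } },
  { type: "example.ads", startTime: 10, endTime: 15, value: { id: "ad-2" } },
];

const cuesByType = groupCuesByType(sampleCues);
console.log(Array.from(cuesByType.keys()));
```

In a browser this could be applied to a track's `activeCues` inside a `cuechange` handler, so each scheme's cues reach the code that understands them.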

In addition, we discussed possibly deprecating the TextTrack.inBandMetadataTrackDispatchType attribute. The stated purpose of this attribute isn't something that's used in practice:

This is a string extracted from the media resource specifically for in-band metadata tracks to enable such tracks to be dispatched to different scripts in the document.

Note on existing implementations: In WebKit, inBandMetadataTrackDispatchType always returns the same value, "com.apple.streaming". In HbbTV (which uses the TextTrack per cue scheme model), inBandMetadataTrackDispatchType contains the MPEG-DASH scheme_id_uri and value values.