WICG / datacue

A TextTrackCue based interface for arbitrary timed metadata, synchronized with audio or video media playback
https://wicg.github.io/datacue/

Proposal: Add an additional field to "value" for pre-parsing of 'emsg' #21

Open swenkeratmicrosoft opened 4 years ago

swenkeratmicrosoft commented 4 years ago

Currently, all fields in DataCue.value come directly from the 'emsg' box.

However, this pushes ALL work for parsing 'emsg'.message_data onto the application.

In some scenarios, it may be valuable for the mp4 parser to itself do parsing on the contents of the 'emsg' box for some URNs and pass simplified data to the application while still allowing the application to parse the raw data if desired.

For example, let's say that, for URN foo, 'emsg'.message_data includes a signature over that data. The existing design would force every webpage and application that cares about URN foo to do its own cryptography to validate that signature. If, at some point in the future, the URN foo becomes a broader standard, the underlying parser may wish to implement that verification directly during mp4 parsing, and then somehow signal to the application that the signature was/was not verified successfully.

Even ID3 could benefit. For example, if the MP4 parser was also capable of parsing the ID3 metadata directly, it could expose a parsed version of it to the application and not require the application to call function parseSCTE35Data at all!

As such, I propose that the "value" object include an additional field.

value field | emsg value | Description
any parsedData | N/A | If the underlying content file parser did parsing or validation of the underlying 'emsg' box before sending the event to the application, the parsedData field contains information related to that parsing. Otherwise, the parsedData field is set to null. The parsedData semantics must be defined by the owners of the scheme identified by the scheme_id_uri.
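
A minimal sketch of how an application might consume such a field, assuming the parsedData name proposed above and plain objects standing in for the cue's value. The scheme URN urn:example:foo, the signatureVerified property, and the parseFooInApp helper are hypothetical and exist only for illustration.

```javascript
// Hypothetical scheme identifier; not a registered URN.
const FOO_SCHEME = 'urn:example:foo';

// Hypothetical application-side parser, used when the UA did no parsing.
// It only reports that no verification has happened yet.
function parseFooInApp(rawData) {
  return { signatureVerified: false, byteLength: rawData.byteLength };
}

// Prefer UA-provided parsedData; fall back to parsing the raw bytes.
function readFooEvent(value) {
  if (value.schemeIdUri !== FOO_SCHEME) {
    return null; // not a scheme this application handles
  }
  if (value.parsedData !== null && value.parsedData !== undefined) {
    return { source: 'ua', result: value.parsedData };
  }
  return { source: 'app', result: parseFooInApp(value.data) };
}

// Stand-ins for cue.value as delivered by an older and a newer UA.
const fromOldUA = { schemeIdUri: FOO_SCHEME, data: new ArrayBuffer(8), parsedData: null };
const fromNewUA = {
  schemeIdUri: FOO_SCHEME,
  data: new ArrayBuffer(8),
  parsedData: { signatureVerified: true },
};
```

Calling readFooEvent(fromOldUA) takes the application path, while readFooEvent(fromNewUA) returns the UA's result without touching the raw bytes.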

@johnsim

chrisn commented 4 years ago

In some scenarios, it may be valuable for the mp4 parser to itself do parsing on the contents of the 'emsg' box for some URNs and pass simplified data to the application while still allowing the application to parse the raw data if desired.

I fully agree. Where there are well known and supported emsg payloads, we want the UA to parse these and present them to the application as structured data. We have a related issue for this.

The proposed API includes the any value field which is intended to contain the parsed data, per the existing WebKit implementation. Perhaps, as an alternative which would align more with WebKit, we could have:

interface DataCue : TextTrackCue {
    attribute ArrayBuffer? data; // If non-null, contains the unparsed emsg message_data

    // Proposed extensions.
    attribute any? value; // If non-null, contains the parsed data
    readonly attribute DOMString type;
};

(Just a small note, ID3 metadata is different to SCTE-35.)

swenkeratmicrosoft commented 4 years ago

The proposed API you linked just says the following, nothing about "parsed data"

// Proposed extensions.
attribute any value;

In addition, the "Mapping to MPEG-DASH in-band emsg events" section explicitly says the following regarding the contents of the "value" field:

any value | Object containing data, schemeIdUri, and value (see below)

And the following table explicitly maps those three fields to the raw data from 'emsg', not parsed data.

Nowhere in the proposed API, for in-band 'emsg' events in DASH, is the ability to include parsed 'emsg' data, at least as defined currently on the link you sent.

chrisn commented 4 years ago

You're right, that's an oversight. I suggest we change ArrayBuffer data in the table to hold the parsed data.

swenkeratmicrosoft commented 4 years ago

Ah, but if you do that, you run into a problem. Consider the following scenario.

A new 'emsg' is released. At first, no UA supports parsing it, so 'data' has to hold the raw 'emsg' data in order for the JavaScript to parse it.

Over time, that 'emsg' becomes widespread. UAs want to start parsing it to reduce JavaScript overhead.

How does a webpage know whether the data field is the raw data or parsed data?

Furthermore, what if it turns out that most websites only need a small subset of the data contained in the 'emsg'? (In other words, the format of the specific 'emsg' has a lot of things that most websites don't use, but a handful of websites do.) As a result, what if the desire from the UA developers is to only parse the subset of the 'emsg' that most websites need, and expose it as, say, JSON for ease of use? What if a field they don't support at first (rarely used) gains traction and gets more widespread use so they want to add it to the JSON?

Now you're in a situation where most websites want the parsed data, a few want the raw data, and the UA might only give them the raw data depending on its version. Some websites decide that they only care about the parsed data and will require their users to update to a recent UA (e.g. "please update your browser to view this content"). Some webpages only care about the parsed data but don't want to force user upgrades, so they want to parse the 'emsg' data themselves if the parsed data isn't available from the UA. Some webpages want to parse the full raw 'emsg' because they need all its bells and whistles.

As a result, I strongly recommend you have ArrayBuffer data remain the raw 'emsg' data and add another ArrayBuffer parsedData field.

chrisn commented 4 years ago

How does a webpage know whether the data field is the raw data or parsed data?

My idea for this was in my previous comment, where we could have TextTrackCue.data hold the raw data and TextTrackCue.value.data hold the parsed data. Then the UA would populate either one or the other, but not both.

The concern I have with this is that, unless there's cross-browser support for a known set of emsg types, applications would have to include code to parse all emsg types they were interested in anyway. And applications would still have to include parsing code for older browsers that don't support the latest emsg type.

Furthermore, what if it turns out that most websites only need a small subset of the data contained in the 'emsg'?

We haven't considered this case so far. The assumption is that the UA would expose all the content from the emsg.

Now you're in a situation where most websites want the parsed data, a few want the raw data and the UA might only give them the raw data depending on its version.

Why would the website want the raw data if the UA can present it with parsed data? If some do want raw data, is it simpler overall just to present all websites with raw data?

As a result, I strongly recommend you have ArrayBuffer data remain the raw 'emsg' data and add another ArrayBuffer parsedData field.

I'm not sure I follow. Why use ArrayBuffer for parsedData, when (depending on the emsg contents) an object or a string may be a more natural representation?

swenkeratmicrosoft commented 4 years ago

My idea for this was in my previous comment, where we could have TextTrackCue.data hold the raw data and TextTrackCue.value.data hold the parsed data. Then the UA would populate either one or the other, but not both.

Ah. One field for the raw data and a separate field for the parsed data is all that's needed. It was the "Always empty" comment in the current API ("attribute ArrayBuffer data; // Always empty") that threw me off.

So, that would change to "attribute ArrayBuffer data; // 'emsg'.message_data", and then TextTrackCue.value.data would change to type "any" and represent the parsed data, correct?

If so, that sounds good to me.

chrisn commented 4 years ago

Yes, that's exactly it. I'll update the explainer.

chrisn commented 4 years ago

I have updated the explainer, which hopefully clarifies handling of parsed vs unparsed data. Do we still need to look at how to support exposing subsets of the message data?

swenkeratmicrosoft commented 4 years ago

I don't think the subset scenario is a serious concern. If that scenario DOES show up for a specific type, it's still solvable: the "value" field being "any" means that it could (for example) contain a version number, the subset of parsed fields, and the raw message data as well.
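
One possible shape for such a value, sketched with plain objects. The property names version, fields and raw, and the id and duration fields, are hypothetical; nothing in this thread defines them.

```javascript
// Return the requested field from the UA-parsed subset when present,
// otherwise fall back to an application-supplied parser over the raw bytes.
function getField(value, name, parseRaw) {
  if (value.fields && Object.prototype.hasOwnProperty.call(value.fields, name)) {
    return value.fields[name];
  }
  return parseRaw(value.raw)[name];
}

// Version 1 of a hypothetical UA parser exposes only `id`;
// version 2 adds the rarely used `duration`.
const v1 = { version: 1, fields: { id: 7 }, raw: new ArrayBuffer(4) };
const v2 = { version: 2, fields: { id: 7, duration: 30 }, raw: new ArrayBuffer(4) };

// Hypothetical application parser; a real one would decode `raw`.
const appParser = (raw) => ({ id: 7, duration: 30, rawLength: raw.byteLength });
```

With this shape a page can feature-detect per field: getField(v1, 'duration', appParser) falls back to the application parser, while the same call on v2 never touches the raw bytes.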

chrisn commented 4 years ago

We discussed on the call yesterday (minutes) that having both data and value fields seems redundant, and that applications could simply check if the value is an ArrayBuffer to determine if the data is unparsed or parsed. I plan to update the explainer accordingly, unless you prefer to keep the current proposed design?
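
As a rough illustration of that check, assuming the cue's value is either an ArrayBuffer of unparsed bytes or some other parsed representation (the function names here are made up for the example):

```javascript
// Classify a cue value as unparsed (raw bytes) or parsed (anything else).
function isUnparsedValue(value) {
  return value instanceof ArrayBuffer;
}

// Produce a short human-readable description of what was delivered.
function describeCueValue(value) {
  return isUnparsedValue(value)
    ? `unparsed, ${value.byteLength} bytes`
    : `parsed, type ${typeof value}`;
}
```

The check relies on parsed output never being an ArrayBuffer itself; a UA that exposed parsed binary data as an ArrayBuffer would be misclassified as unparsed.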

swenkeratmicrosoft commented 4 years ago

Fine by me. :)

chrisn commented 4 years ago

I'll close, having updated the explainer. Please take a look, and feel free to re-open if there's anything I've missed.

nigelmegitt commented 4 years ago

applications could simply check if the value is an ArrayBuffer to determine if the data is unparsed or parsed

This seems to assume that it is okay to prohibit the parsed output from itself being an ArrayBuffer. Given that this is a general purpose interface where neither the unparsed format nor the parsed format are defined directly, this seems like an odd limitation.

chrisn commented 4 years ago

I don't think we want to prohibit use of ArrayBuffer, if that ends up being the most appropriate type to use for a particular message format.

For interoperability, the formats would need to be defined, otherwise how does an application know how to interpret the data?

nigelmegitt commented 4 years ago

I don't think we want to prohibit use of ArrayBuffer, if that ends up being the most appropriate type to use for a particular message format.

Agreed. If the requirement is to permit both the raw and the parsed data format to be an ArrayBuffer, then that requirement cannot be met if we also want to support type checking the format and switching based on whether it is an ArrayBuffer or not, I think? As a similar alternative, I guess we could define a subclass of ArrayBuffer called UnparsedArrayBuffer and require that the unparsed data be of that type.
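
A sketch of the subclass idea in plain JavaScript. UnparsedArrayBuffer is the hypothetical name suggested above, not an existing platform type, and isRawPayload is likewise made up for the example.

```javascript
// Hypothetical marker subclass; not part of any specification.
class UnparsedArrayBuffer extends ArrayBuffer {}

// True only for buffers explicitly tagged as unparsed container data.
function isRawPayload(value) {
  return value instanceof UnparsedArrayBuffer;
}

const raw = new UnparsedArrayBuffer(16); // bytes straight from the container
const parsedBinary = new ArrayBuffer(16); // binary output of a UA parser
```

Both values are still ArrayBuffers, so existing code that reads bytes keeps working, but only raw passes the isRawPayload check.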

For interoperability, the formats would need to be defined, otherwise how does an application know how to interpret the data?

DataCue should work both for interoperable, published formats, and also for private non-interoperable formats where the content provider and the client code provider have agreed formats out of band. It's the mechanism for exposure of data cues themselves that we are defining in DataCue, not all the payload formats, if I've understood correctly. The (parsed) payload formats would need to be defined elsewhere. (correct me if I've misunderstood!)

chrisn commented 4 years ago

The (parsed) payload formats would need to be defined elsewhere. (correct me if I've misunderstood!)

This is TBD. It could go in the DataCue spec itself, for widely used formats, or elsewhere. This is mentioned here in the explainer. One thought we initially had was captured in the Media Timed Events document here. I would expect this definition to include both how to parse the data from the media container, and the structure in which the data is presented to web apps via DataCue.

chrisn commented 4 years ago

If the requirement is to permit both the raw and the parsed data format to be an ArrayBuffer, then that requirement cannot be met if we also want to support type checking the format and switching based on whether it is an ArrayBuffer or not, I think?

I'm struggling to think of use cases where both would be an ArrayBuffer. It would mean the UA has either parsed the data and transformed one binary representation into another, or parsed the data and exposed some subset to the web app. That's not to say use cases don't exist or shouldn't be supported.

So, to support this scenario we would need to go back to having separate fields for parsed and unparsed data.

One benefit of using DataCue.data is to maintain compatibility with any existing use of HTML5 DataCue, e.g., HbbTV, where data always exposes unparsed data.

In the current proposal, DataCue.data would be deprecated. For DASH emsg events, DataCue.value is an object with data and emsgValue attributes, so we could change this to have data (for parsed data) and rawData (for unparsed data).

eric-carlson commented 4 years ago

I'm struggling to think of use cases where both would be an ArrayBuffer. It would mean the UA has either parsed the data and transformed one binary representation into another, or parsed the data and exposed some subset to the web app. That's not to say use cases don't exist or shouldn't be supported.

I can't either and don't think we should accommodate it without a concrete, compelling, use case.

nigelmegitt commented 4 years ago

I can't either and don't think we should accommodate it without a concrete, compelling, use case.

Since we can design this constraint out easily (2 workable suggestions so far), shouldn't we require a compelling use case to prohibit it?

eric-carlson commented 4 years ago

I can't either and don't think we should accommodate it without a concrete, compelling, use case.

Since we can design this constraint out easily (2 workable suggestions so far), shouldn't we require a compelling use case to prohibit it?

I was trying to say that I think it is important to only include features in the spec for which compelling use cases have been identified.

Of course it is possible to remove things later, but why spend the time to define and spec a feature if it is likely to be removed later?

nigelmegitt commented 4 years ago

why spend the time to define and spec a feature if it is likely to be removed later?

I'd agree with that, in general, but this feels like a different kind of situation. Allowing the parsed type to be the same as the unparsed type could be seen as a feature, but I rather see not allowing it as a consequence of poor design, where an obvious possibility hasn't been taken into account.

It's not so much defining and specifying a feature that might be removed later; rather I think we are defining and specifying the known wanted features in such a way as to reduce the likelihood that we might want to change it later.

chrisn commented 2 years ago

@swenkeratmicrosoft I'm just coming back to this, sorry for taking so long. I see that C2PA has not used DASH emsg in its specification, instead using uuid or potentially a new c2pa box. Are the requirements for parsed and unparsed emsg data still needed for DataCue?

swenkeratmicrosoft commented 2 years ago

@chrisn

You are correct. C2PA decided not to use 'emsg' and thus any C2PA-specific requirements regarding 'emsg' no longer apply.

chrisn commented 2 years ago

Thanks @swenkeratmicrosoft. I'd like to invite C2PA to consider bringing any requirements for other browser APIs (or even non-API browser features) to the Media & Entertainment Interest Group, where we have a tracking issue.

On this specific issue, I now suggest that DataCue does not try to provide both parsed and unparsed data. I think that the API proposal will be simpler overall if we can specify an emsg to DataCue mapping that simply passes through the unparsed message_data for all message schemas, making it the application's responsibility to parse and interpret the data.
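
Under that pass-through model, application-side handling might look roughly like the sketch below. It assumes a value object carrying schemeIdUri and data as in the explainer's emsg mapping; the scheme URN, the UTF-8 JSON payload encoding, and the function names are hypothetical.

```javascript
// Decode message_data as UTF-8 JSON. The JSON encoding is an assumption
// for this sketch; real schemes define their own payload formats.
function parseJsonMessageData(messageData) {
  const text = new TextDecoder('utf-8').decode(new Uint8Array(messageData));
  return JSON.parse(text);
}

// Dispatch on the scheme identifier; unknown schemes are left unparsed.
function handleEmsg(value, parsers) {
  const parse = parsers.get(value.schemeIdUri);
  return parse ? parse(value.data) : null;
}

// Registry of application-supplied parsers, keyed by scheme identifier.
const parsers = new Map([['urn:example:json-event', parseJsonMessageData]]);

// Build a stand-in payload the way a packager might have written it.
const bytes = new TextEncoder().encode('{"title":"Ad break","seconds":30}');
const sample = {
  schemeIdUri: 'urn:example:json-event',
  data: bytes.buffer.slice(bytes.byteOffset, bytes.byteOffset + bytes.byteLength),
};
```

Here handleEmsg(sample, parsers) yields the decoded object, and a cue with an unregistered scheme yields null so the page can ignore it.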