Open swenkeratmicrosoft opened 4 years ago
In some scenarios, it may be valuable for the mp4 parser to itself do parsing on the contents of the 'emsg' box for some URNs and pass simplified data to the application while still allowing the application to parse the raw data if desired.
I fully agree. Where there are well known and supported emsg payloads, we want the UA to parse these and present them to the application as structured data. We have a related issue for this.
The proposed API includes the any value field, which is intended to contain the parsed data, per the existing WebKit implementation. Perhaps, as an alternative which would align more with WebKit, we could have:
interface DataCue : TextTrackCue {
attribute ArrayBuffer? data; // If non-null, contains the unparsed emsg message_data
// Proposed extensions.
attribute any? value; // If non-null, contains the parsed data
readonly attribute DOMString type;
};
(Just a small note, ID3 metadata is different to SCTE-35.)
The proposed API you linked just says the following, nothing about "parsed data"
// Proposed extensions.
attribute any value;
In addition, the "Mapping to MPEG-DASH in-band emsg events" section explicitly says the following regarding the contents of the "value" field:
any value | Object containing data, schemeIdUri, and value (see below)
And the following table explicitly maps those three fields to the raw data from 'emsg', not parsed data.
Nowhere in the proposed API, for in-band 'emsg' events in DASH, is the ability to include parsed 'emsg' data, at least as defined currently on the link you sent.
You're right, that's an oversight. I suggest we change ArrayBuffer data in the table to hold the parsed data.
Ah, but if you do that, you run into a problem. Consider the following scenario.
A new 'emsg' is released. At first, no UA supports parsing it, so 'data' has to hold the raw 'emsg' data in order for the javascript to parse it.
Over time, that 'emsg' becomes widespread. UAs want to start parsing it to reduce javascript overhead.
How does a webpage know whether the data field is the raw data or parsed data?
Furthermore, what if it turns out that most websites only need a small subset of the data contained in the 'emsg'? (In other words, the format of the specific 'emsg' has a lot of things that most websites don't use, but a handful of websites do.) As a result, what if the desire from the UA developers is to only parse the subset of the 'emsg' that most websites need, and expose it as, say, JSON for ease of use? What if a field they don't support at first (rarely used) gains traction and gets more widespread use so they want to add it to the JSON?
Now you're in a situation where most websites want the parsed data, a few want the raw data, and the UA might only give them the raw data depending on its version. Some websites decide that they only care about the parsed data and will require their users to update to a recent UA (e.g. "please update your browser to view this content"). Some webpages only care about the parsed data but don't want to force user upgrades, so they want to parse the 'emsg' data themselves if the parsed data isn't available from the UA. Some webpages want to parse the full raw 'emsg' because they need all its bells and whistles.
As a result, I strongly recommend you have ArrayBuffer data remain the raw 'emsg' data and add another ArrayBuffer parsedData field.
How does a webpage know whether the data field is the raw data or parsed data?
My idea for this was in my previous comment, where we could have TextTrackCue.data hold the raw data and TextTrackCue.value.data hold the parsed data. Then the UA would populate either one or the other, but not both.
The concern I have with this is that, unless there's cross-browser support for a known set of emsg types, applications would have to include code to parse all emsg types they were interested in anyway. And applications would still have to include parsing code for older browsers that don't support the latest emsg type.
Furthermore, what if it turns out that most websites only need a small subset of the data contained in the 'emsg'?
We haven't considered this case so far. The assumption is that the UA would expose all the content from the emsg.
Now you're in a situation where most websites want the parsed data, a few want the raw data and the UA might only give them the raw data depending on its version.
Why would the website want the raw data if the UA can present it with parsed data? If some do want raw data, is it simpler overall just to present all websites with raw data?
As a result, I strongly recommend you have ArrayBuffer data remain the raw 'emsg' data and add another ArrayBuffer parsedData field.
I'm not sure I follow. Why use ArrayBuffer for parsedData, when (depending on the emsg contents) an object or a string may be a more natural representation?
My idea for this was in my previous comment, where we could have TextTrackCue.data hold the raw data and TextTrackCue.value.data hold the parsed data. Then the UA would populate either one or the other, but not both.
Ah. One field for the raw data and a separate field for the parsed data is all that's needed. It was the current "Always empty" comment on the current APIs "attribute ArrayBuffer data; // Always empty" that threw me off.
So, that would change to "attribute ArrayBuffer data; // 'emsg'.message_data", and then TextTrackCue.value.data would change to type "any" and represent the parsed data, correct?
If so, that sounds good to me.
Yes, that's exactly it. I'll update the explainer.
I have updated the explainer, which hopefully clarifies handling of parsed vs unparsed data. Do we still need to look at how to support exposing subsets of the message data?
I don't think the subset scenario is a serious concern. If that scenario DOES show up for a specific type, it's still solvable: the "value" field being "any" means that it could (for example) contain a version number, the subset of parsed fields, and the raw message data as well.
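To make that concrete, here is a minimal sketch of what such a "value" could look like. The field names version, fields, and rawData and the helper readField are purely illustrative assumptions; none of them appear in the explainer.

```typescript
// Hypothetical shape for DataCue.value; these field names are assumptions
// made for illustration and do not come from the explainer.
interface VersionedValue {
  version: number;                 // bumped when the UA starts parsing more fields
  fields: Record<string, unknown>; // the subset of the message the UA parsed
  rawData: ArrayBuffer;            // the untouched message_data
}

// Return a parsed field if this UA version exposes it; otherwise undefined,
// so the caller can fall back to parsing rawData itself.
function readField(value: VersionedValue, name: string, minVersion: number): unknown {
  if (value.version < minVersion) {
    return undefined;
  }
  return Object.prototype.hasOwnProperty.call(value.fields, name)
    ? value.fields[name]
    : undefined;
}

// Example value as an older UA (version 1) might produce it.
const exampleValue: VersionedValue = {
  version: 1,
  fields: { duration: 30 },
  rawData: new ArrayBuffer(16),
};
```

A page that needs a field introduced in a later version would get undefined from readField and could then parse exampleValue.rawData on its own.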
We discussed on the call yesterday (minutes) that having both data and value fields seems redundant, and that applications could simply check if the value is an ArrayBuffer to determine if the data is unparsed or parsed. I plan to update the explainer accordingly, unless you prefer to keep the current proposed design?
Fine by me. :)
I'll close, having updated the explainer. Please take a look, and feel free to re-open if there's anything I've missed.
applications could simply check if the value is an ArrayBuffer to determine if the data is unparsed or parsed
This seems to assume that it is okay to prohibit the parsed output from itself being an ArrayBuffer. Given that this is a general purpose interface where neither the unparsed format nor the parsed format are defined directly, this seems like an odd limitation.
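A small sketch of the limitation, assuming the application uses a plain instanceof check as discussed on the call. The classify function and the sample values are hypothetical.

```typescript
type Classification = "unparsed" | "parsed";

// The check discussed on the call: treat any ArrayBuffer as unparsed data.
function classify(value: unknown): Classification {
  return value instanceof ArrayBuffer ? "unparsed" : "parsed";
}

// Raw message_data passed through untouched.
const rawValue: unknown = new ArrayBuffer(8);
// A structured object produced by a hypothetical UA parser.
const parsedObject: unknown = { id: 1 };
// Output of a hypothetical UA parser whose natural result is itself binary.
const parsedBinary: unknown = new ArrayBuffer(4);

// The last entry is classified as "unparsed" even though the UA parsed it.
const classifications: Classification[] = [
  classify(rawValue),
  classify(parsedObject),
  classify(parsedBinary),
];
```

The type check alone cannot tell rawValue and parsedBinary apart, which is the case the comment above is concerned about.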
I don't think we want to prohibit use of ArrayBuffer, if that ends up being the most appropriate type to use for a particular message format.
For interoperability, the formats would need to be defined, otherwise how does an application know how to interpret the data?
I don't think we want to prohibit use of ArrayBuffer, if that ends up being the most appropriate type to use for a particular message format.
Agreed. If the requirement is to permit both the raw and the parsed data format to be an ArrayBuffer, then that requirement can not be met if we also want to support type checking the format and switching based on whether it is an ArrayBuffer or not, I think? I guess we could define a subclass of ArrayBuffer called UnparsedArrayBuffer and define that the unparsed data format must be of that type, as an alternative similar solution.
For interoperability, the formats would need to be defined, otherwise how does an application know how to interpret the data?
DataCue should work both for interoperable, published formats, and also for private non-interoperable formats where the content provider and the client code provider have agreed formats out of band. It's the mechanism for exposure of data cues themselves that we are defining in DataCue, not all the payload formats, if I've understood correctly. The (parsed) payload formats would need to be defined elsewhere. (correct me if I've misunderstood!)
The (parsed) payload formats would need to be defined elsewhere. (correct me if I've misunderstood!)
This is TBD. It could go in the DataCue spec itself, for widely used formats, or elsewhere. This is mentioned here in the explainer. One thought we initially had was captured in the Media Timed Events document here. I would expect this definition to include both how to parse the data from the media container, and the structure in which the data is presented to web apps via DataCue.
If the requirement is to permit both the raw and the parsed data format to be an ArrayBuffer, then that requirement can not be met if we also want to support type checking the format and switching based on whether it is an ArrayBuffer or not, I think?
I'm struggling to think of use cases where both would be an ArrayBuffer. It would mean the UA has either parsed the data and transformed one binary representation into another, or parsed the data and exposed some subset to the web app. That's not to say use cases don't exist or shouldn't be supported.
So, to support this scenario we would need to go back to having separate fields for parsed and unparsed data.
One benefit of using DataCue.data is to maintain compatibility with any existing use of HTML5 DataCue, e.g., HbbTV, where data always exposes unparsed data.
In the current proposal, DataCue.data would be deprecated. For DASH emsg events, DataCue.value is an object with data and emsgValue attributes, so we could change this to have data (for parsed data) and rawData (for unparsed data).
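As a rough sketch of how separate attributes would remove the ambiguity, assuming the value object carries data, rawData, and emsgValue. Which attributes may be null, and the describe helper, are my assumptions rather than anything in the explainer.

```typescript
// Hypothetical shape following the suggestion above. Nullability of the
// attributes is an assumption made for this sketch.
interface EmsgCueValue {
  data: unknown;               // parsed data, or null if the UA did not parse
  rawData: ArrayBuffer | null; // unparsed message_data, or null if not exposed
  emsgValue: string;           // the 'emsg' value field
}

// Because parsed and unparsed data live in different attributes, a parsed
// ArrayBuffer is no longer mistaken for raw data.
function describe(value: EmsgCueValue): string {
  if (value.data !== null) {
    const kind = value.data instanceof ArrayBuffer ? "binary" : typeof value.data;
    return `parsed (${kind})`;
  }
  return value.rawData !== null ? "unparsed" : "empty";
}
```

With this shape an application never needs to inspect the type of the payload to learn whether the UA parsed it.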
I'm struggling to think of use cases where both would be an ArrayBuffer. It would mean the UA has either parsed the data and transformed one binary representation into another, or parsed the data and exposed some subset to the web app. That's not to say use cases don't exist or shouldn't be supported.
I can't either and don't think we should accommodate it without a concrete, compelling, use case.
I can't either and don't think we should accommodate it without a concrete, compelling, use case.
Since we can design this constraint out easily (2 workable suggestions so far), shouldn't we require a compelling use case to prohibit it?
I can't either and don't think we should accommodate it without a concrete, compelling, use case.
Since we can design this constraint out easily (2 workable suggestions so far), shouldn't we require a compelling use case to prohibit it?
I was trying to say that I think it is important to only include features in the spec for which compelling use cases have been identified.
Of course it is possible to remove things later, but why spend the time to define and spec a feature if it is likely to be removed later?
why spend the time to define and spec a feature if it is likely to be removed later?
I'd agree with that, in general, but this feels like a different kind of situation. Allowing the parsed type to be the same as the unparsed type could be seen as a feature, but I rather see not allowing it as a consequence of poor design, where an obvious possibility hasn't been taken into account.
It's not so much defining and specifying a feature that might be removed later; rather I think we are defining and specifying the known wanted features in such a way as to reduce the likelihood that we might want to change it later.
@swenkeratmicrosoft I'm just coming back to this, sorry for taking so long. I see that C2PA has not used DASH emsg in its specification, instead uuid or potentially a new c2pa box. Are the requirements for parsed and unparsed emsg data still needed for DataCue?
@chrisn
You are correct. C2PA decided not to use 'emsg' and thus any C2PA-specific requirements regarding 'emsg' no longer apply.
Thanks @swenkeratmicrosoft. I'd like to invite C2PA to consider bringing any requirements for other browser APIs (or even non-API browser features) to the Media & Entertainment Interest Group, where we have a tracking issue.
On this specific issue, I now suggest that DataCue does not try to provide both parsed and unparsed data. I think that the API proposal will be simpler overall if we can specify an emsg to DataCue mapping that simply passes through the unparsed message_data for all message schemas, making it the application's responsibility to parse and interpret the data.
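Under that pass-through model, the application side might look roughly like the sketch below. It assumes the value object exposes schemeIdUri, value, and data as in the mapping table discussed earlier; the scheme URN, the parser registry, and the ASCII decoding are hypothetical.

```typescript
// Assumed shape of DataCue.value for an emsg event, using the attribute
// names mentioned in this thread (data, schemeIdUri, value).
interface EmsgValue {
  schemeIdUri: string;
  value: string;
  data: ArrayBuffer; // unparsed message_data
}

type Parser = (messageData: ArrayBuffer) => unknown;

// Application-owned registry. The scheme URN and its parser are hypothetical.
const parsers: Record<string, Parser> = {
  "urn:example:foo:2020": (messageData) => {
    // Treat the payload as ASCII text, purely for illustration.
    const view = new Uint8Array(messageData);
    let text = "";
    for (let i = 0; i < view.length; i++) {
      text += String.fromCharCode(view[i]);
    }
    return text;
  },
};

// Parse message_data when the application knows the scheme; otherwise hand
// the raw bytes back unchanged.
function interpret(cueValue: EmsgValue): unknown {
  const known = Object.prototype.hasOwnProperty.call(parsers, cueValue.schemeIdUri);
  return known ? parsers[cueValue.schemeIdUri](cueValue.data) : cueValue.data;
}
```

All interpretation lives in page script here, so the UA only needs to identify the scheme and deliver the bytes.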
Currently, all fields in DataCue.value come directly from the 'emsg' box.
However, this pushes ALL work for parsing 'emsg'.message_data onto the application.
In some scenarios, it may be valuable for the mp4 parser to itself do parsing on the contents of the 'emsg' box for some URNs and pass simplified data to the application while still allowing the application to parse the raw data if desired.
For example, let's say that, for URN foo, 'emsg'.message_data includes a signature over that data. The existing design would force every webpage and application that cares about URN foo to do its own cryptography to validate that signature. If, at some point in the future, the URN foo becomes a broader standard, the underlying parser may wish to implement that verification directly during mp4 parsing, and then somehow signal to the application that the signature was/was not verified successfully.
Even ID3 could benefit. For example, if the MP4 parser was also capable of parsing the ID3 metadata directly, it could expose a parsed version of it to the application and not require the application to call function parseSCTE35Data at all!
As such, I propose that the "value" object include an additional field.
value field | emsg value | Description
any parsedData | N/A | If the underlying content file parser did parsing or validation of the underlying 'emsg' box before sending the event to the application, the parsedData field contains information related to that parsing. Otherwise, the parsedData field is set to null. The parsedData semantics must be defined by the owners of the scheme identified by the scheme_id_uri.
@johnsim
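A minimal sketch of how an application might use the proposed parsedData field while keeping a script fallback for parsers that leave it null. The FooResult shape and parseFooInScript are hypothetical, and the "verification" is a placeholder rather than real cryptography.

```typescript
// Assumed shape of DataCue.value with the proposed extra field.
interface EmsgValueWithParsed {
  schemeIdUri: string;
  data: ArrayBuffer;   // raw message_data, always present
  parsedData: unknown; // null when the parser did no parsing or validation
}

// Hypothetical result a parser might expose for URN foo.
interface FooResult {
  signatureVerified: boolean;
}

// Stand-in for the application's own parsing and verification. This is NOT
// real cryptography: a first byte of 1 is simply treated as "verified".
function parseFooInScript(raw: ArrayBuffer): FooResult {
  const view = new Uint8Array(raw);
  return { signatureVerified: view.length > 0 && view[0] === 1 };
}

// Prefer the parser-provided result; fall back to script parsing otherwise.
function getFooResult(value: EmsgValueWithParsed): FooResult {
  if (value.parsedData !== null) {
    // Semantics are defined by the owner of the scheme, per the proposal.
    return value.parsedData as FooResult;
  }
  return parseFooInScript(value.data);
}
```

Pages that need the full message can ignore parsedData and read data directly, so both kinds of application are served by the same event.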