cta-wave / content-specification-task-force


2018 Content Spec: AMD 1: Clarify notes section 3.2 regarding emsg box #6

Closed johnsim closed 5 years ago

johnsim commented 6 years ago

(Section 3.2) Clarity – in the notes section, remove the sentence “For instance, when these boxes are contained at the start of a CMAF Chunk that isn’t first in a Fragment, or a Fragment that isn’t first in a Segment (containing multiple Fragments), a Player application isn’t expected to parse the entire object to find them.” I believe the topic of emsg box handling will require considerably more discussion in the DPCTF, and I recommend we address this topic after that work.

johnsim commented 6 years ago

(Thomas) Concern that this work should be done in MPEG to be consistent across groups/implementations. (Kilroy) CMAF specifies the required placement/processing of the segment type box and emsg box: they are "only expected to be parsed" at the start of a segment (ISOBMFF segment - moof box followed by...). (Thomas) The concern is practical, not hypothetical. (Will) Not happy with all problems being solved by MPEG. No resolution on the call.

KilroyHughes commented 6 years ago

CMAF is consistent with ISOBMFF in stating that only 'styp' and 'emsg' boxes prepended to the start of a segment are expected to be read. This allows a streaming app that will use the 'emsg' to parse it from the start of a downloaded segment without parsing the entire segment. But, a "segment" in ISOBMFF can be a complete media file, a complete and independently decodable movie fragment, or a portion of a complete movie fragment containing a subset of samples in a 'moof'/'mdat' pair. CMAF also defines those packaging and delivery options as a CMAF Track File, CMAF Segment, or CMAF Chunk.

A streaming app and manifest can determine which CMAF Objects are addressed and downloaded, so a service can prepend 'emsg' to the CMAF Addressable Objects it downloads and parse only the 'emsg' boxes prepended to those media objects. The use case is primarily low latency live signaling. For VOD, collecting events in the manifest or playlist is more efficient.

If parsing all 'emsg' boxes in any location were required, then the box parser in the media pipeline that parses Segments for playback would have to parse and pass 'emsg' messages to UX apps that handle these messages via some standard browser interface, which isn't specified or widely implemented today. That is beyond CMAF or WAVE control, and slow to change.

In the case of an embedded player, the only way events inside segments or in manifests/playlists would reach a UX application is through a "type 2" API between the user agent that parses the manifest and segments, and the service provider's UX app that consumes them. A "type 3" player (e.g. DASH.js) could parse all boxes in each downloaded segment using script to look for buried 'emsg' boxes, but the performance and complexity are impractical, and complete parsing would need to be repeated by the UA for actual media playback.

A service can avoid these problems by prepending 'emsg' boxes to the Addressable Media Objects it downloads, and having its app only look for an 'emsg' box preceding the first 'moof' box. That is what CMAF and WAVE currently recommend. It is under the control of a service provider to prepend and parse 'emsg' boxes with their UX app on the Addressable Media Objects they stream.
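The "only look before the first 'moof'" behavior described above can be sketched roughly as follows. This is a minimal illustration, not code from CMAF, WAVE, or any player: the function names (`make_box`, `iter_top_level_boxes`, `leading_emsg_payloads`) are hypothetical, the box payloads in the demo are placeholder bytes rather than valid 'styp' or 'emsg' contents, and it assumes the start of the downloaded object is available as a byte string.

```python
import struct
from typing import Iterator, List, Tuple


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit size field (hypothetical helper for the demo)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def iter_top_level_boxes(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (type, payload) for each complete top-level box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit largesize follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the data.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            break  # truncated or malformed box: stop rather than guess
        yield box_type.decode("ascii", "replace"), data[offset + header:offset + size]
        offset += size


def leading_emsg_payloads(data: bytes) -> List[bytes]:
    """Collect 'emsg' payloads that precede the first 'moof'; ignore the rest."""
    found = []
    for box_type, payload in iter_top_level_boxes(data):
        if box_type == "moof":
            break  # everything after this point is left to the media pipeline
        if box_type == "emsg":
            found.append(payload)
    return found


# Demo object: styp, emsg, moof, mdat, then an 'emsg' buried after the first moof.
segment = (
    make_box(b"styp", b"placeholder")
    + make_box(b"emsg", b"first")
    + make_box(b"moof", b"")
    + make_box(b"mdat", b"media")
    + make_box(b"emsg", b"buried")
)
print(leading_emsg_payloads(segment))
```

Because the loop stops at the first 'moof', the app reads only a few box headers at the start of the object and never touches the 'mdat' or any later 'moof'/'mdat' pairs. An 'emsg' placed ahead of a later Chunk inside the same downloaded object is deliberately not found by this sketch.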

Until a type 2 interface for events is supported by most deployed browsers, a service can't rely on an 'emsg' box being parsed in the UA and presented to its UX app when it is prepended to any 'moof', regardless of whether that 'moof' is the start of the downloaded Addressable Media Object that the app will inspect for 'emsg'.

One interpretation of the comment is to make end to end usage clearer in "MPEG" (i.e. CMAF, ISOBMFF, and DASH specifications, application guidelines, etc.).

Another interpretation of the comment is to change the design, i.e. to require full parsing of every segment by either a service provider's app or the embedded player to find and handle 'emsg' boxes in any location. Unfortunately, changing the spec wouldn't change anything in the real world, so it wouldn't be functional or interoperable in the foreseeable future.

johnsim commented 5 years ago

Agreement reached in the Oct 10, 2018 conference call that 1) the WAVE messaging white paper should be published with content spec amendment 1 in November, 2) if it cannot be, the reference to it should be removed from the content spec, and 3) if it is, the note section can be shortened/removed.

johnsim commented 5 years ago

Agreed in the October 24, 2018 conference call: remove the reference to the WAVE white paper, and leave the note section as it is. There will eventually be language in the device specification which could be referenced in the content spec, perhaps for our spring content spec release.