MPEGGroup / FileFormat

MPEG file format discussions

ISOBMFF Introductory and guidance material #7

Closed dwsinger closed 3 years ago

dwsinger commented 4 years ago

This is a mess. We have section 5, and the beginning of section 6, section 7, and Annexes A and C:

5 DESIGN CONSIDERATIONS
5.1 USAGE
5.1.1 Introduction
5.1.2 Interchange
5.1.3 Content Creation
5.1.4 Preparation for streaming
5.1.5 Local presentation
5.1.6 Streamed presentation
5.2 DESIGN PRINCIPLES
6 ISO BASE MEDIA FILE ORGANIZATION
6.1 PRESENTATION STRUCTURE
6.1.1 File Structure
6.1.2 Object Structure
6.1.3 Meta Data and Media Data

section 7:
7 STREAMING SUPPORT
7.1 HANDLING OF STREAMING PROTOCOLS
7.2 PROTOCOL ‘HINT’ TRACKS
7.3 HINT TRACK FORMAT

Annex A:
ANNEX A (INFORMATIVE) OVERVIEW AND INTRODUCTION
A.1 SECTION OVERVIEW
A.2 CORE CONCEPTS
A.3 PHYSICAL STRUCTURE OF THE MEDIA
A.4 TEMPORAL STRUCTURE OF THE MEDIA
A.5 INTERLEAVE
A.6 COMPOSITION
A.7 RANDOM ACCESS
A.8 FRAGMENTED MOVIE FILES

And Annex C:
ANNEX C (INFORMATIVE) GUIDELINES ON DERIVING FROM THIS SPECIFICATION
C.1 INTRODUCTION
C.2 GENERAL PRINCIPLES
C.2.1 General
C.2.2 Base layer operations
C.3 BOXES
C.4 BRAND IDENTIFIERS
C.4.1 Introduction
C.4.2 Usage of the Brand
C.4.3 Introduction of a new brand
C.4.4 Player Guideline
C.4.5 Authoring Guideline
C.4.6 Example
C.5 STORAGE OF NEW MEDIA TYPES
C.6 USE OF TEMPLATE FIELDS
C.7 TRACKS
C.7.1 Data Location
C.7.2 Time
C.7.3 Media Types
C.7.4 Coding Types
C.7.5 Sub-sample information
C.7.6 Sample Dependency
C.7.7 Sample Groups
C.7.8 Track-level
C.7.9 Protection
C.8 CONSTRUCTION OF FRAGMENTED MOVIES
C.9 META-DATA
C.10 REGISTRATION
C.11 GUIDELINES ON THE USE OF SAMPLE GROUPS, TIMED METADATA TRACKS, AND SAMPLE AUXILIARY INFORMATION

But we nowhere describe:

a) How to adapt the file to contain something new: how to use samples, sample entries, sample groups, sample aux info, and so on; track types, track references, and the entire 'adaptation layer' the format offers

b) How to transport the objects the file format defines: multiplexed movies, segments (initialization, media), meta-boxes (can they be delivered separately?)

c) The basic timeline model: media timeline, track timeline, presentation timeline

d) The composition model: how visual tracks are layered and composed, and the implicit mixing of audio tracks

dwsinger commented 4 years ago

suggestions:

2.7.1.2 Concept Introduction
The file format has the concept that there are two delivery vehicles – whole files, and segments – and two ‘carriage’ vehicles – whole files, and fragments – but these concepts are not introduced in a coherent way before they are encountered. Though a section on ‘transport adaptation’ might be helpful, we should introduce these top-level concepts in chapter 6.

2.7.1.3 Chapter Overview
Similarly, we should probably have an overview of the document structure very early on – certainly no later than chapter 6, and preferably earlier. “Chapter N describes…” and so on.

2.7.1.4 Adapting the file format
The narrative introductions are good, but we badly need sections on the ‘adaptation layers’ that the file format offers. Having viewed the draft of the Opus-in-MP4 specification, where they reproduce much of the file format (needlessly), I think it would be helpful to have sections that describe, and give templates for:

a) how to define how a new coded stream is stored in the file format (new codec): sample entry code and contents, sample format, sync samples, sample dependency, pre-roll, sample groups, track references, timestamps, and so on;

b) how to adapt the file format to new transport environments: where is the boundary between ‘a transportable blob’ and the insides of that blob? File and segment brands, segment indexes, and so on;

c) how to define the storage of a new untimed (‘metadata’) item in the file format.

Chapter 11 introduces extensibility, but fails to cover this in a coherent way, and introduces the worst idea (defining new boxes) first. Annex C covers this more fully; these two sections should be merged and brought up to date.
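As a rough illustration of point (a), everything a derived specification defines ultimately lands inside ISOBMFF boxes, each of which starts with a 32-bit big-endian size and a four-character type (with size == 1 signalling a 64-bit largesize, and size == 0 meaning "extends to end of file"). The sketch below parses that common header; the sample-entry and configuration-box codes ('opqs', 'opqC') are purely hypothetical, not codes defined by any specification.

```python
import struct

def read_box_header(buf: bytes, offset: int = 0):
    """Parse a basic ISOBMFF box header: 32-bit big-endian size + 4CC type.

    size == 1 means a 64-bit largesize follows the type;
    size == 0 means the box extends to the end of the file.
    Returns (size, boxtype, header_length).
    """
    size, = struct.unpack_from(">I", buf, offset)
    boxtype = buf[offset + 4:offset + 8].decode("ascii")
    header_len = 8
    if size == 1:
        size, = struct.unpack_from(">Q", buf, offset + 8)
        header_len = 16
    return size, boxtype, header_len

# A hypothetical decoder-configuration box 'opqC' (illustrative 4CC only),
# of the kind a derived spec would place inside its sample entry in 'stsd'.
payload = b"example!"
box = struct.pack(">I4s", 8 + len(payload), b"opqC") + payload
size, boxtype, hlen = read_box_header(box)
```

A derived specification's job, in these terms, is mostly to pick the 4CCs and define the payloads; the box framing itself comes for free from the base format.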

dwsinger commented 4 years ago

Note: there is a new section 6.4 on defining terms which uses many of the ideas here; it may be that we need to define terms (in the terms and definitions sub-clause) or make some aspects of 6.4 normative.

The standard does not have clear definitions of timelines. Some text is defined here and there (including in the informative section on RTP). The current text contains the following timelines:
• media timeline
• movie timeline
• decoding timeline (same as media timeline)
• composition timeline (same as media timeline)
• output timeline (same as media timeline)
• presentation timeline (same as movie timeline)
• track time-line (should be media timeline)

This is very confusing for readers. We suggest:

a) defining the term "timeline" as: "a monotonic linear representation of times with respect to an origin point";

b) defining "timestamp" as: "coded value representing the time elapsed since the origin of an associated timeline, expressed using an arbitrary unit of time.
NOTE 1: By definition, the origin of a timeline has a timestamp of 0.
NOTE 2: A timestamp can be expressed in seconds by dividing its value by the associated timescale value.";

c) simplifying the current text by using only the two terms "media timeline" and "presentation timeline", with proper definitions:
"media timeline: timeline, associated with a track, whose origin is the decoding time of the first sample in the track.
NOTE 1: Decoding and composition times are times on the media timeline.
NOTE 2: Timestamps associated with the media timeline are coded based on the timescale given by the MediaHeaderBox."
"presentation timeline: timeline, associated with the entire presentation, whose origin is the beginning of rendering (possibly while no media data is being rendered).
NOTE 1: The edit list maps the media timeline of a track to the presentation timeline.
NOTE 2: Timestamps associated with the presentation timeline are coded based on the timescale given in the MovieHeaderBox."

d) using "timeline" consistently. The standard uses "timeline" most of the time, but some instances of "time-line" remain; only one spelling should be used, to facilitate searching the text.

e) defining "decoding time" and "composition time":
"decoding time: latest time on the media timeline at which the coded sample should be provided to the decoder."
"composition time: (earliest) time on the media timeline at which the decoded sample will be output from the decoder, if the coded sample is fed to the decoder at the sample's decoding time."
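The arithmetic behind these proposed definitions is simple enough to sketch: decoding times on the media timeline accumulate sample durations, and a timestamp converts to seconds by division by the timescale. The timescale and duration values below are illustrative, not taken from any particular file.

```python
from fractions import Fraction

def timestamp_to_seconds(timestamp: int, timescale: int) -> Fraction:
    """Express a timestamp in seconds, per the proposed definition:
    divide its value by the associated timescale."""
    return Fraction(timestamp, timescale)

# Media timeline: the origin (timestamp 0) is the decoding time of the
# first sample; later decoding times accumulate sample durations.
media_timescale = 90000             # e.g. the MediaHeaderBox timescale
sample_durations = [3000, 3000, 3000]

decode_times = []
t = 0
for d in sample_durations:
    decode_times.append(t)
    t += d

seconds = [timestamp_to_seconds(dt, media_timescale) for dt in decode_times]
```

With a 90000-unit timescale and 3000-unit sample durations, successive samples decode 1/30 s apart, which is the kind of consistency check clear timeline definitions make possible.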

This clearly relates to #3

dwsinger commented 4 years ago

for abstraction layers, we have this proposed text:

This specification defines an abstraction layer on which derived specifications are ideally built, leaving to this specification how these abstractions are built. These abstractions fall into two main groups: timed and untimed media.

The support for timed media views such media as a succession of timed samples, associated with setup information in a "sample entry". Derived specifications should define:
• the four-character code of the "sample entry" for the coding system;
• what constitutes a sample for the coding system (e.g. "a sample is an encoded metadata-frame as defined in XXXX");
• what decoder initialization is required (e.g. "the decoder configuration information is a metadata-setup structure, contained in a FullBox of type 'medc', in the sample entry");
• what constitutes a 'sync sample' for the coding system (e.g. "a sync sample is a metadata-frame with the IndependentProperty flag set as defined in XXXX"); the definition must conform to the general definition (i.e. a place where decoding and playback can start);
• the type of Stream Access Points that are supported by the coding system.

Notice that these definitions do not need to discuss where the sample data is (in the same file, or another) or whether movie fragments are in use or not.

Other tools for timed media include:
• Sample groups. Here, there is some property of a given type that is shared by a set of samples in a track – a group definition. Multiple groups of the same type can be defined, and each sample in a track is mapped either to a definition, or to nothing. For example, if each sample were a chocolate, some of which contain varying amounts of nuts, we could define a 'nuts' sample group with a parameter NutPercentage; a file might have such group definitions with percentages 5%, 25% and 45%. Those chocolates containing nuts are associated with the matching group definition. Editor's note: we may rephrase the nut part with something more media-related.
• Sample auxiliary information. If the information for each sample is unique, sample groups do not work, as they rely on sharing a definition. Sample auxiliary information provides a unique piece of data for each sample. The usual example is initialization vectors for decryption.
• Related tracks. Tracks can be linked by typed, directional track references, or they can be grouped into TrackGroups.
• Sub-sample information. If a given coding format naturally and usefully has a way to split a sample into sub-samples (e.g. base data, olfactory data, and gustatory data), then the sub-sample structure can be documented to ease finding only the desired sub-samples.
• Independent and disposable sample tagging.

Similarly, for untimed data, the 'meta' box provides a number of tools. Again, the definition of such items should include what constitutes an item body, what initialization data is needed, and what item property it is in. Items can be linked, by reference, to other items, just like tracks; and indeed, it is possible to unify the item_ID and track_ID number spaces so that references can be made between tracks and items, either way.

The abstraction layer offered to systems that transport files or the media in them consists basically of movie files (entire presentations) and segments. A movie file may contain movie fragments, or may be complete without any fragments. Similarly, a segment may be an initialization segment, containing the initial part of a movie up to and including a movie box that references no samples, or a media segment, containing one or more movie fragment boxes and their associated samples.

Editor's Note: We could add more guidance on metadata tracks vs. sample aux info, and on independent and disposable aspects.
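The sample-group abstraction described above (shared definitions, with each sample mapped to one definition or to none) can be sketched directly from the 'nuts' example in the comment; the grouping type and NutPercentage parameter are the comment's own illustrative inventions, not defined by the specification.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class GroupDescription:
    """One shared group definition, like an entry in a
    sample group description box (entries are indexed 1-based)."""
    grouping_type: str
    nut_percentage: int   # the property shared by all samples in this group

# Three definitions of the same grouping type, per the example:
definitions: List[GroupDescription] = [
    GroupDescription("nuts", 5),
    GroupDescription("nuts", 25),
    GroupDescription("nuts", 45),
]

# Sample-to-group mapping: each sample carries a 1-based definition index,
# with 0 meaning "this sample belongs to no group of this type".
sample_to_group = [1, 0, 3, 3, 2]

def group_for_sample(i: int) -> Optional[GroupDescription]:
    """Resolve sample i to its group definition, or None if ungrouped."""
    idx = sample_to_group[i]
    return definitions[idx - 1] if idx else None
```

The key design point the text makes is visible here: the per-sample cost is just one small index, so a property shared by many samples is stored once, whereas truly per-sample data needs sample auxiliary information instead.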

dwsinger commented 3 years ago

addressed in the upcoming 7th edition