We should have a wiki page containing recommended file structures for common use-cases

leo-barnes commented 3 years ago

It's not always clear from the spec exactly (or even in general) how AVIF files should be structured. One solution that has been proposed is to have a wiki page with recommendations and examples. Some things it should probably contain:

At the top level, the file should be laid out in the following order: ftyp, meta, moov (if an image sequence), mdat. This allows for efficient parsing and decoding in most use-cases.
In the mdat, the data for thumbnail(s) should precede the data for the main image(s) so that it's faster for a decoder to display something quickly on screen.
If the image has an alpha plane, the data for it should precede the data for the main image in the mdat. If both the alpha plane and the main image are grids, the alpha and main image tiles should be interleaved.
If the main use-case is for the image to be displayed in a browser, the order of data in the mdat should be thumbnail, main image, metadata. If the main use-case is as a camera capture format, the order of the data in the mdat should be thumbnail, metadata, main image.
- In a browser, the most common use-case is to display pixels on the screen as quickly as possible.
- In an OS, many operations query an image for metadata to decide how to handle it before it is decoded, which makes metadata more important to have before the main image data.
If using a grid, the grid item data should be stored in the idat so that the full layout of the file is contained within the meta.
If using a grid, the tiles it consists of should be marked as hidden.
If the color info can be described by a colr box with type nclx, that is preferable to using an ICC profile.
If the thumbnail is located within the first 128K bytes (I think that's the value), this can be indicated with the progressive brand from MIAF.
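
A minimal sketch of how the top-level ordering recommendation above could be checked. It relies only on the generic ISOBMFF box header (32-bit big-endian size, four-character type, optional 64-bit largesize). The names `make_box`, `top_level_boxes`, `follows_recommended_order` and `RECOMMENDED_ORDER` are made up for illustration and do not come from any AVIF library.

```python
import struct

# Order recommended in this thread; boxes of other types are ignored by the check.
RECOMMENDED_ORDER = ["ftyp", "meta", "moov", "mdat"]


def make_box(kind: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit size header (helper for experiments only)."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


def top_level_boxes(data: bytes):
    """Yield (type, offset, size) for each top-level box in the byte string."""
    offset = 0
    while offset + 8 <= len(data):
        size, kind = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit largesize follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the file.
            size = len(data) - offset
        if size < header:
            raise ValueError(f"invalid box size at offset {offset}")
        yield kind.decode("latin-1"), offset, size
        offset += size


def follows_recommended_order(data: bytes) -> bool:
    """True if the ftyp/meta/moov/mdat boxes that are present appear in that order."""
    ranks = [
        RECOMMENDED_ORDER.index(kind)
        for kind, _, _ in top_level_boxes(data)
        if kind in RECOMMENDED_ORDER
    ]
    return ranks == sorted(ranks)
```

Note that this check accepts several consecutive mdat boxes but rejects a layout such as meta, mdat, moov, mdat, so it would need adjusting if that layout turns out to be acceptable.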

Common use-cases that could have an example file/file-structure description:

Main image, thumbnail, Exif
Main image, alpha, thumbnail, Exif
Grid, thumbnail, Exif
Grid, alpha, thumbnail, Exif
Multi-layer main image, thumbnail, Exif
Multi-layer main image, alpha, thumbnail, Exif
Multi-layer grid, thumbnail, Exif
Multi-layer grid, alpha, thumbnail, Exif

All of these should probably have an irot and colr to make it clear how that fits into the structure.
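
As an illustration of what one such file-structure description might look like, here is a rough sketch of the "Main image, thumbnail, Exif" case for a browser-oriented file. The item IDs, the notes, and the `LAYOUT`/`render` names are assumptions made for this example. The box nesting follows my understanding of HEIF/MIAF and is not a normative layout.

```python
# Each node is (box type, note, children). Item IDs 1-3 are arbitrary.
LAYOUT = [
    ("ftyp", "major brand avif", []),
    ("meta", "", [
        ("hdlr", "handler type pict", []),
        ("pitm", "primary item = 1", []),
        ("iloc", "locations of items 1-3 inside mdat", []),
        ("iinf", "", [
            ("infe", "item 1, av01, main image", []),
            ("infe", "item 2, av01, thumbnail", []),
            ("infe", "item 3, Exif", []),
        ]),
        ("iref", "", [
            ("thmb", "from item 2 to item 1", []),
            ("cdsc", "from item 3 to item 1", []),
        ]),
        ("iprp", "", [
            ("ipco", "ispe, pixi, av1C, colr, irot", []),
            ("ipma", "associates properties with items 1 and 2", []),
        ]),
    ]),
    ("mdat", "thumbnail data, main image data, Exif", []),
]


def render(nodes, depth=0):
    """Return the layout as indented text lines, one box per line."""
    lines = []
    for name, note, children in nodes:
        suffix = f"  # {note}" if note else ""
        lines.append("  " * depth + name + suffix)
        lines.extend(render(children, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(LAYOUT)))
```

For a camera-capture file, the note on the mdat line would instead read thumbnail, Exif, main image, following the ordering suggested earlier.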

baumanj commented 3 years ago

> At the top level, the file should be laid out in the following order: ftyp, meta, moov (if an image sequence), mdat. This allows for efficient parsing and decoding in most use-cases.

My reading of BMFF (ISO/IEC 14496-12:2020) § 4.3.1 is that ftyp being first is required (emphasis mine):

> This box shall be placed as early as possible in the file (e.g. after any obligatory signature, but before any significant variable-size boxes such as a MovieBox, MediaDataBox, or FreeSpaceBox).

I would also suggest moov, mdat for avis major brand, but mdat, moov for avif major brand. It would be a potential boon to efficiency to not have to read the image sequence frames if they're not going to be displayed.

Or perhaps always mdat, moov if the mdat contains a thumbnail or poster frame that may be rendered first, or in the case the UA has animations disabled.
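
To make the brand-dependent suggestion concrete, here is a small sketch that reads the major brand from an ftyp box and returns the moov/mdat order proposed in this comment. It assumes the ftyp box is first in the buffer and uses a 32-bit size. `parse_ftyp` and `suggested_tail_order` are hypothetical names, and the mapping encodes a proposal from this discussion, not a rule from any spec.

```python
import struct


def parse_ftyp(data: bytes):
    """Return (major_brand, compatible_brands) from an ftyp box at offset 0."""
    size, kind = struct.unpack_from(">I4s", data, 0)
    if kind != b"ftyp":
        raise ValueError("first box is not ftyp")
    if size == 0:
        # Size 0 means the box runs to the end of the buffer.
        size = len(data)
    payload = data[8:size]
    major = payload[0:4].decode("ascii")
    # Bytes 4-7 hold minor_version; compatible brands follow, 4 bytes each.
    brands = [
        payload[i:i + 4].decode("ascii")
        for i in range(8, len(payload) - 3, 4)
    ]
    return major, brands


def suggested_tail_order(major_brand: str):
    """Order of moov and mdat proposed in this thread for each major brand."""
    if major_brand == "avis":
        return ["moov", "mdat"]  # the sequence is the primary content
    if major_brand == "avif":
        return ["mdat", "moov"]  # the still image can be shown without reading moov
    return None  # no suggestion for other brands
```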

It's probably also worth mentioning that it's mandatory to include a hdlr box within the meta; I've seen at least one AVIF writer which omitted it (but added it in a later version).

I'd also include something about the mandatory properties ispe (per HEIF (ISO/IEC 23008-12:2017) § 6.5.3.1), av1C (per AVIF § 2.2.1) and pixi (per MIAF (ISO/IEC 23000-22:2019) § 7.3.6.6), though as you know, the pixi requirement is potentially in the process of being relaxed.
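
A coarse sketch of how a writer's output could be checked for these boxes. It looks for hdlr directly inside meta and for ispe, av1C and pixi inside iprp/ipco. It only handles 32-bit box sizes and does not verify the ipma associations, so it is a sanity check at best. The helper names (`box`, `iter_boxes`, `find_missing`) are assumptions made for illustration.

```python
import struct


def box(kind: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit size header (helper for experiments only)."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


def iter_boxes(buf: bytes, start: int, end: int):
    """Yield (type, payload_start, box_end) for boxes found in buf[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, kind = struct.unpack_from(">I4s", buf, pos)
        if size < 8 or pos + size > end:
            break  # largesize or malformed boxes are not handled in this sketch
        yield kind.decode("latin-1"), pos + 8, pos + size
        pos += size


def find_missing(meta_box: bytes):
    """Return the names of expected boxes that are absent from a meta box."""
    # meta is a FullBox: 8-byte header, then 4 bytes of version and flags.
    children = {k: (s, e) for k, s, e in iter_boxes(meta_box, 12, len(meta_box))}
    missing = [] if "hdlr" in children else ["hdlr"]
    props = set()
    if "iprp" in children:
        for kind, s, e in iter_boxes(meta_box, *children["iprp"]):
            if kind == "ipco":
                props = {k for k, _, _ in iter_boxes(meta_box, s, e)}
    missing += [p for p in ("ispe", "av1C", "pixi") if p not in props]
    return missing
```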

> If the image has an alpha plane, the data for it should precede the data for the main image in the mdat.

This doesn't really matter, at least to my implementation, but I don't have any objection to this guideline if it helps others.

> If the main use-case is for the image to be displayed in a browser, the order of data in the mdat should be thumbnail, main image, metadata

What sort of metadata are you referring to here? Exif? Presumably not the meta box.

> If the color info can be described by a colr box with type nclx, that is preferable to using an ICC profile

Along the same lines, it's probably worth noting that the colr box will be unnecessary for most images, since (1/13/6), i.e. color primaries/transfer characteristics/matrix coefficients, is the default interpretation. That matches JPEG, and there's really no such thing as an "untagged" AVIF. One potential pitfall is that there's no "unspecified" default for the color range: it is always specified in the AV1 bitstream, so writers need to be confident that value is correct or override it in an nclx-type color box.

Additionally, in the rare case of an ICC profile being necessary, authors need to be confident the matrix coefficients used to map YCbCr → RGB are (6) BT.601, or else specify them in the AV1 bitstream or with an additional nclx-type color box.
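
For reference, a short sketch of serializing an nclx-type colr box, based on the ColourInformationBox layout in ISO/IEC 14496-12 as I understand it (three 16-bit code points, then a 1-bit full-range flag and 7 reserved bits). The function name `make_nclx_colr` is made up. `full_range` is deliberately a required argument, echoing the point that the range has no safe default.

```python
import struct


def make_nclx_colr(full_range: bool, primaries: int = 1,
                   transfer: int = 13, matrix: int = 6) -> bytes:
    """Serialize a colr box of type nclx.

    The defaults mirror the (1/13/6) interpretation discussed above.
    """
    # The full_range_flag is the top bit of the last byte; the other 7 bits are reserved.
    range_byte = 0x80 if full_range else 0x00
    payload = b"nclx" + struct.pack(">HHHB", primaries, transfer, matrix, range_byte)
    return struct.pack(">I4s", 8 + len(payload), b"colr") + payload
```

The resulting box is 19 bytes long: 8 bytes of header, 4 for the colour type, 6 for the three code points and 1 for the range flag.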

> All of these should probably have an irot and colr to make it clear how that fits into the structure.

Potentially irot and imir, since:

- Order matters (per MIAF (ISO/IEC 23000-22:2019) § 7.3.6.7)
- The imir interpretation is currently the opposite of the published spec (as of HEIF (ISO/IEC 23008-12:2017))
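
A toy illustration of why the order of these two transforms matters. It uses plain nested lists as a stand-in for pixels and does not model the actual irot/imir semantics (in particular, which axis imir refers to). `rotate_ccw` and `mirror_lr` are hypothetical helpers for this example.

```python
def rotate_ccw(img):
    """Rotate a 2D list by 90 degrees anti-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def mirror_lr(img):
    """Mirror a 2D list left to right."""
    return [row[::-1] for row in img]


if __name__ == "__main__":
    img = [[1, 2],
           [3, 4]]
    # The two orders give different results, so a reader and a writer
    # must agree on which transform is applied first.
    print(mirror_lr(rotate_ccw(img)))
    print(rotate_ccw(mirror_lr(img)))
```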

cconcolato commented 3 years ago

@leo-barnes you raise interesting points but with all these 'should', this looks like the spec needs to be updated rather than having a wiki. Or maybe we use the wiki as a way to iterate over the next changes to the spec.

Just adding some more points:

Regarding the order inside the mdat (thumbnails, alpha, main, ...), MIAF has some constraints when some brands are used, e.g. progressive brand.
About mdat boxes: nothing forbids having multiple mdat boxes, e.g. meta, mdat, moov, mdat. This could be clarified.
What about moof boxes (i.e. fragmented MP4), which are common these days for videos? Should they be used for image sequences? Should they be avoided?

AOMediaCodec / av1-avif

We should have a wiki page containing recommended file structures for common use-cases #165