AOMediaCodec / av1-avif

AV1 Image File Format Specification - ISO-BMFF/HEIF derivative
https://aomediacodec.github.io/av1-avif/
BSD 2-Clause "Simplified" License

We should have a wiki page containing recommended file structures for common use-cases #165

Open leo-barnes opened 3 years ago

leo-barnes commented 3 years ago

It's not always clear from the spec exactly (or even in general) how AVIF files should be structured. One solution that has been proposed is to have a wiki page with recommendations and examples. Some things it should probably contain:

Common use-cases that could have an example file/file-structure description:

All of these should probably have an irot and colr to make it clear how that fits into the structure.

baumanj commented 3 years ago

> At the top level, the file should be laid out in the following order: ftyp, meta, moov (if an image sequence), mdat. This allows for efficient parsing and decoding in most use-cases.

My reading of BMFF (ISO/IEC 14496-12:2020) § 4.3.1 is that ftyp being first is required (emphasis mine):

> This box shall be placed as early as possible in the file (e.g. after any obligatory signature, but before any significant variable-size boxes such as a MovieBox, MediaDataBox, or FreeSpaceBox).
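
A minimal Python sketch of checking the proposed top-level layout. It walks the top-level box headers (32-bit size and four-character type; a size of 1 means a 64-bit largesize follows, a size of 0 means the box runs to the end of the file) and flags a file whose first box isn't ftyp or whose meta comes after mdat. The function names are hypothetical, and the sketch only inspects top-level boxes.

```python
import struct


def top_level_boxes(data: bytes) -> list[str]:
    """Return the four-character types of the top-level boxes in an ISO-BMFF buffer."""
    types = []
    offset = 0
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError(f"truncated box header at offset {offset}")
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header_len = 8
        if size == 1:
            # A 64-bit largesize follows the type field.
            if offset + 16 > len(data):
                raise ValueError(f"truncated largesize at offset {offset}")
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header_len = 16
        elif size == 0:
            # The box extends to the end of the file.
            size = len(data) - offset
        if size < header_len or offset + size > len(data):
            raise ValueError(f"invalid box size {size} at offset {offset}")
        types.append(box_type.decode("latin-1"))
        offset += size
    return types


def check_layout(types: list[str]) -> list[str]:
    """Report deviations from the proposed ftyp, meta, [moov], mdat layout."""
    problems = []
    if not types or types[0] != "ftyp":
        problems.append("ftyp is not the first box")
    if "meta" in types and "mdat" in types and types.index("meta") > types.index("mdat"):
        problems.append("meta appears after mdat")
    return problems
```

Feeding `check_layout(top_level_boxes(data))` the bytes of a file gives an empty list when the layout matches the proposal.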

I would also suggest moov, mdat for the avis major brand, but mdat, moov for the avif major brand. Not having to read the image sequence frames when they aren't going to be displayed could be a boon to efficiency.

Or perhaps always use mdat, moov if the mdat contains a thumbnail or poster frame that may be rendered first, or if the UA has animations disabled.
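
The ordering suggestions above could be encoded roughly as follows. This is a hypothetical helper that reflects only the suggestions in this comment (moov before mdat for an avis major brand, mdat before moov for an avif major brand or when mdat holds a thumbnail/poster frame), not anything the spec currently requires.

```python
def recommended_top_level_order(major_brand: str, has_sequence: bool,
                                poster_in_mdat: bool = False) -> list[str]:
    """Hypothetical helper encoding the top-level ordering suggested in this thread."""
    if not has_sequence:
        # Still image only: there is no moov box at all.
        return ["ftyp", "meta", "mdat"]
    if major_brand == "avis" and not poster_in_mdat:
        # The sequence is the primary content: index it before the frame data.
        return ["ftyp", "meta", "moov", "mdat"]
    # avif major brand, or a thumbnail/poster frame in mdat that may be rendered
    # first: a reader that only shows the still image can potentially stop
    # reading before it reaches the sequence data and moov.
    return ["ftyp", "meta", "mdat", "moov"]
```

Whether the UA has animations disabled is a reader-side decision, so it isn't a parameter here; the mdat-first layout simply makes that case cheaper.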

It's probably also worth mentioning that it's mandatory to include a hdlr box within the meta; I've seen at least one AVIF writer which omitted it (but added it in a later version).

I'd also include something about the mandatory properties ispe (per HEIF (ISO/IEC 23008-12:2017) § 6.5.3.1), av1C (per AVIF § 2.2.1) and pixi (per MIAF (ISO/IEC 23000-22:2019) § 7.3.6.6), though as you know, that last requirement is potentially in the process of being relaxed.
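
A sketch of a checker for the mandatory hdlr box and the properties listed above. `MetaSummary` is a hypothetical, already-parsed view of the meta box (its child box types plus the property types associated with each item), not a type from any real library. Four-character codes are case-sensitive, so a writer emitting `av1c` instead of `av1C` would be flagged.

```python
from dataclasses import dataclass, field

# Properties listed above as mandatory for every AV1 image item:
# ispe (HEIF), av1C (AVIF), pixi (MIAF; possibly being relaxed).
REQUIRED_ITEM_PROPERTIES = ("ispe", "av1C", "pixi")


@dataclass
class MetaSummary:
    """Hypothetical, already-parsed view of a meta box."""
    child_boxes: list[str]
    item_properties: dict[int, list[str]] = field(default_factory=dict)


def missing_mandatory(meta: MetaSummary) -> list[str]:
    """List the mandatory boxes and item properties that are absent."""
    problems = []
    if "hdlr" not in meta.child_boxes:
        problems.append("meta: missing hdlr")
    for item_id, props in sorted(meta.item_properties.items()):
        for prop in REQUIRED_ITEM_PROPERTIES:
            if prop not in props:
                problems.append(f"item {item_id}: missing {prop}")
    return problems
```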

> If the image has an alpha plane, the data for it should precede the data for the main image in the mdat.

This doesn't really matter, at least to my implementation, but I don't have any objection to this guideline if it helps others.

> If the main use-case is for the image to be displayed in a browser, the order of data in the mdat should be thumbnail, main image, metadata.

What sort of metadata are you referring to here? Exif? Presumably not the meta box.
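
One way a writer could apply the two proposed mdat ordering guidelines together. The role names and the priority table are assumptions: the guidelines don't say how a thumbnail and an alpha plane should be ordered relative to each other, so this sketch arbitrarily puts the thumbnail first and the alpha plane directly before the main image. The returned offsets are relative to the start of the mdat payload; an iloc writer would still need to add the payload's absolute file offset.

```python
# Lower value = written earlier in mdat (hypothetical roles and priorities).
MDAT_PRIORITY = {"thumbnail": 0, "alpha": 1, "primary": 2, "metadata": 3}


def layout_mdat(items: list[tuple[str, bytes]]) -> tuple[bytes, list[tuple[str, int, int]]]:
    """Order item data for mdat; return (payload, [(role, offset, length), ...])."""
    # sorted() is stable, so items with the same role keep their input order;
    # unknown roles go last.
    ordered = sorted(items, key=lambda item: MDAT_PRIORITY.get(item[0], len(MDAT_PRIORITY)))
    payload = bytearray()
    extents = []
    for role, data in ordered:
        extents.append((role, len(payload), len(data)))
        payload += data
    return bytes(payload), extents
```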

> If the color info can be described by a colr box with type nclx, that is preferable to using an ICC profile.

Along the same lines, it's probably worth noting that the colr box will be unnecessary for most images, since (1/13/6) (BT.709 primaries, sRGB transfer characteristics, BT.601 matrix coefficients) is the default interpretation, which matches JPEG, and there's really no such thing as an "untagged" AVIF. One potential pitfall is that there's no "unspecified" default for the color range, which is always specified in the AV1 bitstream, so writers need to be confident that value is correct or override it in an nclx-type color box.
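
For the cases where an nclx-type colr box is needed, here is a sketch of serializing one and of the decision described above. The payload layout (colour_type `nclx`, then 16-bit colour_primaries, transfer_characteristics and matrix_coefficients, then a 1-bit full_range_flag followed by 7 reserved bits) is the ColourInformationBox from ISO/IEC 14496-12. `needs_nclx` is a hypothetical helper that assumes the defaults described in this comment: (1/13/6) when no colr box is present, and the color range taken from the AV1 bitstream.

```python
import struct

# Default (colour_primaries, transfer_characteristics, matrix_coefficients)
# assumed above for a file with no colr box. There is no default color range.
DEFAULT_CICP = (1, 13, 6)


def nclx_colr_box(primaries: int, transfer: int, matrix: int, full_range: bool) -> bytes:
    """Serialize a ColourInformationBox ('colr') with colour_type 'nclx'."""
    payload = b"nclx" + struct.pack(
        ">HHHB",
        primaries,
        transfer,
        matrix,
        0x80 if full_range else 0x00,  # full_range_flag in the top bit, 7 reserved zero bits
    )
    return struct.pack(">I4s", 8 + len(payload), b"colr") + payload


def needs_nclx(cicp: tuple[int, int, int], full_range: bool, bitstream_full_range: bool) -> bool:
    """Hypothetical: is an nclx box needed, under the defaults assumed above?"""
    return tuple(cicp) != DEFAULT_CICP or full_range != bitstream_full_range
```

The serialized box is always 19 bytes: an 8-byte box header, the 4-byte colour_type and a 7-byte payload.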

Additionally, in the rare case of an ICC profile being necessary, authors need to be confident the matrix coefficients used to map YCbCr → RGB are (6) BT.601, or else specify them in the AV1 bitstream or with an additional nclx-type color box.
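
To illustrate why the matrix coefficients matter in the ICC case, here is a small full-range YCbCr to RGB conversion using the standard Kr/Kb constants for matrix_coefficients 6 (BT.601) and 1 (BT.709). Neutral grey decodes identically under both, but saturated colors do not, so a decoder that assumes BT.601 for content encoded with a different matrix will shift colors. The function name and the normalized [0, 1] inputs are assumptions made for brevity.

```python
# (Kr, Kb) constants for two matrix_coefficients code points.
KR_KB = {
    1: (0.2126, 0.0722),  # BT.709
    6: (0.299, 0.114),    # BT.601
}


def _clamp(v: float) -> float:
    return min(1.0, max(0.0, v))


def ycbcr_to_rgb(y: float, cb: float, cr: float, matrix_coefficients: int) -> tuple[float, float, float]:
    """Full-range conversion; y, cb and cr are normalized to [0, 1], chroma centred on 0.5."""
    kr, kb = KR_KB[matrix_coefficients]
    kg = 1.0 - kr - kb
    r = y + 2.0 * (1.0 - kr) * (cr - 0.5)
    b = y + 2.0 * (1.0 - kb) * (cb - 0.5)
    # Invert Y = Kr*R + Kg*G + Kb*B for G.
    g = (y - kr * r - kb * b) / kg
    return tuple(round(_clamp(c), 4) for c in (r, g, b))
```

Decoding the sample (Y, Cb, Cr) = (0.5, 0.4, 0.7) with matrix 6 and with matrix 1 gives noticeably different red components (about 0.78 versus 0.81).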

> All of these should probably have an irot and colr to make it clear how that fits into the structure.

Potentially irot and imir, since:

  1. Order matters (per MIAF (ISO/IEC 23000-22:2019) § 7.3.6.7).
  2. The imir interpretation is currently the opposite of the published spec (as of HEIF (ISO/IEC 23008-12:2017)).
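
A sketch of both points: a check that transformative properties are associated in the order clap, irot, imir (our reading of the MIAF § 7.3.6.7 constraint), and a tiny demonstration that rotating then mirroring gives a different result from mirroring then rotating. irot rotates anti-clockwise in 90-degree steps; `flip_left_right` is a hypothetical stand-in for one imir axis, deliberately not tied to an axis value given the interpretation problem in point 2.

```python
# Order in which transformative properties are assumed to have to be associated.
TRANSFORM_ORDER = ("clap", "irot", "imir")


def transforms_in_order(associated: list[str]) -> bool:
    """True if the transformative properties in an item's association list follow TRANSFORM_ORDER."""
    ranks = [TRANSFORM_ORDER.index(p) for p in associated if p in TRANSFORM_ORDER]
    return ranks == sorted(ranks)


def rotate_ccw_90(img: list[list[int]]) -> list[list[int]]:
    """Rotate a row-major image 90 degrees anti-clockwise (irot with angle 1)."""
    return [list(col) for col in zip(*img)][::-1]


def flip_left_right(img: list[list[int]]) -> list[list[int]]:
    """Hypothetical stand-in for one imir axis: swap the left and right edges."""
    return [row[::-1] for row in img]
```

For the 2×3 image `[[1, 2, 3], [4, 5, 6]]`, rotating then flipping yields `[[6, 3], [5, 2], [4, 1]]`, while flipping then rotating yields `[[1, 4], [2, 5], [3, 6]]`.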

cconcolato commented 3 years ago

@leo-barnes, you raise interesting points, but with all these 'should's, it looks like the spec needs to be updated rather than having a wiki. Or maybe we could use the wiki as a way to iterate on the next changes to the spec.

Just adding some more points: