Open leo-barnes opened 3 years ago
> At the top level, the file should be laid out in the following order: `ftyp`, `meta`, `moov` (if an image sequence), `mdat`. This allows for efficient parsing and decoding in most use-cases.

My reading of BMFF (ISO/IEC 14496-12:2020) § 4.3.1 is that `ftyp` being first is required (emphasis mine):
> This box shall be placed as early as possible in the file (e.g. after any obligatory signature, but before any significant variable-size boxes such as a `MovieBox`, `MediaDataBox`, or `FreeSpaceBox`).

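As a rough illustration of that placement rule, here is a minimal Python sketch that walks the top-level boxes of a buffer and reports whether `ftyp` comes first. The function names (`top_level_boxes`, `ftyp_is_first`, `make_box`) are hypothetical helpers, not part of any AVIF library, and the check is stricter than § 4.3.1, which would also allow an obligatory signature before `ftyp`.

```python
import struct


def top_level_boxes(data: bytes):
    """Yield (fourcc, offset, size) for each top-level box in an ISOBMFF buffer."""
    offset = 0
    while offset + 8 <= len(data):
        size, fourcc = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" field follows the box type.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the file.
            size = len(data) - offset
        if size < header:
            raise ValueError(f"invalid box size {size} at offset {offset}")
        yield fourcc.decode("latin-1"), offset, size
        offset += size


def ftyp_is_first(data: bytes) -> bool:
    """True if the very first top-level box is ftyp (stricter than the spec text)."""
    first = next(top_level_boxes(data), None)
    return first is not None and first[0] == "ftyp"


def make_box(fourcc: bytes, payload: bytes = b"") -> bytes:
    """Serialize a box with a 32-bit size field; only used to build sample data."""
    return struct.pack(">I4s", 8 + len(payload), fourcc) + payload
```

For example, `ftyp_is_first(open(path, "rb").read())` could be used as a quick sanity check on a writer's output, where `path` is whatever file is being inspected.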
I would also suggest `moov`, `mdat` for the `avis` major brand, but `mdat`, `moov` for the `avif` major brand. It would be a potential boon to efficiency to not have to read the image sequence frames if they're not going to be displayed.
Or perhaps always `mdat`, `moov` if the `mdat` contains a thumbnail or poster frame that may be rendered first, or in the case the UA has animations disabled.
It's probably also worth mentioning that it's mandatory to include a `hdlr` box within the `meta`; I've seen at least one AVIF writer which omitted it (but added it in a later version).
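A reader-side check for this could look roughly like the sketch below. It assumes the caller has already extracted the content of the `meta` box (everything after its 8-byte box header), and it additionally tests for the `pict` handler type that HEIF uses for image items, which goes slightly beyond the point made above. `box`, `child_boxes` and `meta_has_pict_handler` are hypothetical names for illustration.

```python
import struct


def box(fourcc: bytes, payload: bytes = b"") -> bytes:
    """Serialize a box with a 32-bit size field; only used to build sample data."""
    return struct.pack(">I4s", 8 + len(payload), fourcc) + payload


def child_boxes(payload: bytes):
    """Yield (fourcc, body) for each child box in a container payload."""
    offset = 0
    while offset + 8 <= len(payload):
        size, fourcc = struct.unpack_from(">I4s", payload, offset)
        if size < 8:
            # 64-bit and "to end of file" sizes are not handled in this sketch.
            break
        yield fourcc, payload[offset + 8 : offset + size]
        offset += size


def meta_has_pict_handler(meta_payload: bytes) -> bool:
    """meta_payload is the content of the meta box after its 8-byte header."""
    # meta is a FullBox, so skip the 4 bytes of version and flags.
    for fourcc, body in child_boxes(meta_payload[4:]):
        if fourcc == b"hdlr":
            # hdlr body: version/flags (4), pre_defined (4), handler_type (4), ...
            return body[8:12] == b"pict"
    return False
```

A writer test suite could run this against every generated file so that a missing `hdlr` is caught before release.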
I'd also include something about the mandatory properties `ispe` (per HEIF (ISO/IEC 23008-12:2017) § 6.5.3.1), `av1C` (per AVIF § 2.2.1) and `pixi` (per MIAF (ISO/IEC 23000-22:2019) § 7.3.6.6), though as you know, that last one is potentially in the process of being relaxed.
> If the image has an alpha plane, the data for it should precede the data for the main image in the `mdat`.

This doesn't really matter, at least to my implementation, but I don't have any objection to this guideline if it helps others.
> If the main use-case is for the image to be displayed in a browser, the order of data in the `mdat` should be thumbnail, main image, metadata

What sort of metadata are you referring to here? Exif? Presumably not the `meta` box.
> If the color info can be described by a `colr` box with type `nclx`, that is preferable to using an ICC profile

Along the same lines, it's probably worth noting that the `colr` box will be unnecessary for most images, since (1/13/6) is the default interpretation, which matches JPEG, and there's really no such thing as an "untagged" AVIF. One potential pitfall is that there's no "unspecified" default for the color range: it is always specified in the AV1 bitstream, so writers need to be confident that value is correct or override it in an `nclx`-type `colr` box.
Additionally, in the rare case of an ICC profile being necessary, authors need to be confident the matrix coefficients used to map YCbCr → RGB are (6) BT.601, or else specify them in the AV1 bitstream or with an additional `nclx`-type `colr` box.
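For writers that do want to state these values explicitly, the following sketch serializes an `nclx`-type `colr` box as I understand the `ColourInformationBox` layout in ISO/IEC 14496-12: three 16-bit code points followed by a byte whose top bit is the full-range flag. The function name `make_nclx_colr` is hypothetical, the defaults simply mirror the (1/13/6) triple mentioned above, and the range is a required argument because there is no sensible default for it.

```python
import struct


def make_nclx_colr(full_range: bool, primaries: int = 1,
                   transfer: int = 13, matrix: int = 6) -> bytes:
    """Serialize a colr box of type nclx.

    Payload layout: colour_type ('nclx'), colour_primaries (u16),
    transfer_characteristics (u16), matrix_coefficients (u16),
    then full_range_flag (1 bit) followed by 7 reserved zero bits.
    """
    range_byte = 0x80 if full_range else 0x00
    payload = b"nclx" + struct.pack(">HHHB", primaries, transfer, matrix, range_byte)
    return struct.pack(">I4s", 8 + len(payload), b"colr") + payload
```

Passing `matrix=6` alongside an ICC profile would be one way to make the YCbCr → RGB mapping explicit instead of relying on whatever the bitstream happens to say.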
> All of these should probably have an `irot` and `colr` to make it clear how that fits into the structure.

Potentially `irot` and `imir`, since the `imir` interpretation is currently the opposite of the published spec (as of HEIF (ISO 23008-12:2017)).

@leo-barnes you raise interesting points, but with all these 'should's, this looks like the spec needs to be updated rather than having a wiki. Or maybe we use the wiki as a way to iterate over the next changes to the spec.
Just adding some more points:
- […] `mdat` (thumbnails, alpha, main, ...), MIAF has some constraints when some brands are used, e.g. progressive brand.
- […] `mdat` boxes, nothing forbids you to have multiple `mdat` boxes, e.g. `meta`, `mdat`, `moov`, `mdat`. This could be clarified.
- […] `moof` boxes (i.e. fragmented mp4) which are common these days for videos. Should they be used for image sequences? Should they be avoided?
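To make the last two points concrete, a small sketch like the one below could summarize how a given file answers them. It assumes the list of top-level box types has already been extracted by some parser; `summarize_layout` is a hypothetical name, not an existing API.

```python
from collections import Counter


def summarize_layout(box_types):
    """Report mdat multiplicity and fragmentation for a list of top-level box types."""
    counts = Counter(box_types)
    return {
        # More than one mdat is allowed by the base format, as noted above.
        "mdat_count": counts["mdat"],
        "mdat_positions": [i for i, t in enumerate(box_types) if t == "mdat"],
        # Any top-level moof box indicates a fragmented file.
        "fragmented": counts["moof"] > 0,
    }
```

Running this over a corpus of real-world files would show how common multiple `mdat` boxes and fragmented sequences actually are before a recommendation is written down.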
It's not always clear from the spec exactly (or even in general) how AVIF files should be structured. One solution that has been proposed is to have a wiki page with recommendations and examples. Some things it should probably contain:

- At the top level, the file should be laid out in the following order: `ftyp`, `meta`, `moov` (if an image sequence), `mdat`. This allows for efficient parsing and decoding in most use-cases.
- […] `mdat`, the data for thumbnail(s) should precede the data for the main image(s) so that it's faster for a decoder to display something quickly on screen.
- If the image has an alpha plane, the data for it should precede the data for the main image in the `mdat`. If both the alpha plane and the main image are a `grid`, the alpha and main image tiles should be interleaved.
- If the main use-case is for the image to be displayed in a browser, the order of data in the `mdat` should be thumbnail, main image, metadata. If the main use-case is as a camera capture format, the order of the data in the `mdat` should be thumbnail, metadata, main image.
- […] `grid`, the `grid` item data should be stored in the `idat` so that the full layout of the file is contained within the `meta`.
- […] `grid`, the tiles it consists of should be marked as hidden.
- If the color info can be described by a `colr` box with type `nclx`, that is preferable to using an ICC profile.

Common use-cases that could have an example file/file-structure description:
All of these should probably have an `irot` and `colr` to make it clear how that fits into the structure.
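As one possible way to turn the top-level ordering recommendation into something testable, the sketch below checks a list of top-level box types against the proposed `ftyp`, `meta`, `moov`, `mdat` order. It is only an illustration of the guideline as worded in this issue: `follows_recommended_order` is a hypothetical name, and boxes outside the four listed types are simply ignored.

```python
RECOMMENDED = ["ftyp", "meta", "moov", "mdat"]


def follows_recommended_order(box_types):
    """True if ftyp, meta, moov and mdat appear in the recommended relative order."""
    # Map each relevant box to its rank; other box types are ignored.
    ranks = [RECOMMENDED.index(t) for t in box_types if t in RECOMMENDED]
    # ftyp must be the first relevant box, and ranks must never decrease.
    return bool(ranks) and ranks[0] == 0 and ranks == sorted(ranks)
```

Note that a layout such as `meta`, `mdat`, `moov`, `mdat` fails this check, so the function would need adjusting if the guideline ends up permitting it.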