MPEGGroup / FileFormat

MPEG file format discussions

HEIF: Signaling for subsampling #81

Closed leo-barnes closed 3 months ago

leo-barnes commented 1 year ago

There currently is no item property box to signal subsampling. Parsers that want to understand the subsampling of an encoded payload are therefore required to parse the codec config or in some cases the payload itself. We have ispe and pixi and colr, but nothing for subsampling.

Things to take into account:

  1. In order to avoid the issues we've had with the colr box and when/if it should override the coded payload, I think the text for an item property like this should say that it shall match the subsampling specified in the coded payload.
  2. If the property is missing, no default should be assumed and a parser that wants to know will have to parse the coded payload.
  3. If the codec has a fixed subsampling and no internal signaling for it (i.e. it's hardcoded), it is permissible to have a subsampling item property at the container level to document this, despite the container level then potentially not being seen as matching the coded payload.
  4. We likely need to be able to specify subsampling per component in the same way as baseline JPEG does in order to be future-proof. In other words component 0 could have subsampling 2x1, component 1 could have subsampling 1x1 and component 2 subsampling 2x2 (where components 0 and 2 are subsampled compared to component 1). At least one component needs to have subsampling 1x1.
  5. If we want to allow for all baseline JPEG subsamplings, 2x2 is not enough. We also need 4x2.
  6. We should probably also allow for signaling "chroma" location for all subsampled components. This could be done as an offset to allow for >2 subsampling. Assuming we have a channel subsampling factor of S in a specific direction, this offset could take values [0,S], where 0 means co-sited with "left" "luma" value and S means co-sited with "right" "luma" value.
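A minimal sketch of how points 4 to 6 could fit together in a parser, not a proposed box syntax. The names `ComponentSubsampling`, `validate` and `plane_size` are hypothetical, and rounding plane dimensions up is an assumption rather than something stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentSubsampling:
    """Hypothetical per-component record; not a box defined by any specification."""
    horizontal: int = 1  # subsampling factor S in x, 1 means not subsampled
    vertical: int = 1    # subsampling factor S in y
    offset_x: int = 0    # sample location offset in x, range [0, S] (point 6)
    offset_y: int = 0    # sample location offset in y, range [0, S]

    def __post_init__(self):
        for factor, offset in ((self.horizontal, self.offset_x),
                               (self.vertical, self.offset_y)):
            if factor < 1:
                raise ValueError("subsampling factor must be at least 1")
            if not 0 <= offset <= factor:
                raise ValueError("offset must lie in [0, S]")


def validate(components):
    """Point 4: at least one component needs to have subsampling 1x1."""
    if not any(c.horizontal == 1 and c.vertical == 1 for c in components):
        raise ValueError("at least one component must be 1x1")


def plane_size(width, height, component):
    """Plane dimensions for one component, rounding up (an assumption)."""
    return (-(-width // component.horizontal),
            -(-height // component.vertical))


# The example from point 4: components 0 and 2 are subsampled relative to 1.
components = [
    ComponentSubsampling(2, 1),
    ComponentSubsampling(1, 1),
    ComponentSubsampling(2, 2),
]
validate(components)
sizes = [plane_size(5, 3, c) for c in components]
```

Keeping a factor and an offset per component and per direction means a 4x2 case needs no special handling, at the cost of allowing combinations that no existing codec produces.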

farindk commented 1 year ago

I'd theoretically advocate for such a property, but does it really make things easier in practice? If the property is optional (as proposed in 2.), all codecs have to include code to parse the configuration or coded payload of all supported compression formats anyways. Thus, decoding might be a tiny bit faster if the property is there, but the logic gets more complex and ambiguities might arise. Furthermore, we might limit what is possible with a future compression format. (For example, if a future codec finds a better subsampling pattern, we also have to change the specification of the subsampling property).

I think the information is already there in hvcC or av1C and it is good to have this specific for each codec. It is trivial to extract the information from there and convert it into some canonical form. If information is missing, e.g. the chroma pixel position in hvcC, this could be added by extending hvcC in a new box version.
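As a rough sketch of what "extract and convert into some canonical form" could look like: the two hypothetical helpers below read the chroma format from the payload of an hvcC or av1C box (the bytes after the box header) and return chroma subsampling factors as `(horizontal, vertical)`, or `None` for monochrome. The byte and bit positions reflect my reading of the HEVC and AV1 configuration records and should be checked against the specifications before use.

```python
def chroma_subsampling_from_hvcc(record: bytes):
    """Return (sx, sy) chroma subsampling factors from an hvcC payload.

    Assumes chroma_format_idc sits in the low 2 bits of byte 16 of the
    HEVCDecoderConfigurationRecord.
    """
    if len(record) < 17:
        raise ValueError("hvcC payload too short")
    chroma_format_idc = record[16] & 0x03
    # 0: monochrome, 1: 4:2:0, 2: 4:2:2, 3: 4:4:4
    return {0: None, 1: (2, 2), 2: (2, 1), 3: (1, 1)}[chroma_format_idc]


def chroma_subsampling_from_av1c(record: bytes):
    """Return (sx, sy) chroma subsampling factors from an av1C payload.

    Assumes byte 2 holds, from the most significant bit: seq_tier_0,
    high_bitdepth, twelve_bit, monochrome, chroma_subsampling_x,
    chroma_subsampling_y, chroma_sample_position (2 bits).
    """
    if len(record) < 3:
        raise ValueError("av1C payload too short")
    flags = record[2]
    if flags & 0x10:  # monochrome: no chroma planes
        return None
    sub_x = (flags >> 3) & 1
    sub_y = (flags >> 2) & 1
    return (1 + sub_x, 1 + sub_y)


# Synthetic payloads for illustration only; not taken from real files.
hvcc_420 = bytes(16) + bytes([0xFC | 1]) + bytes(6)
av1c_420 = bytes([0x81, 0x00, 0b00001100, 0x00])
```

Both helpers return the same tuple shape, so the caller does not need to know which codec the item uses once the value has been extracted.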

cconcolato commented 1 year ago

Noting that ISO/IEC 23001-17 (Uncompressed Video and Images) defines a cloc box: [image]

leo-barnes commented 1 year ago

> I'd theoretically advocate for such a property, but does it really make things easier in practice? If the property is optional (as proposed in 2.), all codecs have to include code to parse the configuration or coded payload of all supported compression formats anyways. Thus, decoding might be a tiny bit faster if the property is there, but the logic gets more complex and ambiguities might arise. Furthermore, we might limit what is possible with a future compression format. (For example, if a future codec finds a better subsampling pattern, we also have to change the specification of the subsampling property).
>
> I think the information is already there in hvcC or av1C and it is good to have this specific for each codec. It is trivial to extract the information from there and convert it into some canonical form. If information is missing, e.g. the chroma pixel position in hvcC, this could be added by extending hvcC in a new box version.

These are all good points for existing codecs. For future codecs, though, I would very much like to see a move away from specifying codec-agnostic things in the codec config (or even coded payload) when possible. It could then be mandated that the new box is required for those new codecs.

AVIF does this, for example, by not having CICP in the codec config, since that can be specified with codec-agnostic boxes at the container level. On the other hand, the AV1 codec config is basically fully redundant since the full information is stored inside the coded payload as well, which is not ideal.

My main goal here is that for future codecs I would like to minimize the number of cases where a parser that does not want to decode a file needs to actually be able to parse the codec config/payload.

leo-barnes commented 3 months ago

This will be handled by the proposed update to the pixi property in the latest amendment draft. Closing.