Closed y-guyon closed 1 year ago
I feel like this doesn't require any changes in the library, but just a feature in `avifenc` itself. Grid cells are supposed to be the same size, so I think the library is behaving properly, and I think they expect you to use a crop rect (`clap`) to dial in the correct size.
This `avifenc` feature would simply round up the image's size to the next grid cell multiple, and then `clap` the resultant encoding back to the original dimensions. I don't believe this is one of the listed Options.
Yannis,
Thank you for the analysis of the spec. I agree there is a contradiction. I think the intention is "All input images shall have exactly the same width and height." Given this assumption, we can fix the contradiction by removing "(potentially excluding the right-most column)" and "(potentially excluding the bottom-most row)".
This assumption implies Option 2. We can try Option 2.2 first. For padding values it is common to pad with border pixels.
If I understand it correctly, Option 2.1 and Option 2.2 are not mutually exclusive, so we can still do Option 2.1 in the future.
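As an illustration of "pad with border pixels", here is a minimal sketch that extends a single plane of samples to the right and to the bottom by repeating its edge samples. The function name `pad_with_border` and the list-of-lists representation are assumptions made for this example only; this is not libavif code.

```python
def pad_with_border(plane, padded_width, padded_height):
    """Pad a 2D list of samples on the right and bottom by repeating edge samples.

    Hypothetical helper for illustration; not part of libavif.
    """
    height = len(plane)
    width = len(plane[0])
    if padded_width < width or padded_height < height:
        raise ValueError("padded size must not be smaller than the source")
    # Extend each row to the right by repeating its last sample.
    rows = [row + [row[-1]] * (padded_width - width) for row in plane]
    # Extend downward by repeating the (already widened) last row.
    rows += [list(rows[-1]) for _ in range(padded_height - height)]
    return rows


plane = [[1, 2, 3],
         [4, 5, 6]]
padded = pad_with_border(plane, 5, 4)
for row in padded:
    print(row)
```

Repeating border samples avoids introducing a sharp edge next to the visible area, which is presumably why it is the common choice over padding with a constant value.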
> I feel like this doesn't require any changes in the library, but just a feature in `avifenc` itself.

If users of the libavif API would like to generate incrementally decodable images of any size, it would be convenient for them to have a function doing the dirty work of splitting and padding. If we do this work in `avifenc`, we might as well make it accessible in avif.h "for free".
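The "splitting and padding" work could look roughly like the sketch below, which slices one plane into a row-major grid of equally sized cells and fills the area beyond the image by clamping to the last row and column. The name `split_into_cells` and the choice of cell size (ceiling division of the image size by the grid size) are assumptions for illustration; a real implementation would also have to respect chroma subsampling, which is ignored here.

```python
def split_into_cells(plane, grid_cols, grid_rows):
    """Return (cells, cell_width, cell_height) for a row-major grid of equal-size cells.

    Hypothetical helper for illustration; not part of libavif.
    """
    height, width = len(plane), len(plane[0])
    # Ceiling division: the smallest cell size whose grid covers the whole image.
    cell_w = -(-width // grid_cols)
    cell_h = -(-height // grid_rows)
    cells = []
    for gy in range(grid_rows):
        for gx in range(grid_cols):
            cell = []
            for y in range(gy * cell_h, (gy + 1) * cell_h):
                # Clamp the row index so rows below the image repeat the bottom row.
                src_row = plane[min(y, height - 1)]
                # Clamp the column index so columns past the image repeat the right column.
                cell.append([src_row[min(x, width - 1)]
                             for x in range(gx * cell_w, (gx + 1) * cell_w)])
            cells.append(cell)
    return cells, cell_w, cell_h


# A 5x3 image whose sample value encodes its position as x + 10 * y.
image = [[x + 10 * y for x in range(5)] for y in range(3)]
cells, cell_w, cell_h = split_into_cells(image, 2, 2)
print(cell_w, cell_h)
for cell in cells:
    print(cell)
```

With a 5x3 input and a 2x2 grid, every cell is 3x2, and the right-most and bottom-most cells carry one padded column and one padded row respectively.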
> Grid cells are supposed to be the same size, so I think the library is behaving properly

It matches one interpretation of the specification, yes.
When you store (100+8)x(100+8) AV1 samples, you can only end up with a (100+100)x(100+100) grid item output, where some tiles were deformed by scaling up. The grid item is then cropped to 108x108 (with `output_width/height` or `clap`), so some scaled AV1 samples are discarded.
On the other hand, if you want the correct output for a 108x108 input, it must be padded to 200x200 before encoding, and cropped at decoding to 108x108.
I was mainly pointing out the oddness of being able to store exactly 108x108 AV1 samples in a valid grid AVIF without being able to correctly decode them into a 108x108 image.
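The arithmetic behind the 108x108 example can be written down as a short sketch. `grid_geometry` is a hypothetical helper, and it assumes that all cells share the same dimensions, which is the interpretation discussed above.

```python
def grid_geometry(output_width, output_height, cell_width, cell_height):
    """Compute the grid layout needed to cover an output size with equal-size cells.

    Hypothetical helper for illustration; not part of libavif.
    """
    # Ceiling division gives the number of cells needed in each direction.
    columns = -(-output_width // cell_width)
    rows = -(-output_height // cell_height)
    canvas_width = columns * cell_width
    canvas_height = rows * cell_height
    return {
        "columns": columns,
        "rows": rows,
        "canvas": (canvas_width, canvas_height),
        # Samples trimmed on the right and at the bottom when decoding.
        "trimmed": (canvas_width - output_width, canvas_height - output_height),
    }


geometry = grid_geometry(108, 108, 100, 100)
print(geometry)
```

For a 108x108 output with 100x100 cells this yields a 2x2 grid on a 200x200 canvas, of which 92 columns and 92 rows are trimmed away at decoding.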
> I think they expect you to use a crop rect (`clap`) to dial in the correct size. This `avifenc` feature would simply round up the image's size to the next grid cell multiple, and then `clap` the resultant encoding back to the original dimensions. I don't believe this is one of the listed Options.

Using the `clap` feature is unnecessary. As mentioned in section 6.6.2.3.1, the reconstructed image is formed by [...] trimming on the right and the bottom to the indicated `output_width` and `output_height`. Since `output_width` and `output_height` are encoded with every `grid` box, there is no need for an additional `clap` box which would have the same effect. The `clap` box has the same odd dimensions constraints on subsampled chroma samples as `grid` if I remember correctly.
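The trimming described in that section can be sketched as follows: cells are placed side by side in row-major order on a canvas, and the canvas is then cut on the right and at the bottom to `output_width` by `output_height`. The function `reconstruct_grid` is a hypothetical illustration over plain lists, assuming all cells have identical dimensions; it is not how libavif is implemented.

```python
def reconstruct_grid(cells, grid_cols, grid_rows, output_width, output_height):
    """Assemble row-major, equal-size cells into a canvas, then trim right and bottom.

    Hypothetical helper for illustration; not part of libavif.
    """
    cell_h = len(cells[0])
    cell_w = len(cells[0][0])
    if output_width > grid_cols * cell_w or output_height > grid_rows * cell_h:
        raise ValueError("output size must not exceed the assembled canvas")
    canvas = []
    for gy in range(grid_rows):
        for y in range(cell_h):
            row = []
            for gx in range(grid_cols):
                # Append the y-th row of each cell in this grid row, left to right.
                row.extend(cells[gy * grid_cols + gx][y])
            canvas.append(row)
    # Trimming only ever removes samples on the right and at the bottom.
    return [row[:output_width] for row in canvas[:output_height]]


cells = [
    [[1, 1], [1, 1]], [[2, 2], [2, 2]],
    [[3, 3], [3, 3]], [[4, 4], [4, 4]],
]
image = reconstruct_grid(cells, 2, 2, 3, 3)
for row in image:
    print(row)
```

Because the output size travels with the grid itself, no separate crop property is needed to recover the original dimensions in this sketch.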
> We can try Option 2.2 first. For padding values it is common to pad with border pixels.

The only remaining question is the privacy one. If I remember correctly, the `clap` property is ignored in Chrome for this reason (meaning, always display all pixels). How would it be different for `grid`? Is it related to the use of `should` wording on the former and `shall` on the latter?
> If I understand it correctly, Option 2.1 and Option 2.2 are not mutually exclusive, so we can still do Option 2.1 in the future.

Sure, if you are talking about the second alternative of option 2.1 (add `output_width` and `output_height` to `avifEncoder`).
The cropping done by a grid image is limited to trimming on the right and the bottom. This is why the privacy concern is less serious than with `clap`. But the cropping of a grid image was also brought up in the discussions of the privacy issues of `clap`.
We can try Option 2.2 first. For padding values it is common to pad with border pixels.
Note: This is about AVIF grids/cells (used for incremental decoding), not about AV1 tiles.
Issue
libavif does not provide a way through `avifenc` or its API to encode an image as a grid of multiple cells if the image dimensions are not multiples of the cell dimensions.
For example, `avifenc --grid 4x2` on an image of 4002 by 1998 will fail.
Specification analysis
HEIF (ISO 23008-12:2017) and HEIF (ISO 23008-12:2021) say the same:
My interpretation of the highlighted sentence:
[...] `tile_width`.
[...] `tile_height`.
[...] `output_width` and `output_height`.
This contradicts the following:
Example
An encoded AVIF bitstream with the following AV1 frames and the `grid` properties `output_width` and `output_height` set to 108:
Can only be decoded by libavif if the `ispe` properties are all the same, so for example:
libavif will then:
[...] `ispe` dimensions (hence 100x100), if not already matching that size
[...] `output_width`x`output_height` (here 108x108)
The file itself is valid but the output is not what was intended (the right and bottom parts are stretched and cut). Removing the `ispe` constraint would make that possible.
Option 1: allow different dimensions for right-most and bottom-most cells
The implementation of https://github.com/AOMediaCodec/libavif/pull/1140 matches this option.
Issues:
The libavif encoding and decoding sides are more permissive.
The specification will likely require a clarifying amendment that it is allowed.
The specification will likely require a clarifying amendment about whether right-most and bottom-most cells can be bigger than `tile_width`/`tile_height`, or just smaller.
AVIF files encoded with this modified `avifenc` do not pass the Compliance Warden (640x481 jpg image encoded with `avifenc --yuv 444 --grid 2x2`):
Note: The last two errors look suspicious because the image was correctly decoded with `avifdec`.
However, even with libavif at head cd0bb358f83d01867f0fa53079470043618c9af5, encoding a 640x480 png with `avifenc --grid 2x1 --yuv 444` did not pass the Compliance Warden either (`[miaf][Rule #5] Error: construction_method=-1 on a derived image item (ID=1)`).
Option 2: only trim
If we still want to allow encoding images with dimensions that cannot be a convenient multiple of `tile_width` and `tile_height` in libavif, there are two ways, both based on enforcing all cells to share the same dimensions, and then cropping to `output_width` and `output_height`.
Issues:
Option 2.1: add/modify avifEncoderAddImageGrid() API
This will put the burden of generating cells of the same size on the user. Also, there is currently no way to pass the desired `output_width` and `output_height` to the `avifEncoderAddImageGrid()` function or as a flag to `avifenc`.
Issues:
Alternatives for avoiding a breaking API change:
[...] `avifEncoderAddImageGrid()` with the above behavior.
Note: `avifEncoderAddImageGrid()` is already not that convenient. `avifenc` has some dedicated code to slice an input image into cells, but currently it only does so if the input image has dimensions that are multiples of `tile_width` and `tile_height`. It might be helpful to improve that code and move it into a new function available in avif.h.
Add `output_width` and `output_height` to `avifEncoder`. Could default to 0.
The advantage of this solution is that it could apply to other scenarios than grids only (provides an API to rescale images at decoding).
Option 2.2: keep the same API but fix the issue internally
Basically add a step between `avifEncoderAddImageGrid()` and `avifEncoderAddImageInternal()` to convert the "imperfect" grid into a "perfect" grid.
Issues:
Somewhat related: https://github.com/MPEGGroup/MIAF/issues/11