AOMediaCodec / libavif

libavif - Library for encoding and decoding .avif files

Wish list: Large image encoding (via image grid?) #331

Closed novomesk closed 3 years ago

novomesk commented 3 years ago

I have observed that some people like very large images (even larger than 100 megapixels), but encoding big images to AVIF is a very memory-hungry process. For example, I was able to encode a 10000x10000 image on my computer, but with larger dimensions there is a lot of swap use or the encoding process runs out of memory.

Is an image grid the solution for encoding large images? The smaller regions would not be as memory-consuming as the entire image.

Image grid encoding may or may not be another breaking change. Maybe the encoder API can remain mostly unchanged: the user passes the entire image and libavif creates the grid automatically when the image is above a specified size. Or there could be a new API where the user splits the image and passes the fragments to libavif one after another.

This is not an urgent feature for me right now, but I am writing it down so it can be considered or planned for the future.

joedrago commented 3 years ago

Image grid support for avifEncoder is a very reasonable request, but I'm not likely to personally add it for a while. avifDecoder can already decode grid images.

Out of curiosity, how much does enabling the encoder's tiling mechanism help? That functionality is already exposed and available to try.

novomesk commented 3 years ago

Encoders create tiles automatically because the maximum tile width is 4k and the maximum tile size is 9 megapixels.

Even if you don't request tiles for a 21 MP photo from a digital camera, it will have at least 4 tiles.
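
The "at least 4 tiles" figure can be checked with a little arithmetic. The sketch below is an illustration, not libaom code: it assumes 64x64 superblocks and uniform tile spacing, takes the limits as 4096 pixels of tile width and 4096x2304 pixels of tile area, and follows my reading of how AV1 derives the minimum tile count. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed AV1 tile limits: 4096 pixels wide, 4096*2304 pixels in area. */
#define MAX_TILE_WIDTH 4096u
#define MAX_TILE_AREA (4096u * 2304u)
#define SB_SIZE_LOG2 6u /* assumption: 64x64 superblocks */

/* Smallest k such that (blk_size << k) >= target. */
static uint32_t tile_log2(uint32_t blk_size, uint32_t target)
{
    uint32_t k = 0;
    while ((blk_size << k) < target) {
        ++k;
    }
    return k;
}

/* Nominal lower bound on the tile count when tiles are uniformly spaced. */
static uint32_t min_tile_count(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    const uint32_t sb = 1u << SB_SIZE_LOG2;
    const uint32_t sb_cols = (width + sb - 1) / sb;  /* image width in superblocks */
    const uint32_t sb_rows = (height + sb - 1) / sb; /* image height in superblocks */
    const uint32_t max_tile_width_sb = MAX_TILE_WIDTH >> SB_SIZE_LOG2;
    const uint32_t max_tile_area_sb = MAX_TILE_AREA >> (2 * SB_SIZE_LOG2);

    const uint32_t log2_for_width = tile_log2(max_tile_width_sb, sb_cols);
    const uint32_t log2_for_area = tile_log2(max_tile_area_sb, sb_rows * sb_cols);
    const uint32_t min_log2_tiles = log2_for_width > log2_for_area ? log2_for_width : log2_for_area;
    return 1u << min_log2_tiles;
}
```

For an assumed 21 MP frame of 5616x3744, min_tile_count() returns 4, which matches the observation above; a 1920x1080 image needs only 1.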

joedrago commented 3 years ago

Sure, I'm saying if you turned up the tiling, do you see an improvement in encoding time?

novomesk commented 3 years ago

I tried a few months ago. There was no difference when I encoded a 21 MP picture using 4 threads, because there were always 4 tiles. I saw a speedup only on smaller images, something like 10-20%, but small images are generally fast anyway.

Uch1haOb1to commented 3 years ago

If I understand the work of tiles correctly, then they at least do not reduce memory consumption. Earlier, before libaom was optimized, I also ran out of memory and saw paging, but now significantly less memory is consumed. Still, some pictures can be large enough that there is not enough memory. You could add a parameter that specifies the amount of memory available for encoding and, when there is not enough memory for normal encoding, divide the picture. The encoder could also set this parameter by default, depending on the available memory or on the total amount of memory.

0xC0000054 commented 3 years ago

If I understand the work of tiles correctly, then they at least do not reduce memory consumption.

Tiles can refer to both the encoder's tiling mechanism and the individual images that make up an image grid. The encoder's tiling mechanism does not reduce memory usage, but creating an image grid can significantly reduce it.

I implemented image grid support in my Paint.NET AVIF plugin, https://github.com/0xC0000054/pdn-avif. When saving an image it will either reuse the existing image grid tile size, or attempt to guess the best tile size based on the compression speed. Because AOM version 2.0.0 does not allow an encode to be aborted, using an image grid also improves the UI responsiveness when changing settings, since smaller images encode faster.

chinakook commented 3 years ago

Encoding large images with a low memory footprint is very important, given the rapid growth in the pixel counts of smartphone cameras.

chinakook commented 3 years ago

@0xC0000054 I have tried your Paint.NET plugin. It's great! It's very memory efficient for big images. Thanks.

wantehchang commented 3 years ago

0xC0000054 wrote:

Tiles can refer to both the encoder's tiling mechanism and the individual images that make up an image grid. The encoder's tiling mechanism does not reduce memory usage, but creating an image grid can significantly reduce it.

We recently made two changes that reduced the memory usage of libavif and libaom.

The first change is to set the libaom encoder's g_lag_in_frames configuration setting to 1 when encoding single images. This is done in libavif, because it turns out to be difficult to have libaom do it automatically: https://github.com/AOMediaCodec/libavif/commit/3fcc555000fffc3172db4c19c412eea7fb1d46a3

The second change is to not allocate the buffers for temporal filtering in libaom if g_lag_in_frames <= 1: https://aomedia-review.googlesource.com/c/aom/+/114382

If you check out the current master branch of libaom, you will see reduced memory usage. Unfortunately it is rather difficult to backport the second change to libaom's 2.0.x branch, because the current master branch has become very different from the 2.0.x branch in the relevant code. I have actually prepared a backporting patch and am testing it now, but I am worried that I may miss some changes that the memory usage reduction patch implicitly depends on.

novomesk commented 3 years ago

When we encode the image grid, what are the limitations for the dimensions of the grid's cells? Or what are optimal/recommended dimensions?

wantehchang commented 3 years ago

I heard that photos taken on iPhones are HEIF image grids. If you have an iPhone, you can inspect its HEIF photos and find the dimensions of the grid's cells.

I would guess the grid cell dimensions should be at least 512x512.

This issue has been fixed by a series of commits from aec9cffd8890e7ae16d3e1a853ba21939b5cd1f5 to 2b7f04e95b56fc91b4428221d6e457e33ba2e05d.

joedrago commented 3 years ago

According to MIAF, the grid cells must be at least 64 pixels in each dimension, and all cells must be the same size. I can't remember off the top of my head whether there are other restrictions, but I think there is also something about the dimensions being even.
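
As a rough illustration of those constraints, here is a minimal sketch of a layout check. It is not taken from libavif or from the MIAF text: it encodes only the rules as recalled above (minimum of 64 per dimension, identical cells, and an optional even-dimension rule that is left as a flag because it is uncertain), and it simplifies by requiring the image to divide exactly into cells. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when an image splits into cols x rows identical cells that
 * satisfy the constraints recalled in the thread.
 * Simplification (assumption): the image must divide exactly, no cropping. */
static bool grid_layout_ok(uint32_t image_w, uint32_t image_h,
                           uint32_t cols, uint32_t rows, bool require_even)
{
    assert(cols > 0 && rows > 0);
    if (image_w % cols != 0 || image_h % rows != 0) {
        return false; /* cells would not all be the same size */
    }
    const uint32_t cell_w = image_w / cols;
    const uint32_t cell_h = image_h / rows;
    if (cell_w < 64 || cell_h < 64) {
        return false; /* below the recalled 64-pixel minimum */
    }
    if (require_even && (cell_w % 2 != 0 || cell_h % 2 != 0)) {
        return false; /* uncertain rule, so only enforced on request */
    }
    return true;
}
```

For example, a 4032x3024 image split 8x6 passes (504x504 cells), while a 64x6 split fails because the cells would be only 63 pixels wide.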

wantehchang commented 3 years ago

Thanks, Joe. Now I see my earlier comment is confusing:

I would guess the grid cell dimensions should be at least 512x512.

What I meant is that to get good compression efficiency, grid cell dimensions should not be too small; I guess they should be at least 512x512. I did not mean that 512x512 is the minimum grid cell size required by the specs.

chinakook commented 3 years ago

@wantehchang The new version with --grid used a lot of memory too, so I think it's not solved. The old Paint.NET version had an extremely low memory footprint compared to the --grid version.

wantehchang commented 3 years ago

chinakook: Thank you for testing grid image encoding in libavif. (I closed this issue because I thought this issue was a feature request for grid image encoding.) If you could post the command lines you used for avifenc and Paint.NET, Joe will be able to reproduce and take a look.

Nicholas Hayes (0xC0000054): Since you are familiar with libavif, I'd appreciate it if you could take a look and see why libavif's grid image encoding does not reduce memory consumption. Thank you for your help!

0xC0000054 commented 3 years ago

If you could post the command lines you used for avifenc and Paint.NET, Joe will be able to reproduce and take a look.

My Paint.NET plugin does not use libavif; the AVIF container handling is my own C# code.

After examining the avifenc code it looks like the main difference between it and the code I use is how single images are handled. My code attempts to automatically guess the best image grid size for a single image, and libavif requires the user to specify it. But there may be other issues that I did not notice.

I wrote a technical post describing the image grid implementation my Paint.NET plugin uses. Quoting the algorithm summary from that post:

  1. Pick a maximum tile size based on the compression speed preset.
  2. Check if the full image size is greater than the maximum tile size.
  3. If the image width or height is greater than the maximum tile size, pick the closest evenly divisible tile size to the maximum tile size.
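
The three steps above can be sketched as follows. This is only one reading of the summary, not the plugin's actual code (which is C#): "closest evenly divisible tile size" is interpreted here as the largest divisor of the image edge that does not exceed the maximum, applied to width and height independently. The function name and the maximum value used in the note below are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Largest tile edge that divides `edge` evenly and is <= max_tile.
 * Returns `edge` itself when the image already fits in one tile. */
static uint32_t pick_tile_edge(uint32_t edge, uint32_t max_tile)
{
    assert(edge > 0 && max_tile > 0);
    /* n is the number of tiles along this edge; the first n that works
     * gives the largest tile. */
    for (uint32_t n = 1; n <= edge; ++n) {
        if (edge % n == 0 && edge / n <= max_tile) {
            return edge / n;
        }
    }
    return 1; /* not reached: n == edge always yields a tile edge of 1 */
}
```

With an assumed maximum of 512, a 4032x3024 image gives 504 for both edges, that is, an 8x6 grid of 504x504 cells. One caveat of this interpretation: an edge with no suitable divisor (a prime such as 1021) collapses to a tile edge of 1, so real code would need a fallback such as padding or cropping.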

Without seeing chinakook's avifenc command line I do not know what the problem is.

I heard that photos taken on iPhones are HEIF image grids. If you have an iPhone, you can inspect its HEIF photos and find the dimensions of the grid's cells. I would guess the grid cell dimensions should be at least 512x512.

From the images I have seen, iPhones encode a 4032x3024 pixel image as an 8x6 grid of 504x504 pixel cells.

wantehchang commented 3 years ago

Nicholas: Thanks a lot for taking a look!

I prepared a patch for libaom's "applejack" branch that should reduce memory consumption further when g_lag_in_frames <= 1. (I know you set g_lag_in_frames to 1 in your Paint.NET plugin.) The patch can be found in https://crbug.com/aomedia/2872. It would be good if you could test it. Just check out applejack instead of v2.0.1 in your libaom source tree and apply the patch aomedia-2872-applejack.txt to the source tree. Thanks!

0xC0000054 commented 3 years ago

I looked into this issue a little more.

I built avifenc from the latest git version and tried to encode a 4032x3024 pixel image.

When the image was encoded without using an image grid libavif used approximately 2 GB of memory:

avifenc --yuv 422 --min 10 --max 10 --speed 6 image.png image.avif

When encoding as an 8x6 grid of 504x504 pixel cells libavif used approximately 6 GB of memory:

avifenc --yuv 422 --min 10 --max 10 --speed 6 --grid 8x6 image.png image.avif

avifImageSplitGrid allocates new memory for the cell pixel data instead of aliasing it to the buffer used by the main image, but I do not think that can explain more than doubling the memory usage.

My Paint.NET plugin aliases the cell memory buffer to the main image, and I also create a new AOM encoder instance for each cell image. The AOM encoder is located in a separate DLL, so creating a new encoder instance for each image made calling it much simpler.
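
To make "aliasing" concrete: a cell can be described by a pointer into the parent plane plus the parent's row stride, so no pixel data is copied. The sketch below uses a hypothetical PlaneView struct for a single plane; it is not libavif's avifImage and not the plugin's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one image plane. The view does not own `pixels`. */
typedef struct PlaneView
{
    uint8_t * pixels;  /* first byte of the view */
    uint32_t width;    /* in pixels */
    uint32_t height;   /* in pixels */
    uint32_t rowBytes; /* stride between rows, inherited from the parent */
} PlaneView;

/* Returns a view of grid cell (col, row) that shares the parent's memory. */
static PlaneView cell_view(const PlaneView * parent, uint32_t cell_w, uint32_t cell_h,
                           uint32_t col, uint32_t row, uint32_t bytes_per_pixel)
{
    assert(parent != NULL);
    assert((col + 1) * cell_w <= parent->width);
    assert((row + 1) * cell_h <= parent->height);

    PlaneView view;
    view.pixels = parent->pixels
                + (size_t)row * cell_h * parent->rowBytes   /* skip rows above the cell */
                + (size_t)col * cell_w * bytes_per_pixel;   /* skip pixels left of the cell */
    view.width = cell_w;
    view.height = cell_h;
    view.rowBytes = parent->rowBytes; /* rows are still spaced by the parent's stride */
    return view;
}
```

Because the view keeps the parent's stride, an encoder that honours rowBytes can read the cell in place; the trade-off is that the parent buffer must outlive every view.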

wantehchang commented 3 years ago

Nicholas: Thank you very much for looking into this!

I guess it may have to do with the lifetime of the encoder instances. In your Paint.NET plugin, the encoder instances are ScopedAOMEncode local variables in the DoOnePass() method. The encoder instances are destroyed when DoOnePass() returns.

In libavif, the encoder instances are destroyed in the avifEncoderDataDestroy() function:

static void avifEncoderDataDestroy(avifEncoderData * data)
{
    for (uint32_t i = 0; i < data->items.count; ++i) {
        avifEncoderItem * item = &data->items.item[i];
        if (item->codec) {
            avifCodecDestroy(item->codec);
        }
        avifCodecEncodeOutputDestroy(item->encodeOutput);
        avifRWDataFree(&item->metadataPayload);
        avifArrayDestroy(&item->mdatFixups);
    }
    avifImageDestroy(data->imageMetadata);
    avifArrayDestroy(&data->items);
    avifArrayDestroy(&data->frames);
    avifFree(data);
}

In your example of --grid 8x6, I suspect all 48 encoder instances are kept alive until the end and destroyed together.

0xC0000054 commented 3 years ago

I guess it may have to do with the lifetime of the encoder instances.

That makes sense.

Looking at the code, it appears that each item has its own codec instance, so it may be possible to free the codec instance after each image grid cell is encoded. However, I do not know whether the memory savings are worth the additional complexity that would add to the image grid encoding code.

joedrago commented 3 years ago

Hmm, we could maybe add some plumbing around AVIF_ADD_IMAGE_FLAG_SINGLE where we pre-Finish and destroy the codec after each tile and keep track of the fact that we did it. That could save some memory.
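
The lifetime difference being discussed can be modelled with a toy example. The sketch below uses a mock codec (MockCodec and the functions around it are hypothetical, not libavif or libaom API) and only counts how many instances are alive at once; it says nothing about how much memory a real encoder holds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for an encoder instance that holds some working memory. */
typedef struct MockCodec
{
    uint8_t * scratch;
} MockCodec;

static uint32_t g_live = 0; /* instances currently alive */
static uint32_t g_peak = 0; /* highest value g_live has reached */

static MockCodec * mock_codec_create(void)
{
    MockCodec * codec = malloc(sizeof(*codec));
    if (!codec) {
        return NULL;
    }
    codec->scratch = malloc(1024);
    if (!codec->scratch) {
        free(codec);
        return NULL;
    }
    if (++g_live > g_peak) {
        g_peak = g_live;
    }
    return codec;
}

static void mock_codec_destroy(MockCodec * codec)
{
    if (!codec) {
        return;
    }
    free(codec->scratch);
    free(codec);
    --g_live;
}

/* Encodes cell_count cells and returns the peak number of live codecs. */
static uint32_t encode_grid(uint32_t cell_count, int destroy_after_each_cell)
{
    assert(cell_count > 0);
    g_live = 0;
    g_peak = 0;
    MockCodec ** codecs = calloc(cell_count, sizeof(*codecs));
    assert(codecs != NULL);

    for (uint32_t i = 0; i < cell_count; ++i) {
        codecs[i] = mock_codec_create();
        assert(codecs[i] != NULL);
        /* ... encode cell i here and copy its output out of the codec ... */
        if (destroy_after_each_cell) {
            mock_codec_destroy(codecs[i]);
            codecs[i] = NULL;
        }
    }
    /* Deferred cleanup: a no-op for entries that were already destroyed. */
    for (uint32_t i = 0; i < cell_count; ++i) {
        mock_codec_destroy(codecs[i]);
    }
    free(codecs);
    return g_peak;
}
```

In this model, an 8x6 grid with deferred cleanup peaks at 48 live instances, while destroying each codec right after its cell peaks at 1. The extra bookkeeping is the NULL-ing of finished entries, which corresponds to the "keep track of the fact that we did it" part of the idea above.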

wantehchang commented 3 years ago

0xC0000054 wrote:

avifImageSplitGrid allocates new memory for the cell pixel data instead of aliasing it to the buffer used by the main image, but I do not think that can explain more than doubling the memory usage.

I wrote https://github.com/AOMediaCodec/libavif/pull/454 to fix this issue.

joedrago commented 3 years ago

libaom should now clean up ASAP as it encodes each cell's color and alpha payloads serially, so hopefully this works better now. Let me know.