fdintino / pillow-avif-plugin

A pillow plugin that adds avif support via libavif
BSD 2-Clause "Simplified" License

SVT-AV1 codec is using too much memory to process images #39

Open RaphaelVRossi opened 7 months ago

RaphaelVRossi commented 7 months ago

Hey @fdintino,

I ran some tests with pillow-avif-plugin and the SVT-AV1 codec, and afterwards the application did not free the memory it had used. With each processed AVIF image, memory usage increases until it reaches the memory limit.

To replicate the test, please use this gist and follow the steps below:

docker build -t test-avif .

docker run -m 2g --cpus 1 -it -v "$(pwd):/app" test-avif bash

cd /app
wget https://live.staticflickr.com/1379/540719764_cddd076c3b_o_d.jpg  -O bug.jpg

/venv/bin/python3 process.py

You can use docker stats to monitor the container's resource usage.

Could you please help me?! 🆘

Thanks for all your support. 🎉

RaphaelVRossi commented 7 months ago

After some analysis, I managed to find where memory usage increases; this may help locate the problem.

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   123     67.9 MiB     67.9 MiB           1   @profile
   124                                         def _save(im, fp, filename, save_all=False):
   [...]
   233                                                         # Append the frame to the animation encoder
   234    714.6 MiB    633.8 MiB           2                   enc.add(
   235     80.9 MiB     12.9 MiB           1                       frame.tobytes("raw", rawmode),
   236     80.9 MiB      0.0 MiB           1                       frame_dur,
   237     80.9 MiB      0.0 MiB           1                       frame.size[0],
   238     80.9 MiB      0.0 MiB           1                       frame.size[1],
   239     80.9 MiB      0.0 MiB           1                       rawmode,
   240     80.9 MiB      0.0 MiB           1                       is_single_frame,
   241                                                         )   
   [...]

fdintino commented 7 months ago

SVT-AV1 unfortunately uses a lot of memory when encoding an image. And when you're encoding 10 images in parallel, as you are in your example script, it will use 10x that amount. You'll find that the memory gets released after all of the encodes are finished, though, so there isn't a memory leak here.

Using a higher SVT-AV1 preset / encoder mode (speed in libavif) will modestly reduce memory usage. You can use the highest preset by passing the kwarg speed=10 to the save method. However, a bug in libavif caps this value at 8 for the SVT codec, and at 8 the memory reduction relative to the default is even more modest. I've opened https://github.com/AOMediaCodec/libavif/pull/1824 with a fix for that.

RaphaelVRossi commented 7 months ago

Thank you @fdintino !

Using speed=10 did slightly reduce memory usage.

However, no memory was released when enc.finish() was called, as you can see below:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
[...]
   234    587.9 MiB    506.7 MiB           2                   enc.add(
   235     81.2 MiB     10.9 MiB           1                       frame.tobytes("raw", rawmode),
   236     81.2 MiB      0.0 MiB           1                       frame_dur,
   237     81.2 MiB      0.0 MiB           1                       frame.size[0],
   238     81.2 MiB      0.0 MiB           1                       frame.size[1],
   239     81.2 MiB      0.0 MiB           1                       rawmode,
   240     81.2 MiB      0.0 MiB           1                       is_single_frame,
   241                                                         )
[...]
   252                                             # Get the final output from the encoder
   253    641.4 MiB     53.6 MiB           1       data = enc.finish()
   254    641.4 MiB      0.0 MiB           1       if data is None:
   255                                                 raise OSError("cannot write file as AVIF (encoder returned None)")
   256
   257    641.4 MiB      0.0 MiB           1       fp.write(data)
[...]

Is that the correct behaviour?

fdintino commented 7 months ago

Yes, the enc object needs to be garbage collected. That happens when the garbage collector runs after it has gone out of scope. Once outside of that function you could call gc.collect() to force it and observe the memory consumption after that.
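One way to separate "has the encoder object been released?" from "has the process's RSS gone down?" is weakref.finalize, which runs a callback when an object is destroyed. A minimal sketch: StandInEncoder is a hypothetical placeholder, and it is an assumption that the plugin's real encoder type supports weak references (C extension types only do if they opt in). In CPython, an object with no remaining references is normally freed as soon as it goes out of scope; gc.collect() additionally reclaims objects kept alive only by reference cycles.

```python
import gc
import weakref


class StandInEncoder:
    """Hypothetical placeholder for the plugin's encoder object (not its real API)."""

    def __init__(self, size_mib):
        # A large buffer to mimic the memory an encoder holds onto.
        self.buffer = bytearray(size_mib * 1024 * 1024)


def encode_once(events):
    enc = StandInEncoder(size_mib=50)
    # Append a message to events when this encoder object is destroyed.
    weakref.finalize(enc, events.append, "encoder released")
    return b"fake-avif-bytes"


events = []
data = encode_once(events)  # enc goes out of scope when the function returns
gc.collect()  # also reclaims objects kept alive only by reference cycles
print(events)
```

If the callback has fired but RSS stays high, the memory is being held below the Python object level (by the codec or the C allocator) rather than by a lingering Python reference.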

RaphaelVRossi commented 7 months ago

Even after forcing a gc.collect() call, the memory was not released.

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    10     58.2 MiB     58.2 MiB           1   @profile
    11                                         async def process(i):
    12     58.2 MiB      0.0 MiB           1       print("process", i)
    13     58.2 MiB      0.0 MiB           2       await asyncio.sleep(1)
    14    643.8 MiB      0.0 MiB           2       with open("bug.jpg", "rb") as file:
    15    643.8 MiB      1.7 MiB           2           with Image.open(file) as image:
    16    643.8 MiB    584.0 MiB           1               image.save("bug.avif", "AVIF", codec="svt", quality=55, speed=10)
    17
    18    643.8 MiB      0.0 MiB           1       gc.collect()

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    20     56.0 MiB     56.0 MiB           1   @profile
    21                                         async def main():
    22     56.0 MiB      0.0 MiB           1       print("running")
    23    634.4 MiB    578.4 MiB           5       await asyncio.gather(*[process(i) for i in range(1)])
    24    634.4 MiB      0.0 MiB           1       gc.collect()
    25    634.4 MiB      0.0 MiB           1       print("finish")
    26    634.4 MiB      0.0 MiB           1       time.sleep(10)

fdintino commented 7 months ago

When running your script, I've been using psutil to monitor the memory usage of the running process. You can print how many MiB of resident memory (RSS) the process is using at any given line with:

print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
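If installing psutil isn't convenient inside the container, a minimal Linux-only sketch can read the same figure from /proc/self/status (the VmRSS field, reported in kB). rss_mib and report_after_collect are hypothetical helper names for illustration; on Linux the value should roughly agree with psutil's memory_info().rss.

```python
import gc


def rss_mib():
    """Resident set size of the current process in MiB (Linux only)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # e.g. "VmRSS:    123456 kB"
                return int(line.split()[1]) / 1024
    raise RuntimeError("VmRSS not found in /proc/self/status")


def report_after_collect(label):
    """Print RSS before and after a forced garbage collection."""
    before = rss_mib()
    gc.collect()
    after = rss_mib()
    print(f"{label}: {before:.1f} MiB before gc.collect(), {after:.1f} MiB after")
    return before, after


report_after_collect("after encoding")
```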

I don't know much about the profile decorator you're using, but I would try to rule it out as a cause of objects not being released from memory.

RaphaelVRossi commented 7 months ago

I used this library to profile memory: https://pypi.org/project/memory-profiler/

I've now removed the profile decorator and am using psutil to measure memory instead.

python process.py
running
process 0
54.60546875
[log from svt]
631.2578125
finish

I'm currently running Docker on a Mac M1 with platform linux/amd64, using this version of the script:

import asyncio
import gc
import os
import time

import psutil
from PIL import Image

import pillow_avif  # importing this registers the AVIF plugin with Pillow


async def process(i):
    print("process", i)
    # Resident memory (MiB) before encoding
    print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
    await asyncio.sleep(1)
    with open("bug.jpg", "rb") as file:
        with Image.open(file) as image:
            # Blocking AVIF encode with the SVT-AV1 codec
            image.save("bug.avif", "AVIF", codec="svt", quality=55, speed=10)


async def main():
    print("running")
    await asyncio.gather(*[process(i) for i in range(1)])
    # Resident memory after all encodes have finished
    print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
    gc.collect()
    # Resident memory after forcing a garbage collection
    print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
    print("finish")
    time.sleep(10)  # pause before the process exits


# asyncio.run() replaces the deprecated get_event_loop()/run_until_complete() pattern
asyncio.run(main())

RaphaelVRossi commented 7 months ago

When you run this script, is the memory deallocated after processing ends, or only when the script exits?

fdintino commented 7 months ago

Oh right. So, the reason you're seeing that has to do with how resident memory is allocated and freed on macOS. If you were to change the end of your main function there to be:

    print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)
    print("finish")
    time.sleep(20)
    print(psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2)

and then, during the 20 seconds when it's sleeping, you forced a process to wire a large amount of memory (this is 30GB, adjust to whatever would exhaust your available memory):

python -c 'import time; x = bytearray(1024*1024*1000*30); time.sleep(60)'

that should cause the OS to release the resident memory from the process, and so you ought to see a much smaller number printed at the end of main().

RaphaelVRossi commented 7 months ago

After some more tests, I built pillow-avif-plugin against libsvtav1enc1 1.4.1+dfsg-1.

This dramatically decreased the amount of memory used to process an AVIF image, and the process was able to reuse this resident memory while processing images in parallel.

RaphaelVRossi commented 7 months ago

My mistake: I tested with Docker platform linux/arm64, but my production environment is linux/x86_64.

On linux/arm64, libsvtav1enc1 1.4.1+dfsg-1 used memory more efficiently, but on linux/x86_64 memory is still used "without" being deallocated.
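One hypothesis worth testing on linux/x86_64 (not confirmed anywhere in this thread) is that the encoder's memory has been freed but glibc's malloc is keeping it in its arenas rather than returning it to the OS. A minimal sketch that asks glibc to hand free pages back via malloc_trim(0), called through ctypes; try_malloc_trim is a hypothetical helper name. If RSS (measured, for example, with the psutil one-liner earlier in the thread) drops sharply afterwards, the memory was freed but cached by the allocator. If SVT-AV1 manages some buffers itself, or the container uses a non-glibc libc such as musl, this will not help.

```python
import ctypes
import ctypes.util
from typing import Optional


def try_malloc_trim() -> Optional[bool]:
    """Ask glibc to return free heap memory to the OS.

    Returns True if glibc reports memory was released, False if not,
    and None when malloc_trim is unavailable (e.g. not glibc).
    """
    libc_path = ctypes.util.find_library("c")
    if libc_path is None:
        return None
    libc = ctypes.CDLL(libc_path)
    malloc_trim = getattr(libc, "malloc_trim", None)
    if malloc_trim is None:
        return None
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    # pad=0: leave only the minimum free space at the top of the heap.
    return bool(malloc_trim(0))


trimmed = try_malloc_trim()
print("malloc_trim released memory:", trimmed)
```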