PyAV-Org / PyAV

Pythonic bindings for FFmpeg's libraries.
https://pyav.basswood-io.com/
BSD 3-Clause "New" or "Revised" License
2.52k stars 365 forks source link

Memory leak while reading frames from preloaded video buffer #317

Closed igor-kondratiev closed 5 years ago

igor-kondratiev commented 6 years ago

This leak can easily be reproduced with this script

import io

import av

ITERATIONS = 1000
INPUT_SAMPLE_PATH = "/path/to/any/mp4"

def main():
    with open(INPUT_SAMPLE_PATH, "rb") as f:
        input_data = f.read()

    for i in range(ITERATIONS):
        print(f"Iteration {i + 1} started")

        buffer = io.BytesIO(input_data)
        container = av.open(buffer)

        frame = next(container.decode(video=0))

    input("Press enter to exit...")

if __name__ == '__main__':
    main()

Increasing number of iterations will cause memory usage constant rising. I'm using version 0.3.3 but 0.4.0 is affected as well. Python version: 3.6.5

I've made some investigations and found actual leaking place. The problem is ContainerProxy::__dealloc__ method. In case of python file-like buffer there will be two extra objects allocated - self.iocontext and self.buffer (spoiler: they are both leaking). Next code - self.ptr.pb = self.iocontext will force ffmpeg to set AVFMT_FLAG_CUSTOM_IO (this happens here). Another important thing is that avformat_close_input will not free buffer and iocontext if AVFMT_FLAG_CUSTOM_IO flag is set - avformat_close_input code. As a result, both self.iocontext and self.iocontext.buffer will not be deallocated.

You could also use valgrind with provided script to verify described behavior. Important note is that valgrind will not point to the buffer that is being allocated in ContainerProxy::__init__. libAV can reallocate provided buffer for some reasons, so valgrind will show only the last reallocation call-stack.

I've made a fix for version 0.3.3 (here is link to commit) but there is no release branch for this version and I can't create a PR. Are there any chances to release a patch for version 0.3.3?

The code in ContainerProxy::__dealloc__ was changed in current develop version, but it will not be difficult to adapt this fix for current version. Do you have any plans about dates for the next PyAV release where this fix could be included?

Thank you in advance!

mikeboers commented 6 years ago

Thanks for being so thorough!

I don't have dates. I'm hoping I can spend a day on PyAV in the next week or two if work at my primary clients' works out.

e3oroush commented 6 years ago

@igor-kondratiev I have used your branch, but the problem remained. I still have memory leakage. I had to call garbage collector manually to release memory too, is it some sort of bug?

jlaine commented 6 years ago

@igor-kondratiev Could you please give #434 a spin? In my tests it eliminated the memory leak identified by valgrind. Thanks for describing the issue in so much detail!

jlaine commented 6 years ago

@e-soroush garbage collection is a different topic, and not a bug : it's just about when the memory is reclaimed. If you do del container, the destructor should be immediately called and you'll reclaim your memory.

RomainSabathe commented 6 years ago

Hey, thanks everyone for the input and the fix. Similar to @e3oroush, I'm still having leaks, or so I think. Based on @igor-kondratiev 's example, here's how to reproduce:

import io

import av

ITERATIONS = 1000
INPUT_SAMPLE_PATH = "/path/to/any/mp4"

def main():
    with open(INPUT_SAMPLE_PATH, "rb") as f:
        input_data = f.read()

    for i in range(ITERATIONS):
        print(f"Iteration {i + 1} started")

        buffer = io.BytesIO(input_data)
        container = av.open(buffer)

        frames = [frame for frame in container.decode(video=0)]
        # Optional; just to show that it doesn't solve the issue
        del frames
        del container

    input("Press enter to exit...")

if __name__ == '__main__':
    main()

The leak is observed even when using 6.0.1.dev0. Am I doing this right?

orf commented 6 years ago

For reference, this code does not seem to trigger the issue:

import av
import io
import psutil
import gc

us = psutil.Process()

with open('video.mp4', 'rb') as fd:
    content = fd.read()

for i in range(1000):
    gc.collect()
    f = io.BytesIO(content)
    # open video bytestream with PyAv
    video = av.open(f)
    frames = [
        frame
        for packet in video.demux()
        for frame in packet.decode()
    ]
    print(us.memory_percent())

input('done')
orf commented 6 years ago

I've done some digging into this, it appears to be related to a weird interaction between this library and the garbage collector. Adding a gc.collect() after each iteration completely removes the memory leak, even without the del statements.

If you have this kind of small loop:

for i in range(1000):
    print(f"Iteration {i + 1} started")
    buffer = io.BytesIO(content)
    container = av.open(buffer)
    frames = [frame for packet in container.demux() for frame in packet.decode()]
    gc.collect()
    print(gc.get_stats())
    print(us.memory_percent())

You can see something like this in the output:

Iteration 18 started
[{'collections': 65, 'collected': 320, 'uncollectable': 0}, {'collections': 2, 'collected': 28, 'uncollectable': 0}, {'collections': 18, 'collected': 25755, 'uncollectable': 0}]
3.2618045806884766
Iteration 19 started
[{'collections': 67, 'collected': 320, 'uncollectable': 0}, {'collections': 2, 'collected': 28, 'uncollectable': 0}, {'collections': 19, 'collected': 27270, 'uncollectable': 0}]
3.2618045806884766
Iteration 20 started
[{'collections': 69, 'collected': 320, 'uncollectable': 0}, {'collections': 2, 'collected': 28, 'uncollectable': 0}, {'collections': 20, 'collected': 28785, 'uncollectable': 0}]
3.261852264404297

This is really weird because all of the serious collections seem to come from generation 2 (the last dictionary), removing over a thousand objects an iteration. Generation 2 is supposed to be for the longest lived objects that don't need pruning that often.

So, I'm guessing that:

  1. Objects holding on to large memory references are ending up in generation 2 way, way too fast
  2. This means that they are only freed occasionally.
  3. The __del__ method is not executed, and memory is not freed. So, memory leak!

On top of this the garbage collector has no visibility into the Cython structures, so I guess it has no idea that these seemingly small objects are really large.

I suggest we add an __enter__ and __exit__ methods to the Container?

with av.open(...) as video:
   ...

This would be good anyway I think, because relying on the GC is not fantastic!

Mofef commented 5 years ago

I still have a memory leak even with the code given in https://github.com/mikeboers/PyAV/issues/317#issuecomment-430960322 memory consumption is growing quite dramatically. output:

1.00917691564
1.58076002958
2.14917120615
2.71771593798
3.28846433156
3.85700906339
4.42555379522
4.9963021888
5.56484692063
6.13339165246
6.70414004604
7.27268477787
7.84343317145
8.41197790328
8.98052263511
9.55127102869
10.1198157605
...

So just adding gc.collect() in the loop doesn't seem to solve the issue here. av.__version__ == '6.1.2'

Mofef commented 5 years ago

In case it helps: I ran the loop in pdb. exited to a pdb promt via KeyboardInterupt and did:

ipdb> gc.collect()
19
ipdb> gc.collect()
7
ipdb> gc.collect()
7
ipdb> gc.collect()
7  #umkay...?
ipdb> objgraph.show_most_common_types(limit=20)
function                   10474
tuple                      8543
dict                       8288
list                       3738
_Binding                   3132
weakref                    2218
cell                       1533
builtin_function_or_method 1470
wrapper_descriptor         1424
set                        1179
getset_descriptor          1060
method_descriptor          870
type                       699
WeakSet                    495
property                   479
module                     475
VideoFormatComponent       438
VideoPlane                 435
member_descriptor          373
deque                      362
ipdb> objgraph.by_type('VideoFormatComponent')[222]
<av.video.format.VideoFormatComponent object at 0x7f4712602710>
ipdb> objgraph.show_backrefs([objgraph.by_type('VideoFormatComponent')[222]], max_depth=10)
Graph written to /tmp/objgraph-NXJAAW.dot (114 nodes)
Graph viewer (xdot) not found, generating a png instead
Image generated as /tmp/objgraph-NXJAAW.png

Then in a shell I converted /tmp/objgraph-NXJAAW.dot to a pdf objgraph-NXJAAW.pdf, opened it and searched for "VideoFormatComponent". I'm honestly way out of my comfort zone, but from what understand circles are a bad thing here. I marked two of them, but there are a couple more. circles_highlighted

Mofef commented 5 years ago

@mikeboers not sure if a comment in a closed issue is enough to get your attention... :)

keko950 commented 5 years ago

This is a real problem. Would be nice if someone do something with it.

mikeboers commented 5 years ago

I'm going to re-open this, mainly because of @Mofef's efforts. I reserve the right to close it and/or make a new one.

@keko950, your comment is not helpful. I'd appreciate if you would be constructive instead of passive-aggressive complaining.

keko950 commented 5 years ago

I'd appreciate if you would be constructive instead of passive-aggressive complaining.

Didnt mean to be disrespectful with you or anybody! Just tried to get someone's attention at this bug, because it is happening and the only current solution is call to the garbage collector manually, sorry!

victorhcm commented 5 years ago

At a first glance, it seems gc.collect() did help in my case, capturing a streaming from a single 4K camera. @mikeboers do you have any plans on how to troubleshoot it? If you have any pointers, we'd be glad to help.

mikeboers commented 5 years ago

@victorhcm Not really.

@jlaine went through a heroic effort with Valgrind to remove a ton of "real" memory leaks. Given the behaviours discussed above, it really sounds like it isn't a memory leak as much as the garbage collector not collecting.

This feels similar to the origins of the ContainerProxy, and now #496 to remove it. There are reference cycles that are not trivial to break, and it has a hard time with them.

Gotta figure out why the collector isn't identifying things, either at all or until we manually call them. Then either make some changes so that isn't triggered, or start overriding tp_traverse and tp_clear in order to provide our own intelligence.

Thats a lot to say "not really". 😛

jlaine commented 5 years ago

Back to the original problem description: I believe the root cause was the absence of a close() method on input containers. There is now a close method on containers, and I see much lower memory usage if I do:


import io

import av

ITERATIONS = 1000
INPUT_SAMPLE_PATH = "/path/to/any/mp4"

def main():
    with open(INPUT_SAMPLE_PATH, "rb") as f:
        input_data = f.read()

    for i in range(ITERATIONS):
        print(f"Iteration {i + 1} started")

        container = av.open(io.BytesIO(input_data))
        frame = next(container.decode(video=0))
        container.close()

    input("Press enter to exit...")

if __name__ == '__main__':
    main()

To make things even easier I have submitted https://github.com/mikeboers/PyAV/pull/515 which will allow you do do:

with av.open(io.BytesIO(input_data)) as container:
    frame = next(container.decode(video=0))
jlaine commented 5 years ago

I am going to close this issue as I believe the original problem is solved with the introduction of InputContainer.close().

Let's continue the discussion on making frame objects more gc-friendly in https://github.com/mikeboers/PyAV/issues/517