Breakthrough / PySceneDetect

Python and OpenCV-based scene cut/transition detection program & library.
https://www.scenedetect.com/
BSD 3-Clause "New" or "Revised" License

[Bug] memory leak #373

Closed: HuaZheLei closed this issue 4 months ago

HuaZheLei commented 5 months ago

Description: When I use a 'for' loop to cut a sequence of videos, memory usage keeps rising.

Command:

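    from scenedetect import ContentDetector, SceneManager, open_video

    # video_lists is the list of video file paths to process
    for video_path in video_lists:
        video = open_video(video_path)
        scene_manager = SceneManager()
        scene_manager.add_detector(ContentDetector(threshold=27))
        scene_manager.detect_scenes(video, show_progress=False)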

Output:

Environment:

boto3 | 1.34.31 | 1.29.1
-- | -- | --
botocore | 1.34.31 | 1.32.1
bzip2 | 1.0.8 | 1.0.8
ca-certificates | 2023.12.12 | 2023.12.12
click | 8.1.7 | 8.1.7
jmespath | 1.0.1 | 1.0.1
libffi | 3.4.4 | 3.4.4
ncurses | 6.4 | 6.4
numpy | 1.26.3 | 1.26.3
objgraph | 3.6.0 |
opencv-python | 4.9.0.80 |
openssl | 3.0.12 | 3.0.12
pip | 23.3.1 | 23.3.1
platformdirs | 4.1.0 | 3.10.0
python | 3.10.13 | 3.12.1
python-dateutil | 2.8.2 | 2.8.2
readline | 8.2 | 8.2
s3transfer | 0.10.0 | 0.7.0
scenedetect | 0.6.2 |
setuptools | 68.2.2 | 68.2.2
six | 1.16.0 | 1.16.0
sqlite | 3.41.2 | 3.41.2
tk | 8.6.12 | 8.6.12
tqdm | 4.66.1 | 4.65.0
tzdata | 2023d | 2023d
urllib3 | 2.0.7 | 2.1.0
wheel | 0.41.2 | 0.41.2
xz | 5.4.5 | 5.4.5
zlib | 1.2.13 | 1.2.13

Media/Files:

Breakthrough commented 5 months ago

How many videos are you processing roughly? Does it occur if all of the paths in video_lists are the same video? What OS/environment are you running this under, and how are you running the script?

Thank you.

Edit: I was able to verify that, at least on my Windows x64 system, there is no memory leak with the code pattern you outlined above. This may be something specific to your environment.

HuaZheLei commented 5 months ago

> How many videos are you processing roughly? Does it occur if all of the paths in video_lists are the same video? What OS/environment are you running this under, and how are you running the script?
>
> Thank you

Thanks for your reply.

  1. I use the script to process 30k videos.
  2. If all of the paths in video_lists are the same video, it still occurs.
  3. I see this on both Ubuntu and macOS.
  4. I just run 'python3 xxx.py' in a single thread.

Breakthrough commented 5 months ago

Could you install and try another backend like 'pyav' or 'moviepy'? For example, run pip install av and open the video with video = open_video(video_path, backend='pyav')

PySceneDetect is a pure Python library and offloads all video processing to either OpenCV, PyAV, or MoviePy. The next step is to narrow down where the memory leak comes from, so trying a different backend will help greatly with that. In the meantime I will set up a VM to test this locally on Ubuntu.

Can you create and upload a small script that causes the issue? A video clip as well would be helpful to ensure we are testing the same things and I want to see if we can use the exact same script/video. Thanks.
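A reproduction along the lines requested above could look like the following. This is only a sketch, not a script from the thread: the helper names (`peak_rss_mib`, `detect_once`, `run_repeatedly`) are made up for illustration, and the memory figure comes from the standard-library `resource` module, which is available on Unix only (so Ubuntu and macOS, but not Windows). The PySceneDetect calls are the ones from the report, with the backend passed by keyword.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def detect_once(video_path, backend="opencv"):
    """One iteration of the loop from the report, with a selectable backend."""
    # Imported here so the memory helpers work without PySceneDetect installed.
    from scenedetect import ContentDetector, SceneManager, open_video

    video = open_video(video_path, backend=backend)
    scene_manager = SceneManager()
    scene_manager.add_detector(ContentDetector(threshold=27))
    scene_manager.detect_scenes(video, show_progress=False)
    return len(scene_manager.get_scene_list())


def run_repeatedly(work, iterations, report_every=10):
    """Call work() repeatedly and return (iteration, peak RSS in MiB) samples."""
    samples = []
    for i in range(1, iterations + 1):
        work()
        if i % report_every == 0 or i == iterations:
            samples.append((i, peak_rss_mib()))
    return samples
```

Calling something like `run_repeatedly(lambda: detect_once("clip.mp4", backend="pyav"), 200)` (with `clip.mp4` standing in for a real file) and printing the samples shows whether the peak keeps climbing or levels off. Note that peak RSS is a high-water mark: it can show growth, but it never goes down, so it cannot show memory being reclaimed later.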

Breakthrough commented 5 months ago

Using the same package versions as you outlined above, on Ubuntu 22.04, I'm running all 3 different backends in a loop the same way you described above. From what I can see, memory usage for PySceneDetect+OpenCV and PySceneDetect+MoviePy is steady. As for PyAV, the memory does seem to climb at first, but it is eventually reclaimed and drops again. I don't think PySceneDetect and most of the backends in use have any memory leaks, but am happy to look further into this if you can provide a reproduction.

wjs018 commented 4 months ago

I have previously had this issue with moviepy, but that was a couple of years ago and I haven't run things in a loop like this since then. I fixed it by writing things out to file occasionally and using del on a bunch of variables every X iterations.
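The workaround described above can be sketched roughly as follows. This is an illustration of the general pattern, not wjs018's actual code: `process_one` and `write_out` are hypothetical callables supplied by the caller, and the explicit `gc.collect()` is an added assumption that only matters when the dropped objects are part of reference cycles.

```python
import gc


def process_with_periodic_cleanup(items, process_one, write_out, every=100):
    """Process items, flushing results and dropping references every `every` items."""
    pending = []
    count = 0
    for item in items:
        result = process_one(item)
        pending.append(result)
        count += 1
        if count % every == 0:
            write_out(pending)
            # Drop our references so the objects can be freed, then ask the
            # cyclic garbage collector to run (it handles reference cycles).
            del pending, result
            gc.collect()
            pending = []
    # Flush whatever is left after the last full batch.
    if pending:
        write_out(pending)
    return count
```

This keeps at most `every` results alive at a time. It cannot help if the memory is held by native code that never frees it, which is one reason a profiler run is still worth doing.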

HuaZheLei commented 4 months ago

> Using the same package versions as you outlined above, on Ubuntu 22.04, I'm running all 3 different backends in a loop the same way you described above. From what I can see, memory usage for PySceneDetect+OpenCV and PySceneDetect+MoviePy is steady. As for PyAV, the memory does seem to climb at first, but it is eventually reclaimed and drops again. I don't think PySceneDetect and most of the backends in use have any memory leaks, but am happy to look further into this if you can provide a reproduction.

Hi, I found that the code uses cv2.VideoCapture to open a video but never releases it. Could that cause memory to keep rising when I run a for loop?

Breakthrough commented 4 months ago

> Hi, I found that the code uses cv2.VideoCapture to open a video but never releases it. Could that cause memory to keep rising when I run a for loop?

The destructor of a VideoCapture object will call release(), which applies to Python as well. That will happen automatically at the end of each loop iteration. Are you able to reproduce this with a different version of OpenCV?
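To illustrate the point about destructors, here is a small stand-in example. `FakeCapture` is a made-up class, not `cv2.VideoCapture`, and the behaviour shown relies on CPython's reference counting: an object is finalized as soon as its last reference goes away, which in a loop happens when the variable is rebound to the next object.

```python
class FakeCapture:
    """Stand-in for a capture object whose destructor calls release()."""

    def __init__(self, name, released):
        self.name = name
        self._released = released

    def release(self):
        # Record the release so the order of events can be inspected.
        self._released.append(self.name)

    def __del__(self):
        self.release()


def open_in_loop(names, released):
    """Open one FakeCapture per name without ever calling release() explicitly."""
    log = []
    for name in names:
        # Rebinding `capture` drops the last reference to the previous object,
        # so CPython runs its __del__ (and therefore release()) right here.
        capture = FakeCapture(name, released)
        log.append((name, list(released)))
    return log
```

With three names, each object is released by the time the next one has been created, and the last one is released when the function returns. The flip side is that anything still holding a reference (a list, a cache, a reference cycle) delays that release, so this is one thing to rule out when memory grows.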

I recently came across https://github.com/bloomberg/memray which might be useful to find the cause. A memory flame graph would be really helpful to identify what part of code is using the memory.
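One way to capture such a profile from inside the script is memray's `Tracker` context manager. This is a sketch, not part of the thread: `profile_with_memray` and the output file name are made up for illustration, memray must be installed separately (it supports Linux and macOS), and the output file must not already exist.

```python
import importlib.util


def profile_with_memray(work, output_path="scenedetect_leak.bin"):
    """Run work() under memray and write allocation records to output_path."""
    if importlib.util.find_spec("memray") is None:
        raise RuntimeError("memray is not installed; try `pip install memray`")
    import memray

    # native_traces=True also records C/C++ frames, which matters here because
    # the video decoding happens in native code rather than in Python.
    with memray.Tracker(output_path, native_traces=True):
        work()
    return output_path
```

Here `work` would be a function that runs the detection loop from the report for a few hundred iterations. Running `python3 -m memray flamegraph scenedetect_leak.bin` afterwards turns the capture file into an HTML flame graph.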

babyta commented 3 months ago

Hello, I recently wrote a synchronous stripping service, and I also have this problem. The free memory keeps decreasing.

babyta commented 3 months ago

Following up on my previous comment: with scene_manager.detect_scenes, the splitting also gets slower and slower over time.