GPU memory issue - GitHub Issues

jaehyunshinML commented 2 years ago

Hi,

Thanks for sharing this code.

I tried using my own sample video (MP4), but I ran into a GPU memory issue. Is there any restriction on the input file format or length?

This is the command I used to test:

python inference_realbasicvsr.py configs/realbasicvsr_x4.py checkpoints/RealBasicVSR_x4.pth data/test.mp4 results/demo_001.mp4 --fps=30 --max_seq_len=20

ckkelvinchan commented 2 years ago

How long is your video, and what is the resolution of the video?

jaehyunshinML commented 2 years ago

Hi! The video is 1080p and 1.5 hours long.

Thanks

From: Kelvin C.K. Chan
Sent: Thursday, May 19, 2022 4:51:19 AM
To: ckkelvinchan/RealBasicVSR
Cc: Jaehyun Shin; Author
Subject: Re: [ckkelvinchan/RealBasicVSR] GPU memory issue (Issue #49)

QiFuChina commented 2 years ago

Same issue here. My first video was a 4-second 1080×1920 clip; then I tried data/demo_001.mp4, but it also failed with:

RuntimeError: CUDA out of memory. Tried to allocate 4.35 GiB (GPU 0; 8.00 GiB total capacity; 4.37 GiB already allocated; 0 bytes free; 6.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Does that mean 8 GB of GPU memory isn't enough to run this? I restarted my computer and checked the available GPU memory, but it still doesn't work.
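The numbers in this message already answer part of the question. The failed request is 4.35 GiB while 4.37 GiB is already allocated on an 8.00 GiB card, and 4.37 + 4.35 = 8.72 GiB exceeds the total capacity, so `max_split_size_mb` (which targets fragmentation, i.e. reserved memory much larger than allocated memory) cannot make this particular allocation fit. The standard-library sketch below parses the three figures out of a message in this format and checks that arithmetic; it also shows how one might pass `PYTORCH_CUDA_ALLOC_CONF` to a child process when fragmentation is the real problem. The regex is an assumption based on this single message, and the helper names are hypothetical.

```python
import os
import re

# Matches the CUDA OOM message format quoted above. Other PyTorch versions may
# word the message differently; this pattern is an assumption.
OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<request>[\d.]+) GiB.*?"
    r"(?P<total>[\d.]+) GiB total capacity; "
    r"(?P<allocated>[\d.]+) GiB already allocated"
)


def request_can_ever_fit(message: str) -> bool:
    """Return True if the failed request fits in total capacity minus allocated memory.

    When it does not, allocator tuning such as max_split_size_mb cannot help:
    the requested tensor is larger than everything left on the card.
    """
    match = OOM_PATTERN.search(message)
    if match is None:
        raise ValueError("message does not look like a CUDA OOM error")
    request = float(match["request"])
    total = float(match["total"])
    allocated = float(match["allocated"])
    return allocated + request <= total


def env_with_alloc_conf(max_split_size_mb: int) -> dict:
    """Copy of the current environment with PYTORCH_CUDA_ALLOC_CONF set.

    PyTorch reads this variable when it sets up its CUDA allocator, so it is
    safest to hand it to a fresh child process instead of changing it mid-run.
    """
    env = dict(os.environ)
    env["PYTORCH_CUDA_ALLOC_CONF"] = f"max_split_size_mb:{max_split_size_mb}"
    return env


msg = (
    "CUDA out of memory. Tried to allocate 4.35 GiB (GPU 0; 8.00 GiB total "
    "capacity; 4.37 GiB already allocated; 0 bytes free; 6.06 GiB reserved "
    "in total by PyTorch)"
)
print(request_can_ever_fit(msg))  # False, because 4.37 + 4.35 = 8.72 > 8.00
```

For example, `subprocess.run(["python", "inference_realbasicvsr.py", "configs/realbasicvsr_x4.py", "checkpoints/RealBasicVSR_x4.pth", "data/test.mp4", "results/demo_001.mp4", "--max_seq_len=20"], env=env_with_alloc_conf(128))` would launch the command from the top of the thread with the allocator option set; 128 is an arbitrary example value.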

ckkelvinchan commented 2 years ago

@jaehyunshinML Since the video is 1.5 hours long, the number of frames is huge (about 162,000 frames at 30 fps), so the network cannot handle the whole video at once. You can set --max_seq_len to a smaller number and see whether that helps, as described here.
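A minimal sketch of what a sequence-length cap does, assuming the script runs the model on consecutive, non-overlapping windows of at most `max_seq_len` frames (the helper `chunk_ranges` is hypothetical, not from the repository). Each window is a separate forward pass, so GPU memory grows with the window length rather than with the video length; the trade-off is that the recurrent propagation cannot carry information across window boundaries. The 30 fps frame rate is an assumption taken from the `--fps=30` flag in the original command.

```python
from typing import Iterator, Tuple


def chunk_ranges(num_frames: int, max_seq_len: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) frame ranges covering the video in windows of at most max_seq_len."""
    if max_seq_len < 1:
        raise ValueError("max_seq_len must be at least 1")
    for start in range(0, num_frames, max_seq_len):
        # The last window may be shorter than max_seq_len.
        yield start, min(start + max_seq_len, num_frames)


# 1.5 hours at an assumed 30 fps.
num_frames = int(1.5 * 60 * 60 * 30)
windows = list(chunk_ranges(num_frames, 20))
print(num_frames, len(windows), windows[-1])  # 162000 8100 (161980, 162000)
```

Each `(start, end)` pair corresponds to one forward pass; with `max_seq_len=20` the model never holds more than 20 frames' worth of features at once, regardless of how long the video is.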

ckkelvinchan commented 2 years ago

@QiFuChina In this case, I guess 8 GB is not enough. Did you try reducing --max_seq_len as mentioned above?

QiFuChina commented 2 years ago

Thanks. After reducing --max_seq_len (from 24 to 1), I can run the program on demo_001.mp4 and it works. Next I wanted to compare results for videos of different lengths, so I prepared another video (5 min, 960×540, 158 MB) and ran it with --max_seq_len=1, but got this error:

RuntimeError: [enforce fail at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 99532800 bytes.

Then I changed to --max_seq_len=20, and the result was:

DefaultCPUAllocator: not enough memory: you tried to allocate 55931212800 bytes.

My device info:
OS: Windows 10
CPU: Intel i9-10900K
Memory: 64 GB
GPU: RTX 3070 Ti (8 GB)

There is no CUDA error, only a memory error, so I'm confused by these errors.
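Both byte counts in these errors can be reproduced with simple arithmetic, assuming frames are held as float32 RGB arrays (4 bytes per value) and the x4 model quadruples each side. 99,532,800 bytes is exactly one 3840×2160 output frame (960×540 upscaled ×4), and 55,931,212,800 bytes is exactly 8,991 input frames of 960×540, which matches a 5-minute clip at 29.97 fps. That is consistent with the whole clip being held in CPU RAM at once, which a smaller `--max_seq_len` does not prevent. The frame rate and dtype are assumptions; `frame_bytes` is a hypothetical helper.

```python
def frame_bytes(width: int, height: int, channels: int = 3, bytes_per_value: int = 4) -> int:
    """Bytes for one frame stored as a dense array (float32 by default)."""
    return width * height * channels * bytes_per_value


SCALE = 4  # RealBasicVSR_x4 upscales each side by 4
in_w, in_h = 960, 540

# One upscaled output frame: 3840 x 2160 x 3 channels x 4 bytes.
one_output = frame_bytes(in_w * SCALE, in_h * SCALE)

# Whole clip as input frames, assuming ~5 minutes at 29.97 fps.
num_frames = round(5 * 60 * 29.97)
all_inputs = num_frames * frame_bytes(in_w, in_h)

# All upscaled output frames, if they were also kept in RAM as float32.
all_outputs = num_frames * one_output

print(one_output)                  # 99532800
print(num_frames, all_inputs)      # 8991 55931212800
print(round(all_outputs / 2**30))  # 833 (GiB)
```

Even if the inputs fit, keeping every upscaled output of this clip in float32 would need roughly 833 GiB, far beyond 64 GB of RAM.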

ckkelvinchan commented 2 years ago

I think even RAM may not be enough to store all the frames of a 5-minute video, even with --max_seq_len=1. You can save the frames as separate PNG files and convert them to MP4 afterwards.
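A sketch of this suggestion, assuming a pipeline that upscales one window at a time and writes every output frame to disk before reading the next window, so RAM use stays bounded by the window size instead of the video length. `upscale_window` and `save_frame` are hypothetical callbacks (in practice they would wrap the model and an image writer producing files such as `frames/00000000.png`). The ffmpeg options shown (`-framerate`, `-i`, `-c:v libx264`, `-pix_fmt yuv420p`) are standard flags for encoding a numbered PNG sequence into an MP4, but the exact encoding settings are a choice, not a requirement.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List


def windows(frames: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Group a (possibly lazy) stream of frames into lists of at most `size` frames."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(frames)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def upscale_to_disk(
    frames: Iterable[Any],
    size: int,
    upscale_window: Callable[[List[Any]], List[Any]],
    save_frame: Callable[[int, Any], None],
) -> int:
    """Upscale window by window and save each output frame immediately.

    Only one window of inputs and outputs is referenced at a time, so memory use
    does not grow with the length of the video. Returns the number of frames saved.
    """
    index = 0
    for batch in windows(frames, size):
        for out in upscale_window(batch):
            save_frame(index, out)  # e.g. write frames/{index:08d}.png
            index += 1
    return index


def ffmpeg_command(pattern: str, fps: float, output: str) -> List[str]:
    """ffmpeg arguments that encode a numbered PNG sequence into an H.264 MP4."""
    return [
        "ffmpeg", "-framerate", str(fps), "-i", pattern,
        "-c:v", "libx264", "-pix_fmt", "yuv420p", output,
    ]
```

After all frames are written, `subprocess.run(ffmpeg_command("frames/%08d.png", 30, "results/out.mp4"), check=True)` would assemble the video; the frame rate should match the source clip.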

JiayunLi-3E8 commented 1 year ago

@ckkelvinchan Hey ckkelvinchan,

I've refactored inference_realbasicvsr.py to address the memory issues and committed the changes to my fork. I hope you can review and test them and consider merging them into your repo.

In _fixMem, I use --max_seq_len as the limit on how many images are loaded into RAM at once, and I added a --split parameter that splits each image into parts for processing, which works around insufficient GPU memory. I also introduced multi-threading to make the processing more efficient.
_withRich adds a progress bar on top of _fixMem, which is helpful when processing a large number of images.

If you decide to accept my contribution, please reply or email me. Thanks!
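The fork's code is not reproduced in this thread, so the following is only a generic sketch of the idea described above, not the contributor's implementation: a reader thread feeds frames through a bounded `queue.Queue`, which blocks once `max_buffered` frames are waiting, so the number of unprocessed frames held in RAM stays capped while the main thread keeps working. `pipelined` and `handle` are hypothetical names; in a real script `handle` would run inference and save the result.

```python
import queue
import threading
from typing import Any, Callable, Iterable

_DONE = object()  # sentinel marking the end of the stream


def pipelined(frames: Iterable[Any], max_buffered: int, handle: Callable[[Any], None]) -> int:
    """Read frames in a background thread and handle them on the calling thread.

    The bounded queue blocks the reader once `max_buffered` frames are waiting,
    so RAM holds at most that many unprocessed frames. Returns the count handled.
    """
    buf: "queue.Queue[Any]" = queue.Queue(maxsize=max_buffered)

    def reader() -> None:
        try:
            for frame in frames:
                buf.put(frame)  # blocks while the queue is full
        finally:
            # Always signal the end, even if reading fails, so the consumer stops.
            buf.put(_DONE)

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    count = 0
    while True:
        item = buf.get()
        if item is _DONE:
            break
        handle(item)
        count += 1
    thread.join()
    return count
```

The queue size plays the role of a RAM budget: reading (I/O) overlaps with `handle` (compute), but the reader can never run more than `max_buffered` frames ahead.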

AGenchev commented 1 year ago

@JiayunLi-3E8 With your modifications, does it output exactly the same frames as the original code? I mean, if there is motion and something moves from one part of the image to another, the model will not "see" it. For example, suppose the image sequence is split into two halves that are processed independently, and a license plate moves from one half to the other.

JiayunLi-3E8 commented 1 year ago

@ckkelvinchan In my tests, the images processed with the --split option do show some stitching seams; I hadn't noticed them until I saw your question.

Here is my test file (the output was transcoded with ffmpeg to reduce its size): https://wwc.lanzoub.com/b03pacymh password: e8gy

This is an optional parameter and splitting is off by default, but it may offer a way to run on low-performance graphics cards.
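A small illustration of why band splitting leaves seams and one common way to soften them. Per the reply below, the fork splits each frame into 3 horizontal bands; the `overlap` parameter here is a hypothetical addition (a standard tiling trick, not something confirmed in the fork) that feeds each band a few extra rows of context and then discards their upscaled rows before stitching. Frames are plain lists of rows, and `nearest_x2` is a toy stand-in for the network. With a pointwise model like this one, the stitched result equals the unsplit result for any overlap; a real convolutional, recurrent model sees less context near each boundary, so overlap reduces seams without guaranteeing identical frames, and motion that crosses a band boundary is still lost to temporal propagation.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # rows of pixels; a stand-in for a real H x W x C array


def band_bounds(height: int, bands: int, overlap: int) -> List[Tuple[int, int, int, int]]:
    """Return (read_start, read_end, keep_start, keep_end) row ranges for each band.

    Rows [keep_start, keep_end) are what the band contributes to the output;
    [read_start, read_end) adds up to `overlap` rows of context on each side.
    """
    bounds = []
    for b in range(bands):
        keep_start = b * height // bands
        keep_end = (b + 1) * height // bands
        read_start = max(0, keep_start - overlap)
        read_end = min(height, keep_end + overlap)
        bounds.append((read_start, read_end, keep_start, keep_end))
    return bounds


def process_in_bands(image: Image, bands: int, overlap: int, scale: int,
                     model: Callable[[Image], Image]) -> Image:
    """Run `model` (which upscales by `scale`) on horizontal bands and stitch the results."""
    out: Image = []
    for read_start, read_end, keep_start, keep_end in band_bounds(len(image), bands, overlap):
        result = model(image[read_start:read_end])
        # Drop the upscaled context rows so each output row comes from exactly one band.
        top = (keep_start - read_start) * scale
        bottom = top + (keep_end - keep_start) * scale
        out.extend(result[top:bottom])
    return out


def nearest_x2(image: Image) -> Image:
    """Toy stand-in model: nearest-neighbour 2x upscaling."""
    out: Image = []
    for row in image:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out
```

Because only one band is on the GPU at a time, peak GPU memory shrinks roughly with the band height, at the cost of extra passes and the boundary effects described above.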

AGenchev commented 1 year ago

@JiayunLi-3E8 I see: you split it into 3 bands. Trying to optimize is a good thing. The 3070 is not low-performance, it is low-memory. I see another issue, though: in your output, at 38.000 s (frame 2280), the nose is deformed. With the original code I run out of RAM (I didn't check why), so I'll try processing on my card with your code, using the same settings but split=1, to see whether the defect still occurs. For now it works, using 19758 MB of video RAM and 3.6 GB of CPU RAM, so you really enabled me to run it. The code uses several neural networks; if, for example, the optical flow were computed outside the GPU, video memory usage could be reduced. I'm not sure how fast optical flow can be computed on the CPU, though; it might become a bottleneck.

Edit: I got the same defect with the nose. I tried running BasicVSR++ on the input video, but it consumes even more CUDA memory. I need a 48 GB GPU, LoL!

JiayunLi-3E8 commented 1 year ago

I see; maybe the nose problem comes from the model itself. --max_seq_len affects both RAM and GPU memory, whereas --split affects GPU memory only.
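A rough sketch of this point, assuming the fork behaves as described in this thread: `--max_seq_len` sets how many full frames wait in RAM and enter each forward pass, while `--split` only shrinks the band of each frame that goes to the GPU. The function counts float32 input and output tensors only; network activations are ignored, although they also grow with the number of pixels per forward pass. `per_call_footprint` is a hypothetical helper.

```python
from typing import Tuple


def per_call_footprint(width: int, height: int, max_seq_len: int, split: int,
                       scale: int = 4, channels: int = 3,
                       bytes_per_value: int = 4) -> Tuple[int, int, int]:
    """Rough float32 sizes tied to each option (activations not included).

    Returns (ram_window_bytes, gpu_input_bytes, gpu_output_bytes):
    - RAM: a window of max_seq_len full input frames waiting to be processed.
    - GPU: one band (the tallest of `split` bands) of that window per forward pass.
    """
    frame = width * height * channels * bytes_per_value
    ram_window = max_seq_len * frame
    band_rows = -(-height // split)  # ceiling division gives the tallest band
    gpu_in = max_seq_len * width * band_rows * channels * bytes_per_value
    gpu_out = gpu_in * scale * scale  # output band is scale x larger on each side
    return ram_window, gpu_in, gpu_out
```

For a 1920×1080 clip with `max_seq_len=20`, going from `split=1` to `split=3` cuts the GPU input from 497,664,000 to 165,888,000 bytes, while the RAM window stays at 497,664,000 bytes; lowering `max_seq_len` instead shrinks both.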

Dylan-Jinx commented 1 year ago

Hello, I have the same problem. My video is 480×320 and 4 s long, and I still get CUDA out of memory. I then set max_seq_len to the minimum value, and it still runs out of memory. However, the demo video runs successfully. @ckkelvinchan @jaehyunshinML

zhu2bowen commented 1 year ago

The author's demo script isn't suited to processing longer videos; I'd recommend switching to single-frame inference.

ckkelvinchan / RealBasicVSR

GPU memory issue #49