ChenyangLEI / All-In-One-Deflicker

[CVPR2023] Blind Video Deflickering by Neural Filtering with a Flawed Atlas

"CUDA out of memory" for large video file #5

Open m0o0scar opened 1 year ago

m0o0scar commented 1 year ago

Hi. Thanks for this nice work!! I'm trying to deflicker my video on Google Colab. For video files with relatively small dimensions (640x640), I can get the code running with no issue. But when I try with a larger input video (1280x1280), I get: `RuntimeError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 14.75 GiB total capacity; 13.69 GiB already allocated; 6.81 MiB free; 13.98 GiB reserved in total by PyTorch)`


Link to the Google Colab notebook

Full code block output here:

```
/content/All-In-One-Deflicker/data/test
video-depth-720.mov
video-depth-720.mov(video/quicktime) - 2035246 bytes, last modified: 3/14/2023 - 100% done
Saving video-depth-720.mov to video-depth-720.mov
/content/All-In-One-Deflicker
Namespace(ckpt_filter='./pretrained_weights/neural_filter.pth', ckpt_local='./pretrained_weights/local_refinement_net.pth', fps=10, gpu=0, video_frame_folder=None, video_name='data/test/video-depth-720.mov')
ffmpeg -i data/test/video-depth-720.mov -vf fps=10 -start_number 0 ./data/test/video-depth-720/%05d.png
ffmpeg version 4.0.2 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 4.8.2 (GCC) 20140120 (Red Hat 4.8.2-15)
  configuration: --prefix=/home/conda/feedstock_root/build_artifacts/ffmpeg_1539667330082/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac --disable-doc --disable-openssl --enable-shared --enable-static --extra-cflags='-Wall -g -m64 -pipe -O3 -march=x86-64 -fPIC' --extra-cxxflags='-Wall -g -m64 -pipe -O3 -march=x86-64 -fPIC' --extra-libs='-lpthread -lm -lz' --enable-zlib --enable-pic --enable-pthreads --enable-gpl --enable-version3 --enable-hardcoded-tables --enable-avresample --enable-libfreetype --enable-gnutls --enable-libx264 --enable-libopenh264
  libavutil      56. 14.100 / 56. 14.100
  libavcodec     58. 18.100 / 58. 18.100
  libavformat    58. 12.100 / 58. 12.100
  libavdevice    58.  3.100 / 58.  3.100
  libavfilter     7. 16.100 /  7. 16.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  1.100 /  5.  1.100
  libswresample   3.  1.100 /  3.  1.100
  libpostproc    55.  1.100 / 55.  1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'data/test/video-depth-720.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 0
    compatible_brands: qt
    creation_time   : 2023-03-14T06:51:47.000000Z
  Duration: 00:00:04.03, start: 0.000000, bitrate: 4036 kb/s
    Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 253 kb/s (default)
    Metadata:
      creation_time   : 2023-03-14T06:51:47.000000Z
      handler_name    : Core Media Data Handler
    Stream #0:1(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 1276x1280, 3772 kb/s, 30 fps, 30 tbr, 30k tbn, 60k tbc (default)
    Metadata:
      creation_time   : 2023-03-14T06:51:47.000000Z
      handler_name    : Core Media Data Handler
      encoder         : H.264
Stream mapping:
  Stream #0:1 -> #0:0 (h264 (native) -> png (native))
Press [q] to stop, [?] for help
Output #0, image2, to './data/test/video-depth-720/%05d.png':
  Metadata:
    major_brand     : qt
    minor_version   : 0
    compatible_brands: qt
    encoder         : Lavf58.12.100
    Stream #0:0(eng): Video: png, rgb24, 1276x1280, q=2-31, 200 kb/s, 10 fps, 10 tbn, 10 tbc (default)
    Metadata:
      creation_time   : 2023-03-14T06:51:47.000000Z
      handler_name    : Core Media Data Handler
      encoder         : Lavc58.18.100 png
frame=   40 fps= 29 q=-0.0 Lsize=N/A time=00:00:04.00 bitrate=N/A speed=2.93x
video:6869kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
python src/preprocess_optical_flow.py --vid-path data/test/video-depth-720 --gpu 0
computing flow:   0% 0/40 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "src/preprocess_optical_flow.py", line ?, in <module>
    preprocess(args=args)
  File "src/preprocess_optical_flow.py", line 29, in preprocess
    flow12 = raft_wrapper.compute_flow(im1, im2)
  File "/content/All-In-One-Deflicker/src/models/stage_1/raft_wrapper.py", line 70, in compute_flow
    _, flow12 = self.model(im1, im2, iters=20, test_mode=True)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/All-In-One-Deflicker/src/models/stage_1/core/raft.py", line 132, in forward
    net, up_mask, delta_flow = self.update_block(net, inp, corr, flow)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/All-In-One-Deflicker/src/models/stage_1/core/update.py", line 135, in forward
    mask = .25 * self.mask(net)
RuntimeError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 14.75 GiB total capacity; 13.69 GiB already allocated; 6.81 MiB free; 13.98 GiB reserved in total by PyTorch)
Traceback (most recent call last):
  File "src/stage1_neural_atlas.py", line 279, in <module>
    main(json.load(f), args)
  File "src/stage1_neural_atlas.py", line 109, in main
    resy, resx, maximum_number_of_frames, data_folder, True, True, vid_root, vid_name)
  File "/content/All-In-One-Deflicker/src/models/stage_1/unwrap_utils.py", line 145, in load_input_data_single
    flow12 = np.load(flow12_fn)
  File "/usr/local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 417, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'data/test/video-depth-720_flow/00000.png_00001.png.npy'
Namespace(ckpt_filter='./pretrained_weights/neural_filter.pth', ckpt_local='./pretrained_weights/local_refinement_net.pth', gpu=0, video_name='video-depth-720')
Load ./pretrained_weights/local_refinement_net.pth
Traceback (most recent call last):
  File "src/neural_filter_and_refinement.py", line 72, in <module>
    assert len(style_names) == len(content_names), "the number of style frames is different from the number of content frames"
AssertionError: the number of style frames is different from the number of content frames
```

May I know if it is possible to process large video files on Colab with this project? Or will I have to run it on a machine with more GPU memory? Thank you :)

SlimeVRX commented 1 year ago

Thank you for implementing Colab, I also want to try it now!

m0o0scar commented 1 year ago

Hi @SlimeVRX, you're welcome. The notebook is still pretty rough, with no documentation or comments whatsoever. I'll see if I can make it easier to use :)

ChenyangLEI commented 1 year ago

Hi,

The optical-flow preprocessing cannot handle high-resolution images :(

We will try to find a more efficient optical-flow estimation model for the preprocessing step.
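Until then, one possible stopgap (a rough sketch, not tested against this repo) is to estimate the flow at a reduced resolution and upsample it back, rescaling the flow vectors. This assumes the RAFT-style call visible in the traceback (`model(im1, im2, iters=20, test_mode=True)`); the function name and scale factor here are illustrative:

```python
import torch
import torch.nn.functional as F

def compute_flow_downscaled(model, im1, im2, scale=0.5):
    """Sketch only: estimate flow on downscaled frames to save GPU memory,
    then upsample the flow field back to full resolution.
    `model` is assumed to be the RAFT network used in raft_wrapper.py.
    Note that RAFT expects spatial dims divisible by 8, so pick `scale`
    (and input sizes) accordingly."""
    h, w = im1.shape[-2:]
    im1_s = F.interpolate(im1, scale_factor=scale, mode="bilinear", align_corners=False)
    im2_s = F.interpolate(im2, scale_factor=scale, mode="bilinear", align_corners=False)
    with torch.no_grad():
        _, flow_s = model(im1_s, im2_s, iters=20, test_mode=True)
    # Resize the flow field back up, and rescale its values too,
    # since flow vectors are expressed in pixels.
    flow = F.interpolate(flow_s, size=(h, w), mode="bilinear", align_corners=False)
    flow[:, 0] *= w / flow_s.shape[-1]
    flow[:, 1] *= h / flow_s.shape[-2]
    return flow
```

The trade-off is accuracy: fine motion detail is lost at lower resolution, which may matter for the atlas stage.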

Another way is to use the CPU instead of the GPU to compute the flow, but it would be quite slow.
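A rough sketch of that CPU fallback, assuming the same RAFT-style model and frame tensors as in src/preprocess_optical_flow.py (names are illustrative):

```python
import torch

def compute_flow_cpu(model, im1, im2):
    # Sketch of the CPU fallback: move the flow network and the frame
    # tensors off the GPU. Much slower than CUDA, but avoids the OOM.
    device = torch.device("cpu")
    model = model.to(device).eval()
    im1, im2 = im1.to(device), im2.to(device)
    with torch.no_grad():
        _, flow12 = model(im1, im2, iters=20, test_mode=True)
    return flow12
```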

FurkanGozukara commented 1 year ago

> Hi,
>
> The optical-flow preprocessing cannot handle high-resolution images :(
>
> We will try to find a more efficient optical-flow estimation model for the preprocessing step.
>
> Another way is to use the CPU instead of the GPU to compute the flow, but it would be quite slow.

Hi. So we can't use 1024x1024 with this?

What is the maximum supported resolution?

I am working on a video-to-animation tutorial and I plan to use your script to reduce flickering.

How much VRAM would be necessary for 1024x1024?

enn-nafnlaus commented 1 year ago

Same problem here. I didn't expect that 24GB of VRAM would be a "not enough memory" situation for processing a 1920x1080 file (a common video resolution). :(

ED: Looks like I can run it at 75% of its size (1440x810). Will see how it looks after deflicker + AI upscale.
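For anyone wanting to do the same, a minimal sketch of pre-shrinking the input with ffmpeg before running the pipeline (file names are illustrative; `scale=-2:810` keeps the aspect ratio with even dimensions):

```python
import subprocess

# Sketch: shrink the source video before handing it to the pipeline.
# ffmpeg picks a matching even width (-2) for the 810-pixel height,
# so 1920x1080 becomes 1440x810.
subprocess.run([
    "ffmpeg", "-i", "input-1920x1080.mp4",
    "-vf", "scale=-2:810",
    "input-1440x810.mp4",
], check=True)
```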

As for how to "find a more efficient preprocessing optical flow estimation model": have you tried smaller-bit quantization? For most graphics apps, in my experience, there's basically no difference between 16-bit and 32-bit precision, and the quality loss at 8-bit is acceptable (for LLMs you can go a lot lower still, but I'd be surprised if you could get away with less than 8 bits here).
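A quick, untested way to try the 16-bit idea on the flow step would be PyTorch's mixed-precision autocast around the RAFT call (assuming a PyTorch version that ships `torch.cuda.amp`; `compute_flow_fp16` is my name, not the repo's):

```python
import torch

def compute_flow_fp16(model, im1, im2):
    # Sketch: run the flow model under mixed precision to roughly halve
    # activation memory. The accuracy impact on RAFT is untested here.
    with torch.no_grad(), torch.cuda.amp.autocast():
        _, flow12 = model(im1, im2, iters=20, test_mode=True)
    return flow12.float()  # back to fp32 before saving as .npy
```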

ED2: Oh hey, looks like I can just barely pull off 1920x1080 if I unload Stable Diffusion and free up that last couple of gigs of VRAM. Yay! TL;DR: if you want 1920x1080, you really need a full 24GB.