Vchitect / VEnhancer

Official codes of VEnhancer: Generative Space-Time Enhancement for Video Generation
https://vchitect.github.io/VEnhancer-project/

Approximate release date of MULTI-GPU inferencing #11

Open SamitM1 opened 2 months ago

SamitM1 commented 2 months ago

Hey guys, great work with this. We were wondering if and (approximately) when you will be releasing multi-GPU inference. Also, what is the time taken with default settings to enhance a 6-second CogVideoX-generated video on an H100 (which is more powerful and efficient than an A100)? And if/once multi-GPU inference is implemented, approximately how long would it take to enhance a CogVideoX-generated video with 8 A100s or 8 H100s?

thanks in advance

hejingwenhejingwen commented 2 months ago

Thanks for your questions! Multi-GPU inference will be supported next week, and the corresponding inference times on A100 (1~8 GPUs) will be recorded.

SamitM1 commented 2 months ago

Great! So it will be released today or tomorrow?

I see you guys have added an open source plan, which is awesome by the way. But there is no official release date for the multi gpu inference.

Thanks again @hejingwenhejingwen

hejingwenhejingwen commented 1 month ago

> Great! So it will be released today or tomorrow?
>
> I see you guys have added an open source plan, which is awesome by the way. But there is no official release date for the multi gpu inference.
>
> Thanks again @hejingwenhejingwen

It is released now.

SamitM1 commented 1 month ago

Awesome @hejingwenhejingwen

Will you guys be releasing the corresponding inference times on A100 (1~8)? Right now it still seems kind of slow: we tried with 8 A100 (80GB) GPUs, and it takes 56 to 62GB of memory per GPU and roughly 45 minutes to enhance a 6-second CogVideoX video.

I am running multi_gpu_inference.sh; is there anything else we need to do?

ChrisLiu6 commented 1 month ago

My apologies 🙃 There was a bug in the previous version where the non-parallel model implementation was always used, even during multi-GPU inference. We have now fixed the bug; could you please pull the latest version and try again with the run_VEnhancer_MultiGPU.sh script?

SamitM1 commented 1 month ago

@ChrisLiu6 I tried with 8 and 4 A100 (40GB) GPUs and now I get this error:

2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 25, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 25, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 25, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 25, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 1, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 25])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 25])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 25])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 25])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 170, 248])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 25])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 25])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 25])
2024-09-12 03:54:14,648 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
[rank7]: Traceback (most recent call last):
[rank7]:   File "/home/ubuntu/VEnhancer/enhance_a_video_MultiGPU.py", line 202, in <module>
[rank7]:     main()
[rank7]:   File "/home/ubuntu/VEnhancer/enhance_a_video_MultiGPU.py", line 198, in main
[rank7]:     venhancer.enhance_a_video(file_path, prompt, up_scale, target_fps, noise_aug)
[rank7]:   File "/home/ubuntu/VEnhancer/enhance_a_video_MultiGPU.py", line 80, in enhance_a_video
[rank7]:     output = self.model.test(
[rank7]:   File "/home/ubuntu/VEnhancer/video_to_video/video_to_video_model_parallel.py", line 129, in test
[rank7]:     gen_vid = self.diffusion.sample(
[rank7]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank7]:     return func(*args, **kwargs)
[rank7]:   File "/home/ubuntu/VEnhancer/video_to_video/diffusion/diffusion_sdedit.py", line 256, in sample
[rank7]:     x0 = solver_fn(noise, fn, sigmas, show_progress=show_progress, **kwargs)
[rank7]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank7]:     return func(*args, **kwargs)
[rank7]:   File "/home/ubuntu/VEnhancer/video_to_video/diffusion/solvers_sdedit.py", line 145, in sample_dpmpp_2m_sde
[rank7]:     denoised = model(x * c_in, sigmas[i])
[rank7]:   File "/home/ubuntu/VEnhancer/video_to_video/diffusion/diffusion_sdedit.py", line 195, in model_chunk_fn
[rank7]:     x0_chunk = self.denoise(
[rank7]:   File "/home/ubuntu/VEnhancer/video_to_video/diffusion/diffusion_sdedit.py", line 59, in denoise
[rank7]:     y_out = model(
[rank7]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank7]:     return self._call_impl(*args, **kwargs)
[rank7]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank7]:     return forward_call(*args, **kwargs)
[rank7]:   File "/home/ubuntu/VEnhancer/video_to_video/modules/unet_v2v_parallel.py", line 1114, in forward
[rank7]:     x = x[get_context_parallel_rank()]
[rank7]: IndexError: tuple index out of range

I did this:

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -r requirements.txt

and then installed xformers:

pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu121

For 4 GPUs it just prints this out many times and doesn't seem to stop:

2024-09-12 03:59:03,103 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 640, 4, 86, 124])
2024-09-12 03:59:03,103 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 640, 6, 86, 124])
2024-09-12 03:59:03,103 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 640, 6, 86, 124])
2024-09-12 03:59:03,108 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 640, 6, 86, 124])
2024-09-12 03:59:03,108 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 640, 4, 86, 124])
2024-09-12 03:59:03,109 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 640, 6, 86, 124])
2024-09-12 03:59:03,109 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 640, 6, 86, 124])
2024-09-12 03:59:03,114 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 640, 6, 86, 124])
2024-09-12 03:59:03,114 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 640, 4, 86, 124])
2024-09-12 03:59:03,114 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 640, 6, 86, 124])
2024-09-12 03:59:03,114 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 640, 6, 86, 124])
2024-09-12 03:59:03,141 - video_to_video.modules.parallel_modules - INFO - all_to_all_input: torch.Size([1, 640, 6, 10664])
2024-09-12 03:59:03,141 - video_to_video.modules.parallel_modules - INFO - all_to_all_input: torch.Size([1, 640, 6, 10664])
2024-09-12 03:59:03,141 - video_to_video.modules.parallel_modules - INFO - all_to_all_input: torch.Size([1, 640, 4, 10664])
2024-09-12 03:59:03,141 - video_to_video.modules.parallel_modules - INFO - all_to_all_input: torch.Size([1, 640, 6, 10664])
2024-09-12 03:59:03,152 - video_to_video.modules.parallel_modules - INFO - all_to_all_output: torch.Size([1, 640, 22, 2666])
2024-09-12 03:59:03,152 - video_to_video.modules.parallel_modules - INFO - all_to_all_output: torch.Size([1, 640, 22, 2666])
2024-09-12 03:59:03,152 - video_to_video.modules.parallel_modules - INFO - all_to_all_output: torch.Size([1, 640, 22, 2666])
2024-09-12 03:59:03,152 - video_to_video.modules.parallel_modules - INFO - all_to_all_output: torch.Size([1, 640, 22, 2666])
2024-09-12 03:59:03,154 - video_to_video.modules.parallel_modules - INFO - all_to_all_input: torch.Size([1, 640, 22, 2666])
2024-09-12 03:59:03,154 - video_to_video.modules.parallel_modules - INFO - all_to_all_input: torch.Size([1, 640, 22, 2666])
2024-09-12 03:59:03,154 - video_to_video.modules.parallel_modules - INFO - all_to_all_input: torch.Size([1, 640, 22, 2666])
2024-09-12 03:59:03,155 - video_to_video.modules.parallel_modules - INFO - all_to_all_input: torch.Size([1, 640, 22, 2666])
2024-09-12 03:59:03,173 - video_to_video.modules.parallel_modules - INFO - all_to_all_output: torch.Size([1, 640, 6, 10664])
2024-09-12 03:59:03,173 - video_to_video.modules.parallel_modules - INFO - all_to_all_output: torch.Size([1, 640, 4, 10664])
2024-09-12 03:59:03,173 - video_to_video.modules.parallel_modules - INFO - all_to_all_output: torch.Size([1, 640, 6, 10664])
2024-09-12 03:59:03,173 - video_to_video.modules.parallel_modules - INFO - all_to_all_output: torch.Size([1, 640, 6, 10664])
2024-09-12 03:59:03,209 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 320, 6, 170, 248])
2024-09-12 03:59:03,209 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 320, 4, 170, 248])
2024-09-12 03:59:03,209 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 320, 6, 170, 248])
2024-09-12 03:59:03,209 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 320, 6, 170, 248])
2024-09-12 03:59:03,218 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 320, 6, 170, 248])
2024-09-12 03:59:03,218 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 320, 4, 170, 248])
2024-09-12 03:59:03,219 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 320, 6, 170, 248])
2024-09-12 03:59:03,219 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 320, 6, 170, 248])
2024-09-12 03:59:03,228 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 320, 6, 170, 248])
2024-09-12 03:59:03,228 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 320, 4, 170, 248])
2024-09-12 03:59:03,228 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 320, 6, 170, 248])
2024-09-12 03:59:03,228 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 320, 6, 170, 248])
2024-09-12 03:59:03,237 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 320, 6, 170, 248])
2024-09-12 03:59:03,237 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 320, 4, 170, 248])
2024-09-12 03:59:03,237 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 320, 6, 170, 248])
2024-09-12 03:59:03,237 - video_to_video.modules.parallel_modules - INFO - paraconv out shape: torch.Size([1, 320, 6, 170, 248])
2024-09-12 03:59:03,343 - video_to_video.modules.parallel_modules - INFO - all_to_all_input: torch.Size([1, 320, 6, 42160])
2024-09-12 03:59:03,343 - video_to_video.modules.parallel_modules - INFO - all_to_all_input: torch.Size([1, 320, 6, 42160])
2024-09-12 03:59:03,343 - video_to_video.modules.parallel_modules - INFO - all_to_all_input: torch.Size([1, 320, 4, 42160])
2024-09-12 03:59:03,343 - video_to_video.modules.parallel_modules - INFO - all_to_all_input: torch.Size([1, 320, 6, 42160])
2024-09-12 03:59:03,364 - video_to_video.modules.parallel_modules - INFO - all_to_all_output: torch.Size([1, 320, 22, 10540])
2024-09-12 03:59:03,364 - video_to_video.modules.parallel_modules - INFO - all_to_all_output: torch.Size([1, 320, 22, 10540])
2024-09-12 03:59:03,364 - video_to_video.modules.parallel_modules - INFO - all_to_all_output: torch.Size([1, 320, 22, 10540])
2024-09-12 03:59:03,364 - video_to_video.modules.parallel_modules - INFO - all_to_all_output: torch.Size([1, 320, 22, 10540])
2024-09-12 03:59:03,367 - video_to_video.modules.parallel_modules - INFO - all_to_all_input: torch.Size([1, 320, 22, 10540])
2024-09-12 03:59:03,367 - video_to_video.modules.parallel_modules - INFO - all_to_all_input: torch.Size([1, 320, 22, 10540])
2024-09-12 03:59:03,367 - video_to_video.modules.parallel_modules - INFO - all_to_all_input: torch.Size([1, 320, 22, 10540])
2024-09-12 03:59:03,368 - video_to_video.modules.parallel_modules - INFO - all_to_all_input: torch.Size([1, 320, 22, 10540])
2024-09-12 03:59:03,404 - video_to_video.modules.parallel_modules - INFO - all_to_all_output: torch.Size([1, 320, 6, 42160])
2024-09-12 03:59:03,404 - video_to_video.modules.parallel_modules - INFO - all_to_all_output: torch.Size([1, 320, 4, 42160])
2024-09-12 03:59:03,404 - video_to_video.modules.parallel_modules - INFO - all_to_all_output: torch.Size([1, 320, 6, 42160])
ChrisLiu6 commented 1 month ago

> For 4 GPUs it just prints this out many times and doesn't seem to stop:

If it keeps printing these logs, it is working as expected.

For the 8-GPU case, the error is likely caused by too few frames being split across too many GPUs; I'm looking into it.

SamitM1 commented 1 month ago

@ChrisLiu6 thank you for your response.

Have you fixed the 8-GPU inference in the latest pushes, or are you still working on it?

ChrisLiu6 commented 1 month ago

@SamitM1 Well, the problem with 8 GPUs is as follows: your input to the diffusion model has 25 frames in total. When 8 GPUs are used, each GPU becomes responsible for up to 4 frames. However, the number of frames allocated to each GPU would then be [4, 4, 4, 4, 4, 4, 1, 0], which means the last GPU is allocated no frames at all, and the second-to-last GPU also has 3 empty frame slots. This case was not covered when I wrote the parallel inference code, as I assumed only the last GPU could have empty frames.

This means that when using 8 GPUs to process 25 frames, one GPU is destined to be idle. Therefore, as a workaround, you can use 7 GPUs. We will soon add logic to automatically use fewer GPUs for cases like yours.
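
For what it's worth, here is a minimal sketch of the allocation arithmetic described above (a hypothetical helper for illustration, not the actual code in unet_v2v_parallel.py):

import math

def split_frames(num_frames: int, num_gpus: int) -> list:
    # Each rank takes a chunk of ceil(num_frames / num_gpus) frames until none remain.
    chunk = math.ceil(num_frames / num_gpus)
    counts, remaining = [], num_frames
    for _ in range(num_gpus):
        counts.append(max(0, min(chunk, remaining)))
        remaining -= chunk
    return counts

print(split_frames(25, 8))  # [4, 4, 4, 4, 4, 4, 1, 0] -> the last GPU gets nothing
print(split_frames(25, 7))  # [4, 4, 4, 4, 4, 4, 1]    -> every GPU gets at least one frame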

By the way, does the 4-GPU run work fine? Thx

SamitM1 commented 1 month ago

@ChrisLiu6 I am actually inputting 49 frames (a 6-second video at 8 FPS, plus 1 starting frame).

The thing is, we want to use more GPUs (ideally 8, assuming it would lead to faster inference). Based on what you said above, does that mean we have to pass in a video whose total number of frames is divisible by 8, for example 48 frames (which would mean 6 frames per GPU)?

Basically, is there any way for us to run the following input with 8 GPUs:

2024-09-13 04:42:38,905 - video_to_video - INFO - input frames length: 49
2024-09-13 04:42:38,905 - video_to_video - INFO - input fps: 8.0
2024-09-13 04:42:38,905 - video_to_video - INFO - target_fps: 24.0
2024-09-13 04:42:39,204 - video_to_video - INFO - input resolution: (480, 720)
2024-09-13 04:42:39,205 - video_to_video - INFO - target resolution: (1320, 1982)
2024-09-13 04:42:39,205 - video_to_video - INFO - noise augmentation: 250
2024-09-13 04:42:39,205 - video_to_video - INFO - scale s is set to: 8
2024-09-13 04:42:39,251 - video_to_video - INFO - video_data shape: torch.Size([145, 3, 1320, 1982])
2024-09-13 04:42:39,294 - video_to_video - INFO - input resolution: (480, 720)
2024-09-13 04:42:39,294 - video_to_video - INFO - target resolution: (1320, 1982)
2024-09-13 04:42:39,294 - video_to_video - INFO - noise augmentation: 250
2024-09-13 04:42:39,294 - video_to_video - INFO - scale s is set to: 8
2024-09-13 04:42:39,335 - video_to_video - INFO - video_data shape: torch.Size([145, 3, 1320, 1982])
2024-09-13 04:42:39,464 - video_to_video - INFO - Load model path ./ckpts/venhancer_paper.pt, with local status <All keys matched successfully>
2024-09-13 04:42:39,466 - video_to_video - INFO - Build diffusion with GaussianDiffusion

Thanks in advance!

edit:

7 GPUs does not work either:

[rank6]: Traceback (most recent call last):
[rank6]:   File "/home/ubuntu/VEnhancer/enhance_a_video_MultiGPU.py", line 215, in <module>
[rank6]:     main()
[rank6]:   File "/home/ubuntu/VEnhancer/enhance_a_video_MultiGPU.py", line 211, in main
[rank6]:     venhancer.enhance_a_video(file_path, prompt, up_scale, target_fps, noise_aug)
[rank6]:   File "/home/ubuntu/VEnhancer/enhance_a_video_MultiGPU.py", line 89, in enhance_a_video
[rank6]:     output = self.model.test(
[rank6]:   File "/home/ubuntu/VEnhancer/video_to_video/video_to_video_model_parallel.py", line 129, in test
[rank6]:     gen_vid = self.diffusion.sample(
[rank6]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank6]:     return func(*args, **kwargs)
[rank6]:   File "/home/ubuntu/VEnhancer/video_to_video/diffusion/diffusion_sdedit.py", line 256, in sample
[rank6]:     x0 = solver_fn(noise, fn, sigmas, show_progress=show_progress, **kwargs)
[rank6]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank6]:     return func(*args, **kwargs)
[rank6]:   File "/home/ubuntu/VEnhancer/video_to_video/diffusion/solvers_sdedit.py", line 145, in sample_dpmpp_2m_sde
[rank6]:     denoised = model(x * c_in, sigmas[i])
[rank6]:   File "/home/ubuntu/VEnhancer/video_to_video/diffusion/diffusion_sdedit.py", line 195, in model_chunk_fn
[rank6]:     x0_chunk = self.denoise(
[rank6]:   File "/home/ubuntu/VEnhancer/video_to_video/diffusion/diffusion_sdedit.py", line 59, in denoise
[rank6]:     y_out = model(
[rank6]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank6]:     return self._call_impl(*args, **kwargs)
[rank6]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank6]:     return forward_call(*args, **kwargs)
[rank6]:   File "/home/ubuntu/VEnhancer/video_to_video/modules/unet_v2v_parallel.py", line 1114, in forward
[rank6]:     x = x[get_context_parallel_rank()]
[rank6]: IndexError: tuple index out of range
/home/ubuntu/VEnhancer/video_to_video/video_to_video_model_parallel.py:110: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with amp.autocast(enabled=True):
/home/ubuntu/VEnhancer/video_to_video/video_to_video_model_parallel.py:110: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with amp.autocast(enabled=True):
SamitM1 commented 1 month ago

@ChrisLiu6 what's the max number of GPUs I can use for a video that has a total of 49 frames, and is there any way I can increase the number of GPUs I can use without it erroring?

ChrisLiu6 commented 1 month ago

> @ChrisLiu6 what's the max number of GPUs I can use for a video that has a total of 49 frames, and is there any way I can increase the number of GPUs I can use without it erroring?

Hi, I've just pushed a commit, and now you should be able to use any number of GPUs without error (however, in some cases some of the GPUs may be idle). Generally speaking, using 4 or 8 GPUs can be a good choice in most cases.
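
As a rough sketch of what this can mean (illustrative only, not necessarily how the commit implements it): the number of ranks that actually receive frames is bounded by the number of frame chunks, and any remaining ranks simply stay idle.

import math

def active_ranks(num_frames: int, world_size: int) -> int:
    # Frames are split into chunks of ceil(num_frames / world_size);
    # only the ranks that receive a non-empty chunk do any work.
    chunk = math.ceil(num_frames / world_size)
    return min(world_size, math.ceil(num_frames / chunk))

print(active_ranks(25, 8))  # 7 -> one of the 8 GPUs stays idle
print(active_ranks(48, 8))  # 8 -> all GPUs get 6 frames each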

SamitM1 commented 1 month ago

@ChrisLiu6 I ran with 8 A100 (40GB) GPUs and it gave me a memory error AFTER it had basically finished:

2024-09-16 23:12:22,184 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 40
2024-09-16 23:12:24,191 - video_to_video - INFO - step: 13
2024-09-16 23:12:24,191 - video_to_video - INFO - step: 13
2024-09-16 23:12:24,192 - video_to_video - INFO - step: 13
2024-09-16 23:12:24,192 - video_to_video - INFO - step: 13
2024-09-16 23:12:24,192 - video_to_video - INFO - step: 13
2024-09-16 23:12:24,193 - video_to_video - INFO - step: 13
2024-09-16 23:12:24,193 - video_to_video - INFO - step: 13
2024-09-16 23:12:24,194 - video_to_video - INFO - step: 13
2024-09-16 23:12:24,217 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 31, 162, 240])
2024-09-16 23:12:24,217 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 162, 240])
2024-09-16 23:12:24,217 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 162, 240])
2024-09-16 23:12:24,217 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-16 23:12:24,217 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 31])
2024-09-16 23:12:24,217 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-16 23:12:24,220 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 31
2024-09-16 23:12:26,110 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 31, 162, 240])
2024-09-16 23:12:26,110 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 162, 240])
2024-09-16 23:12:26,110 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 162, 240])
2024-09-16 23:12:26,110 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-16 23:12:26,110 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 31])
2024-09-16 23:12:26,110 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-16 23:12:26,111 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 31
2024-09-16 23:12:27,886 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 31, 162, 240])
2024-09-16 23:12:27,886 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 162, 240])
2024-09-16 23:12:27,887 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 162, 240])
2024-09-16 23:12:27,887 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-16 23:12:27,887 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 31])
2024-09-16 23:12:27,887 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-16 23:12:27,887 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 31
2024-09-16 23:12:29,663 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 31, 162, 240])
2024-09-16 23:12:29,663 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 162, 240])
2024-09-16 23:12:29,663 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 162, 240])
2024-09-16 23:12:29,663 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-16 23:12:29,663 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 31])
2024-09-16 23:12:29,663 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-16 23:12:29,664 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 31
2024-09-16 23:12:31,440 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 31, 162, 240])
2024-09-16 23:12:31,440 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 162, 240])
2024-09-16 23:12:31,440 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 162, 240])
2024-09-16 23:12:31,440 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-16 23:12:31,440 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 31])
2024-09-16 23:12:31,440 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-16 23:12:31,440 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 31
2024-09-16 23:12:33,217 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 31, 162, 240])
2024-09-16 23:12:33,218 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 162, 240])
2024-09-16 23:12:33,218 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 162, 240])
2024-09-16 23:12:33,218 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-16 23:12:33,218 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 31])
2024-09-16 23:12:33,218 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-16 23:12:33,218 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 31
2024-09-16 23:12:34,995 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 31, 162, 240])
2024-09-16 23:12:34,996 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 162, 240])
2024-09-16 23:12:34,996 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 162, 240])
2024-09-16 23:12:34,996 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-16 23:12:34,996 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 31])
2024-09-16 23:12:34,996 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-16 23:12:34,996 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 31
2024-09-16 23:12:36,786 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 31, 162, 240])
2024-09-16 23:12:36,786 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 162, 240])
2024-09-16 23:12:36,787 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 162, 240])
2024-09-16 23:12:36,787 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-16 23:12:36,787 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 31])
2024-09-16 23:12:36,787 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-16 23:12:36,787 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 31
2024-09-16 23:12:38,571 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 31, 162, 240])
2024-09-16 23:12:38,572 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 162, 240])
2024-09-16 23:12:38,572 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 162, 240])
2024-09-16 23:12:38,572 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-16 23:12:38,572 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 31])
2024-09-16 23:12:38,572 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-16 23:12:38,572 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 31
2024-09-16 23:12:40,346 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 31, 162, 240])
2024-09-16 23:12:40,346 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 162, 240])
2024-09-16 23:12:40,346 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 162, 240])
2024-09-16 23:12:40,346 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-16 23:12:40,346 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 31])
2024-09-16 23:12:40,346 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-16 23:12:40,347 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 31
2024-09-16 23:12:42,124 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 31, 162, 240])
2024-09-16 23:12:42,125 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 162, 240])
2024-09-16 23:12:42,125 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 162, 240])
2024-09-16 23:12:42,125 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-16 23:12:42,125 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 31])
2024-09-16 23:12:42,125 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-16 23:12:42,125 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 31
2024-09-16 23:12:43,904 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 31, 162, 240])
2024-09-16 23:12:43,905 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 162, 240])
2024-09-16 23:12:43,905 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 162, 240])
2024-09-16 23:12:43,905 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-16 23:12:43,905 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 31])
2024-09-16 23:12:43,905 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-16 23:12:43,905 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 31
2024-09-16 23:12:45,683 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 31, 162, 240])
2024-09-16 23:12:45,684 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 162, 240])
2024-09-16 23:12:45,684 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 162, 240])
2024-09-16 23:12:45,684 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-16 23:12:45,684 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 31])
2024-09-16 23:12:45,684 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-16 23:12:45,684 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 31
2024-09-16 23:12:47,460 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 31, 162, 240])
2024-09-16 23:12:47,461 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 162, 240])
2024-09-16 23:12:47,461 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 162, 240])
2024-09-16 23:12:47,461 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-16 23:12:47,461 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 31])
2024-09-16 23:12:47,461 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-16 23:12:47,461 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 31
2024-09-16 23:12:49,243 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 40, 162, 240])
2024-09-16 23:12:49,243 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 5, 162, 240])
2024-09-16 23:12:49,243 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 162, 240])
2024-09-16 23:12:49,243 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-16 23:12:49,243 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 40])
2024-09-16 23:12:49,243 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-16 23:12:49,243 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 40
2024-09-16 23:12:51,144 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 40, 162, 240])
2024-09-16 23:12:51,144 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 5, 162, 240])
2024-09-16 23:12:51,145 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 162, 240])
2024-09-16 23:12:51,145 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-16 23:12:51,145 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 40])
2024-09-16 23:12:51,145 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-16 23:12:51,145 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 40
2024-09-16 23:12:53,130 - video_to_video - INFO - sampling, finished.
2024-09-16 23:12:53,370 - video_to_video - INFO - sampling, finished.
2024-09-16 23:12:53,370 - video_to_video - INFO - sampling, finished.
2024-09-16 23:12:53,377 - video_to_video - INFO - sampling, finished.
2024-09-16 23:12:53,377 - video_to_video - INFO - sampling, finished.
2024-09-16 23:12:53,377 - video_to_video - INFO - sampling, finished.
2024-09-16 23:12:53,377 - video_to_video - INFO - sampling, finished.
2024-09-16 23:12:53,452 - video_to_video - INFO - sampling, finished.
[rank4]: Traceback (most recent call last):
[rank4]:   File "/home/ubuntu/VEnhancer/enhance_a_video_MultiGPU.py", line 215, in <module>
[rank4]:     main()
[rank4]:   File "/home/ubuntu/VEnhancer/enhance_a_video_MultiGPU.py", line 211, in main
[rank4]:     venhancer.enhance_a_video(file_path, prompt, up_scale, target_fps, noise_aug)
[rank4]:   File "/home/ubuntu/VEnhancer/enhance_a_video_MultiGPU.py", line 89, in enhance_a_video
[rank4]:     output = self.model.test(
[rank4]:   File "/home/ubuntu/VEnhancer/video_to_video/video_to_video_model_parallel.py", line 146, in test
[rank4]:     gen_video = self.tiled_chunked_decode(gen_vid)
[rank4]:   File "/home/ubuntu/VEnhancer/video_to_video/video_to_video_model_parallel.py", line 209, in tiled_chunked_decode
[rank4]:     tile = self.temporal_vae_decode(tile, tile_f_num)
[rank4]:   File "/home/ubuntu/VEnhancer/video_to_video/video_to_video_model_parallel.py", line 157, in temporal_vae_decode
[rank4]:     return self.vae.decode(z / self.vae.config.scaling_factor, num_frames=num_f).sample
[rank4]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_temporal_decoder.py", line 366, in decode
[rank4]:     decoded = self.decoder(z, num_frames=num_frames, image_only_indicator=image_only_indicator)
[rank4]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank4]:     return self._call_impl(*args, **kwargs)
[rank4]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank4]:     return forward_call(*args, **kwargs)
[rank4]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_temporal_decoder.py", line 147, in forward
[rank4]:     sample = up_block(sample, image_only_indicator=image_only_indicator)
[rank4]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank4]:     return self._call_impl(*args, **kwargs)
[rank4]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank4]:     return forward_call(*args, **kwargs)
[rank4]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/diffusers/models/unets/unet_3d_blocks.py", line 1007, in forward
[rank4]:     hidden_states = upsampler(hidden_states)
[rank4]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank4]:     return self._call_impl(*args, **kwargs)
[rank4]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank4]:     return forward_call(*args, **kwargs)
[rank4]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/diffusers/models/upsampling.py", line 180, in forward
[rank4]:     hidden_states = self.conv(hidden_states)
[rank4]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank4]:     return self._call_impl(*args, **kwargs)
[rank4]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank4]:     return forward_call(*args, **kwargs)
[rank4]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 458, in forward
[rank4]:     return self._conv_forward(input, self.weight, self.bias)
[rank4]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
[rank4]:     return F.conv2d(input, weight, bias, self.stride,
[rank4]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 720.00 MiB. GPU 4 has a total capacity of 39.39 GiB of which 619.94 MiB is free. Including non-PyTorch memory, this process has 38.78 GiB memory in use. Of the allocated memory 29.64 GiB is allocated by PyTorch, and 1.63 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[rank3]: Traceback (most recent call last):

Why is it taking up more GPU memory right as it finishes? How can I fix this?

I am assuming I have to add torch.cuda.empty_cache(), but we are unsure where.

Thanks for all your help!

hejingwenhejingwen commented 1 month ago

The OOM comes from temporal VAE decoding; its parallel inference is not supported yet. But you can reduce the memory usage by modifying the tile size in VEnhancer/video_to_video/video_to_video_model.py, lines 172~174.

SamitM1 commented 1 month ago

@hejingwenhejingwen would love your recommendation for how much to reduce the tile size for our particular case, as we have 40GB of memory per GPU.

hejingwenhejingwen commented 1 month ago

Please try self.frame_chunk_size = 3, self.tile_img_height = 576, and self.tile_img_width = 768. We cannot give a definitive answer at this time; we will work on it later.
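
Applied to the user's setup, that change would look roughly like the snippet below (illustrative only; the attribute names come from the suggestion above, and their exact location may differ in your checkout):

# In video_to_video/video_to_video_model_parallel.py (or video_to_video_model.py for
# single-GPU runs), in the tiled-decoding settings around the lines referenced above:
self.frame_chunk_size = 3   # decode fewer latent frames per VAE call
self.tile_img_height = 576  # smaller spatial tiles -> lower peak decode memory
self.tile_img_width = 768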

SamitM1 commented 1 month ago

@hejingwenhejingwen I took your suggestion and updated video_to_video_model_parallel.py (not video_to_video_model.py, since I am using 8 GPUs).

I tried everything, but it still failed:

2024-09-17 19:47:53,265 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 31])
2024-09-17 19:47:53,265 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-17 19:47:53,265 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 31
2024-09-17 19:47:55,166 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 31, 170, 248])
2024-09-17 19:47:55,167 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 4, 170, 248])
2024-09-17 19:47:55,167 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 170, 248])
2024-09-17 19:47:55,167 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-17 19:47:55,167 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 31])
2024-09-17 19:47:55,167 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-17 19:47:55,167 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 31
2024-09-17 19:47:57,070 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 40, 170, 248])
2024-09-17 19:47:57,070 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 5, 170, 248])
2024-09-17 19:47:57,070 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 170, 248])
2024-09-17 19:47:57,070 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-17 19:47:57,070 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 40])
2024-09-17 19:47:57,070 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-17 19:47:57,070 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 40
2024-09-17 19:47:59,310 - video_to_video.modules.unet_v2v_parallel - INFO - complete input shape: torch.Size([1, 4, 40, 170, 248])
2024-09-17 19:47:59,310 - video_to_video.modules.unet_v2v_parallel - INFO - sharded input shape: torch.Size([1, 4, 5, 170, 248])
2024-09-17 19:47:59,310 - video_to_video.modules.unet_v2v_parallel - INFO - hint shape: torch.Size([1, 4, 49, 170, 248])
2024-09-17 19:47:59,310 - video_to_video.modules.unet_v2v_parallel - INFO - t_hint shape: torch.Size([1])
2024-09-17 19:47:59,310 - video_to_video.modules.unet_v2v_parallel - INFO - mask_cond shape: torch.Size([1, 40])
2024-09-17 19:47:59,310 - video_to_video.modules.unet_v2v_parallel - INFO - s_cond shape: torch.Size([1])
2024-09-17 19:47:59,310 - video_to_video.modules.unet_v2v_parallel - INFO - complete f: 40
2024-09-17 19:48:01,460 - video_to_video - INFO - sampling, finished.
2024-09-17 19:48:01,461 - video_to_video - INFO - sampling, finished.
2024-09-17 19:48:01,461 - video_to_video - INFO - sampling, finished.
2024-09-17 19:48:01,461 - video_to_video - INFO - sampling, finished.
2024-09-17 19:48:01,461 - video_to_video - INFO - sampling, finished.
2024-09-17 19:48:01,461 - video_to_video - INFO - sampling, finished.
2024-09-17 19:48:01,461 - video_to_video - INFO - sampling, finished.
2024-09-17 19:48:01,683 - video_to_video - INFO - sampling, finished.
[rank1]: Traceback (most recent call last):
[rank1]:   File "/home/ubuntu/VEnhancer/enhance_a_video_MultiGPU.py", line 215, in <module>
[rank1]:     main()
[rank1]:   File "/home/ubuntu/VEnhancer/enhance_a_video_MultiGPU.py", line 211, in main
[rank1]:     venhancer.enhance_a_video(file_path, prompt, up_scale, target_fps, noise_aug)
[rank1]:   File "/home/ubuntu/VEnhancer/enhance_a_video_MultiGPU.py", line 89, in enhance_a_video
[rank1]:     output = self.model.test(
[rank1]:   File "/home/ubuntu/VEnhancer/video_to_video/video_to_video_model_parallel.py", line 146, in test
[rank1]:     gen_video = self.tiled_chunked_decode(gen_vid)
[rank1]:   File "/home/ubuntu/VEnhancer/video_to_video/video_to_video_model_parallel.py", line 209, in tiled_chunked_decode
[rank1]:     tile = self.temporal_vae_decode(tile, tile_f_num)
[rank1]:   File "/home/ubuntu/VEnhancer/video_to_video/video_to_video_model_parallel.py", line 157, in temporal_vae_decode
[rank1]:     return self.vae.decode(z / self.vae.config.scaling_factor, num_frames=num_f).sample
[rank1]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_temporal_decoder.py", line 366, in decode
[rank1]:     decoded = self.decoder(z, num_frames=num_frames, image_only_indicator=image_only_indicator)
[rank1]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/diffusers/models/autoencoders/autoencoder_kl_temporal_decoder.py", line 147, in forward
[rank1]:     sample = up_block(sample, image_only_indicator=image_only_indicator)
[rank1]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/diffusers/models/unets/unet_3d_blocks.py", line 1007, in forward
[rank1]:     hidden_states = upsampler(hidden_states)
[rank1]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/diffusers/models/upsampling.py", line 180, in forward
[rank1]:     hidden_states = self.conv(hidden_states)
[rank1]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 458, in forward
[rank1]:     return self._conv_forward(input, self.weight, self.bias)
[rank1]:   File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
[rank1]:     return F.conv2d(input, weight, bias, self.stride,
[rank1]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 324.00 MiB. GPU 1 has a total capacity of 39.39 GiB of which 263.94 MiB is free. Including non-PyTorch memory, this process has 39.12 GiB memory in use. Of the allocated memory 30.26 GiB is allocated by PyTorch, and 1.36 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
[rank5]: Traceback (most recent call last):
[rank5]:   File "/home/ubuntu/VEnhancer/enhance_a_video_MultiGPU.py", line 215, in <module>
hejingwenhejingwen commented 1 month ago

You can chunk all the latents before passing them to the VAE decoder. After you finish one chunk, move the result to CPU.
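
A minimal sketch of that idea, reusing the vae.decode call seen in the traceback above (a hypothetical standalone helper, not the repo's tiled_chunked_decode):

import torch

@torch.no_grad()
def decode_in_chunks(vae, latents, chunk_size=3):
    # latents: [num_frames, C, H, W]. Decode a few frames at a time and move each
    # decoded chunk to CPU right away so GPU memory stays bounded.
    frames = []
    for start in range(0, latents.shape[0], chunk_size):
        chunk = latents[start:start + chunk_size]
        decoded = vae.decode(chunk / vae.config.scaling_factor, num_frames=chunk.shape[0]).sample
        frames.append(decoded.cpu())
        torch.cuda.empty_cache()  # return freed blocks to the allocator between chunks
    return torch.cat(frames, dim=0)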