Vchitect / LaVie

[IJCV 2024] LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models
Apache License 2.0

Running step2 with: torch.cuda.OutOfMemoryError: CUDA out of memory #48

Open tianqingyu opened 11 months ago

tianqingyu commented 11 months ago

My video card is an RTX 4090 with 24 GB of VRAM, and the system is Ubuntu 22.

Here is the error message:

```
args.input_path = ../results/base/a_panda_taking_a_selfie,_2k,_high_quality.mp4
args.prompt = ['a_panda_taking_a_selfie,_2k,_high_quality']
loading video from ../results/base/a_panda_taking_a_selfie,_2k,_high_quality.mp4
Traceback (most recent call last):
  File "/home/vantage/apps/vchitect-lavie/interpolation/sample.py", line 307, in <module>
    main(OmegaConf.load(args.config))
  File "/home/vantage/apps/vchitect-lavie/interpolation/sample.py", line 279, in main
    video_clip = auto_inpainting_copy_no_mask(args, video_input, prompt, vae, text_encoder, diffusion, model, device,)
  File "/home/vantage/apps/vchitect-lavie/interpolation/sample.py", line 142, in auto_inpainting_copy_no_mask
    video_input = vae.encode(video_input).latent_dist.sample().mul_(0.18215)
  File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/models/autoencoder_kl.py", line 164, in encode
    h = self.encoder(x)
  File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/models/vae.py", line 129, in forward
    sample = down_block(sample)
  File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py", line 1014, in forward
    hidden_states = resnet(hidden_states, temb=None)
  File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/models/resnet.py", line 599, in forward
    output_tensor = (input_tensor + hidden_states) / self.output_scale_factor
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.77 GiB (GPU 0; 23.65 GiB total capacity; 18.59 GiB already allocated; 4.41 GiB free; 18.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

---------
After testing: base and vsr run normally; only interpolation runs out of memory.
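For reference, the allocator hint at the end of the error can be tried without touching the model code. A minimal sketch (the 128 MiB value is just a guess, not something I have verified):

```python
# Untested sketch: apply the allocator hint from the error message.
# The variable must be set before the first CUDA allocation, so set it
# before importing torch (or export it in the shell before running sample.py).
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # 128 is a guess; tune as needed

import torch  # imported after setting the env var so the allocator picks it up
```

That said, in the error above reserved memory (18.71 GiB) is almost equal to allocated memory (18.59 GiB), so fragmentation is probably not the real problem and this may not be enough on its own.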
maxin-cn commented 11 months ago

> My video card is an RTX 4090 with 24 GB of VRAM, and the system is Ubuntu 22.


> After testing: base and vsr run normally; only interpolation runs out of memory.

@tianqingyu Hi, currently only base supports half-precision sampling; we will add half-precision support for vsr and interpolation in the future. I believe running interpolation in half precision will work on your machine. BTW, any PR is welcome.
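For anyone who wants to try this before it is officially supported, here is a rough sketch of what half-precision sampling could look like in interpolation/sample.py. The names (vae, text_encoder, model, video_input, device) come from the traceback above; the exact call sites are assumptions, not the actual patch:

```python
import torch

# Sketch only: cast the frozen components to fp16 after they are loaded.
vae = vae.half().to(device)
text_encoder = text_encoder.half().to(device)
model = model.half().to(device)

# Inputs to fp16 modules have to match their dtype.
video_input = video_input.to(device, dtype=torch.float16)

# The encoder activations (where the OOM above happens) are then stored in
# fp16, which roughly halves their memory footprint.
video_input = vae.encode(video_input).latent_dist.sample().mul_(0.18215)
```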

jackyin68 commented 11 months ago

Then, how much GPU memory would be enough? @tianqingyu

tianqingyu commented 11 months ago

> Then, how much GPU memory would be enough? @tianqingyu

My server has four video cards (RTX 4090, 24 GB each), but I'm not able to modify the code for multi-GPU parallelism at the moment. I think four GPUs (4 × 24 GB) should be enough.

maxin-cn commented 11 months ago

> Then, how much GPU memory would be enough? @tianqingyu

> My server has four video cards (RTX 4090, 24 GB each), but I'm not able to modify the code for multi-GPU parallelism at the moment. I think four GPUs (4 × 24 GB) should be enough.

@tianqingyu Hi, I suggest modifying the interpolation code to use half-precision sampling, following the half-precision test code in base. I think half-precision interpolation sampling will run successfully. Or, you can wait until we support half-precision sampling for interpolation.
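If casting the checkpoints feels too invasive, a softer (and equally unofficial) sketch is to run only the memory-heavy calls under autocast, which keeps the weights in fp32 but computes the intermediate activations in fp16; the names again follow the traceback above:

```python
import torch

# Sketch: run the VAE encode (and, if desired, the sampling loop) under
# autocast so conv/matmul activations inside the encoder are kept in fp16.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    video_input = vae.encode(video_input).latent_dist.sample().mul_(0.18215)
```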

Ednaordinary commented 6 months ago

> Or, you can wait until we support half-precision sampling for interpolation.

@maxin-cn Do you still plan to do this? I'm a bit confused here. Does half precision here mean loading the model and latents in float16? I'm also confused about why the latents returned from base have shape [1, 4, 16, 40, 64] while the interpolation model works with [2, 4, 61, 64, 40]. Is there a reason for switching the height and width, cutting 3 of the frames, and concatenating it to itself?

edit: It also looks like the original failure is in the VAE encode call. Try adding vae.enable_slicing() right after the VAE is created. It might still run out of memory, but you may get past the encode call.
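To make that concrete, a sketch of what it could look like in interpolation/sample.py (the chunk size of 4 is arbitrary, and whether enable_slicing affects encode as well as decode depends on the diffusers version):

```python
import torch

# Sketch: turn on sliced VAE processing right after the VAE is created.
vae.enable_slicing()

# If the encode call still OOMs, encoding the clip in smaller frame chunks
# bounds the encoder's peak activation memory. This assumes the frames are
# stacked along the first (batch) dimension, as the 2D VAE expects.
latents = torch.cat(
    [vae.encode(chunk).latent_dist.sample().mul_(0.18215)
     for chunk in video_input.split(4, dim=0)],  # 4 frames at a time (arbitrary)
    dim=0,
)
```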