torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 400.00 GiB

Go1denMelody commented 3 months ago

When I run the video super-resolution model, I get the following error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 400.00 GiB (GPU 0; 44.52 GiB total capacity; 12.21 GiB already allocated; 31.33 GiB free; 12.83 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Why does it try to allocate 400 GiB? Should I change some settings?
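
A rough estimate may explain where a number this large comes from. When attention runs without a fused memory-efficient kernel, the full score matrix of shape (batch, heads, N, N) is materialized, where N is height × width of the feature map entering the attention block, so memory grows with the square of the spatial size. The sketch below computes that size. The shapes passed in (4 frames, 1 head, a 320 × 512 feature map, float32) are assumptions chosen for illustration, not values read from the LaVie config, although they happen to give exactly 400 GiB.

```python
def attention_scores_gib(batch, heads, height, width, bytes_per_elem=4):
    """Size in GiB of a dense (batch, heads, N, N) attention score tensor."""
    tokens = height * width  # N: one token per spatial position
    total_bytes = batch * heads * tokens * tokens * bytes_per_elem
    return total_bytes / 2**30


# Assumed shapes (hypothetical): 4 frames decoded together, single head,
# 320 x 512 feature map, float32 elements.
print(attention_scores_gib(batch=4, heads=1, height=320, width=512))  # 400.0
```

Under these assumptions, halving either spatial dimension cuts the estimate by a factor of four, which is why a kernel that avoids building the score matrix matters far more here than allocator settings such as max_split_size_mb.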

Go1denMelody commented 3 months ago

Traceback (most recent call last):
  File "/home/powerop/work/LaVie/vsr/sample.py", line 151, in <module>
    main(OmegaConf.load(args.config))
  File "/home/powerop/work/LaVie/vsr/sample.py", line 109, in main
    upscaledvideo = pipeline(
  File "/home/powerop/work/LaVie/myenv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/powerop/work/LaVie/vsr/models/pipeline_stable_diffusion_upscale_video_3d.py", line 766, in __call__
    image = self.decode_latents_vsr(latents[start_f:end_f])
  File "/home/powerop/work/LaVie/vsr/models/pipeline_stable_diffusion_upscale_video_3d.py", line 356, in decode_latents_vsr
    image = self.vae.decode(latents).sample
  File "/home/powerop/work/LaVie/myenv/lib/python3.10/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/home/powerop/work/LaVie/vsr/models/autoencoder_kl.py", line 197, in decode
    decoded = self._decode(z).sample
  File "/home/powerop/work/LaVie/vsr/models/autoencoder_kl.py", line 184, in _decode
    dec = self.decoder(z)
  File "/home/powerop/work/LaVie/myenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/powerop/work/LaVie/myenv/lib/python3.10/site-packages/diffusers/models/vae.py", line 233, in forward
    sample = self.mid_block(sample)
  File "/home/powerop/work/LaVie/myenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/powerop/work/LaVie/myenv/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 463, in forward
    hidden_states = attn(hidden_states)
  File "/home/powerop/work/LaVie/myenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/powerop/work/LaVie/myenv/lib/python3.10/site-packages/diffusers/models/attention.py", line 162, in forward
    hidden_states = F.scaled_dot_product_attention(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 400.00 GiB (GPU 0; 44.52 GiB total capacity; 12.21 GiB already allocated; 31.35 GiB free; 12.80 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Go1denMelody commented 3 months ago

After checking, I found that something went wrong with xformers when I installed the requirements. After fixing that, the code runs successfully.
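
For anyone hitting the same error, a quick check of whether xformers is visible to the Python environment might look like the sketch below. It only confirms that the package can be located, not that its CUDA kernels were built correctly, and the helper name `has_module` is made up for this example. Whether the LaVie scripts enable xformers attention automatically is not confirmed in this thread; diffusers pipelines expose `enable_xformers_memory_efficient_attention()` for that purpose.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the import system can locate a top-level module `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("xformers"):
        # `python -m xformers.info` reports which kernels are actually usable.
        print("xformers found; run `python -m xformers.info` to verify the build")
    else:
        print("xformers not found; attention may fall back to a path "
              "that builds the full score matrix")
```

If the package is found but the error persists, the output of `python -m xformers.info` is the next thing to inspect, since a build that does not match the installed PyTorch or CUDA version can leave the memory-efficient kernels unavailable.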

johndpope commented 1 month ago

close

Vchitect / LaVie

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 400.00 GiB #64