t2v CogVideoX1.5-5B OOM

LettleCreator commented 18 hours ago

System Info

CUDA 12.4, diffusers 0.32.0.dev0 (the latest, installed with pip install -e .), A100 with 40 GB VRAM

Running CogVideoX1.5-5B-I2V for I2V generates normally. Running CogVideoX1.5-5B for T2V always runs out of memory (OOM).

Information

[X] The official example scripts
[ ] My own modified scripts

Reproduction

python inference/cli_demo.py --prompt="Two kittens lick each other's fur" --generate_type="t2v"

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.67 GiB. GPU 0 has a total capacity of 39.38 GiB of which 4.50 GiB is free. Process 168833 has 34.88 GiB memory in use. Of the allocated memory 31.49 GiB is allocated by PyTorch, and 2.89 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
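The figures in this message can be read side by side to see how far the request is from fitting. The sketch below is only an illustration: the helper names parse_oom and headroom are hypothetical, and the regular expressions assume the message wording shown above.

```python
import re

# The message from the report, shortened to the sentences that carry numbers.
REPORTED = (
    "CUDA out of memory. Tried to allocate 5.67 GiB. GPU 0 has a total "
    "capacity of 39.38 GiB of which 4.50 GiB is free. Of the allocated "
    "memory 31.49 GiB is allocated by PyTorch, and 2.89 GiB is reserved "
    "by PyTorch but unallocated."
)

# Hypothetical field names mapped to the phrasing used in the message.
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) GiB",
    "total": r"total capacity of ([\d.]+) GiB",
    "free": r"of which ([\d.]+) GiB is free",
    "allocated": r"([\d.]+) GiB is allocated by PyTorch",
    "reserved_unallocated": r"([\d.]+) GiB is reserved by PyTorch but unallocated",
}


def parse_oom(message):
    """Pull the GiB figures out of an OOM message; missing fields are skipped."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match:
            found[name] = float(match.group(1))
    return found


def headroom(info):
    """Return (shortfall against free memory, best-case reusable memory) in GiB."""
    shortfall = round(info["requested"] - info["free"], 2)
    # Best case: all cached-but-unused memory could be reused for the request.
    best_case = round(info["free"] + info["reserved_unallocated"], 2)
    return shortfall, best_case


info = parse_oom(REPORTED)
shortfall, best_case = headroom(info)
print(f"short by {shortfall} GiB; best case after defragmentation {best_case} GiB")
```

For the numbers in this report, the request exceeds free memory by 1.17 GiB, while free plus reserved-but-unallocated memory adds up to 7.39 GiB. That is why the message suggests PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. Whether it would be enough here is not something the thread establishes.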

Expected behavior

I ran the official inference/cli_demo.py directly, without modification. pipe.enable_sequential_cpu_offload() is already enabled in it, but it still runs out of memory.
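For reference, the memory-saving switches usually combined on a diffusers CogVideoX pipeline can be sketched as below. This is a minimal sketch, not the contents of cli_demo.py: the helper names apply_memory_savers and load_t2v_pipeline are hypothetical, the model id and bfloat16 dtype are assumptions, and the thread only confirms that enable_sequential_cpu_offload() is on.

```python
def apply_memory_savers(pipe):
    """Turn on the memory-saving switches of an already-loaded pipeline."""
    # Keep weights on the CPU and move each submodule to the GPU only
    # while it runs. Slow, but it minimises resident weight memory.
    pipe.enable_sequential_cpu_offload()
    # Decode the latent video in spatial tiles and one sample at a time,
    # which lowers the peak memory of the VAE decode step.
    pipe.vae.enable_tiling()
    pipe.vae.enable_slicing()
    return pipe


def load_t2v_pipeline(model_id="THUDM/CogVideoX1.5-5B"):
    """Load the T2V pipeline and apply the switches (assumed model id and dtype)."""
    # Imported lazily so that the helper above can be used without a GPU stack.
    import torch
    from diffusers import CogVideoXPipeline

    pipe = CogVideoXPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return apply_memory_savers(pipe)
```

These switches reduce the memory held by weights and by VAE decoding. They do not shrink the activations created inside a single transformer forward pass, so an OOM raised inside attention can still occur with all of them enabled.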

zRzRzRzRzRzRzR commented 17 hours ago

Update to the latest diffusers main branch.

LettleCreator commented 16 hours ago

> Update to the latest diffusers main branch.

I have already updated to the latest diffusers main, and the OOM still occurs.

LettleCreator commented 16 hours ago

(cogvideo) root@autodl-container-cd46119efa-b92bcf86:~/autodl-tmp/CogVideo# cd diffusers/
(cogvideo) root@autodl-container-cd46119efa-b92bcf86:~/autodl-tmp/CogVideo/diffusers# git checkout main
Already on 'main'
Your branch is up to date with 'origin/main'.
(cogvideo) root@autodl-container-cd46119efa-b92bcf86:~/autodl-tmp/CogVideo/diffusers# git pull
Already up to date.
(cogvideo) root@autodl-container-cd46119efa-b92bcf86:~/autodl-tmp/CogVideo/diffusers# pip install -e .

Successfully built diffusers
Installing collected packages: diffusers
  Attempting uninstall: diffusers
    Found existing installation: diffusers 0.32.0.dev0
    Uninstalling diffusers-0.32.0.dev0:
      Successfully uninstalled diffusers-0.32.0.dev0
Successfully installed diffusers-0.32.0.dev0

WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

python inference/cli_demo.py --prompt="Two kittens lick each other's fur"
Loading checkpoint shards: 100%|██████████| 4/4 [00:32<00:00, 8.05s/it]
Loading pipeline components...: 100%|██████████| 5/5 [00:33<00:00, 6.77s/it]
  0%|          | 0/50 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/root/autodl-tmp/CogVideo/inference/cli_demo.py", line 179, in <module>
    generate_video(
  File "/root/autodl-tmp/CogVideo/inference/cli_demo.py", line 128, in generate_video
    video_generate = pipe(
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/pipelines/cogvideo/pipeline_cogvideox.py", line 710, in __call__
    noise_pred = self.transformer(
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/models/transformers/cogvideox_transformer_3d.py", line 503, in forward
    hidden_states, encoder_hidden_states = block(
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/models/transformers/cogvideox_transformer_3d.py", line 132, in forward
    attn_hidden_states, attn_encoder_hidden_states = self.attn1(
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/autodl-tmp/miniconda3/envs/cogvideo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/models/attention_processor.py", line 530, in forward
    return self.processor(
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/models/attention_processor.py", line 2295, in __call__
    key[:, :, text_seq_length:] = apply_rotary_emb(key[:, :, text_seq_length:], image_rotary_emb)
  File "/root/autodl-tmp/CogVideo/diffusers/src/diffusers/models/embeddings.py", line 816, in apply_rotary_emb
    out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.67 GiB. GPU 0 has a total capacity of 39.38 GiB of which 4.50 GiB is free. Process 225552 has 34.88 GiB memory in use. Of the allocated memory 31.49 GiB is allocated by PyTorch, and 2.89 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
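The traceback ends in apply_rotary_emb, on a line that upcasts the key tensor to float32. As a rough sanity check on the 5.67 GiB request, the sketch below estimates the size of one such float32 copy from the output resolution. Every default is an assumption, not something stated in the thread: 48 attention heads of dimension 64, a batch of 2 for classifier-free guidance, 8x spatial and 4x temporal VAE compression, 2x2x2 patches, and 81 frames. The helper name rope_upcast_bytes is hypothetical, and the 2400x2400 case is purely illustrative, since the thread does not say which resolution the script ended up using.

```python
GIB = 1024 ** 3


def rope_upcast_bytes(height, width, num_frames, *, batch=2, heads=48,
                      head_dim=64, vae_spatial=8, vae_temporal=4,
                      patch=2, patch_t=2):
    """Estimate the bytes in one float32 copy of the video part of the key tensor."""
    # Spatial size of the latent after VAE compression.
    latent_h = height // vae_spatial
    latent_w = width // vae_spatial
    # Causal video VAE: the first frame is kept, the rest are compressed in time.
    latent_frames = (num_frames - 1) // vae_temporal + 1
    # Latent frames are grouped into temporal patches, rounding up.
    temporal_patches = -(-latent_frames // patch_t)
    tokens = temporal_patches * (latent_h // patch) * (latent_w // patch)
    # float32 uses 4 bytes per element.
    return batch * heads * tokens * head_dim * 4


for label, (h, w) in {"2400x2400 (hypothetical)": (2400, 2400),
                      "768x1360": (768, 1360)}.items():
    print(f"{label}: {rope_upcast_bytes(h, w, 81) / GIB:.2f} GiB")
```

Under these assumptions the 2400x2400 case comes to about 5.66 GiB, within 0.01 GiB of the reported request, while 768x1360 comes to about 1.03 GiB. If the estimate is in the right ballpark, it would be worth checking which height and width the pipeline actually received.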

eryueweiyu commented 13 hours ago

diffusers 0.32.0.dev0
torch 2.5.1+cu124
torchaudio 2.5.1+cu124
torchvision 0.20.1+cu124
transformers 4.46.3

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.67 GiB. GPU 0 has a total capacity of 11.99 GiB of which 0 bytes is free. Of the allocated memory 26.15 GiB is allocated by PyTorch, and 153.84 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
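A version list like the one above can be collected in one step when reporting an environment. The sketch below uses only the standard library. The helper name package_versions is hypothetical, and the default package list is an assumption based on the packages that appear in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(names=("diffusers", "torch", "torchaudio",
                            "torchvision", "transformers", "accelerate")):
    """Map each package name to its installed version, or 'not installed'."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # Keep the entry so that a missing package is visible in the report.
            report[name] = "not installed"
    return report


for name, installed in package_versions().items():
    print(name, installed)
```

Because it reads installed package metadata rather than importing the packages, this also works in an environment where importing torch itself fails.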
