Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.
Apache License 2.0

CUDA out of memory #64

Open likeatingcake opened 3 months ago

likeatingcake commented 3 months ago

(latte) yueyc@super-AS-4124GS-TNR:~/Latte$ bash sample/ffs_ddp.sh
WARNING:torch.distributed.run: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


Using Ema!
WARNING: using half percision for inferencing!
Using Ema!
WARNING: using half percision for inferencing!
Saving .mp4 samples at ./test

First question: when I run bash sample/ffs_ddp.sh, everything proceeds up to "Saving .mp4 samples at ./test" and then it seems to hang at that step with no further output; it only stops when I terminate it manually. In other words, I cannot get DDP sampling to run.

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100% 4/4 [00:03<00:00, 1.08it/s]
Loading checkpoint shards: 100% 4/4 [00:04<00:00, 1.13s/it]
Processing the (Yellow and black tropical fish dart through the sea.) prompt
Processing the (Yellow and black tropical fish dart through the sea.) prompt
0% 0/50 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/home/yueyc/Latte/sample/sample_t2v.py", line 120, in <module>
    main(OmegaConf.load(args.config))
  File "/home/yueyc/Latte/sample/sample_t2v.py", line 86, in main
    videos = videogen_pipeline(prompt,
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/yueyc/Latte/sample/pipeline_videogen.py", line 706, in __call__
    noise_pred = self.transformer(
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yueyc/Latte/models/latte_t2v.py", line 773, in forward
    hidden_states = self.pos_embed(hidden_states) # alrady add positional embeddings
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/diffusers/models/embeddings.py", line 187, in forward
    return (latent + pos_embed).to(latent.dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.50 GiB (GPU 0; 47.54 GiB total capacity; 26.67 GiB already allocated; 60.81 MiB free; 26.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

A second process fails with an identical traceback, ending in:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 9.00 GiB (GPU 0; 47.54 GiB total capacity; 17.67 GiB already allocated; 76.81 MiB free; 17.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 68022) of binary: /home/yueyc/anaconda3/envs/latte/bin/python
**Second question:** when I run the bash sample/t2v.sh script, it always reports that GPU memory is insufficient, so I tried multiple GPUs. Whether I use 2 cards or all of them, I still hit CUDA out of memory. Our lab's cards should certainly have enough memory, so why do I run out of memory no matter how many GPUs I use? I spent a whole day debugging, repeatedly checking whether the cards actually have enough free memory and switching between cards, but I could not resolve the problem. I would be grateful for the author's help, many thanks.
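For reference, the allocator hint in the error message can be tried as follows. This is a minimal sketch: the max_split_size_mb value of 128 is an illustrative assumption, not a tuned setting, and the variable must be set before torch initializes CUDA.

```python
import os

# Set the allocator option before any CUDA allocation happens, e.g. at the
# very top of sample/sample_t2v.py (or export it in sample/t2v.sh instead).
# 128 MiB is an illustrative split size, not a verified optimum.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after the env var so the allocator picks it up
```

Note that this only mitigates fragmentation; it cannot help if the model plus activations genuinely exceed the card's free memory.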
maxin-cn commented 3 months ago

> (quoting the logs and questions from the comment above)

Thanks for your interest.

  1. Check the storage path to see whether any videos have been saved there; if so, the run is behaving normally.
  2. Inference for one video on the A100 requires 20916 MiB of GPU memory in fp16 precision mode (t2v). Multi-GPU mode cannot reduce the memory used per GPU during inference: DDP sampling replicates the full model on every rank and parallelizes across videos, not within a single video, so each card still needs the full amount (see the sketch below).
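A quick way to verify that a card actually has that much headroom before launching, using standard PyTorch calls (the 20916 MiB threshold is the fp16 t2v figure quoted above):

```python
import torch

REQUIRED_MIB = 20916  # fp16 t2v requirement quoted above

# Report free vs. total memory on every visible GPU before sampling.
for i in range(torch.cuda.device_count()):
    free_b, total_b = torch.cuda.mem_get_info(i)  # (free, total) in bytes
    free_mib = free_b / 2**20
    status = "enough" if free_mib >= REQUIRED_MIB else "NOT enough"
    print(f"cuda:{i}: {free_mib:.0f} MiB free of {total_b / 2**20:.0f} MiB -> {status}")
```

Both failing processes in the log above report GPU 0 with only ~60-70 MiB free on a 47.54 GiB card, so it is worth checking whether all ranks are landing on the same device, or whether another job is already occupying it, before changing anything else.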
likeatingcake commented 3 months ago

> (quoting the previous reply)

Hi, the generated videos cannot be found in the corresponding folder; it seems to be stuck processing the whole time.

maxin-cn commented 3 months ago

> (quoting the previous exchange)

Could you tell me what you set this parameter to: https://github.com/Vchitect/Latte/blob/9fd35d552a450afcf3a177a7fe93d54e359e0cdc/configs/ffs/ffs_sample.yaml#L29

likeatingcake commented 3 months ago

> (quoting the previous reply and the linked config line)

ddp sample config

per_proc_batch_size: 2
num_fvd_samples: 2048

I did not modify any parameters in the config file.
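If memory remains tight, a common first adjustment is lowering per_proc_batch_size from 2 to 1 in configs/ffs/ffs_sample.yaml, which should roughly halve per-GPU activation memory during sampling. A minimal sketch of the arithmetic, under the assumption that the ddp sampling script splits num_fvd_samples evenly across ranks (the world size of 8 is an illustrative value):

```python
# Assumed semantics of the ddp sample config quoted above.
num_fvd_samples = 2048    # total videos to generate (from the config)
world_size = 8            # number of GPUs / DDP processes (illustrative)
per_proc_batch_size = 1   # reduced from 2 to cut per-GPU peak memory

iters = num_fvd_samples // (world_size * per_proc_batch_size)
print(f"each rank would run {iters} iterations of batch size {per_proc_batch_size}")
```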