Open likeatingcake opened 3 months ago
```
(latte) yueyc@super-AS-4124GS-TNR:~/Latte$ bash sample/ffs_ddp.sh
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
Using Ema!
WARNING: using half percision for inferencing!
Using Ema!
WARNING: using half percision for inferencing!
Saving .mp4 samples at ./test
```

First question: when I run `bash sample/ffs_ddp.sh`, it reaches the "Saving .mp4 samples at ./test" step and then appears to hang there with no further output; it only stops when I terminate it manually. In other words, I cannot get DDP sampling to work.

```
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.08it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:04<00:00, 1.13s/it]
Processing the (Yellow and black tropical fish dart through the sea.) prompt
Processing the (Yellow and black tropical fish dart through the sea.) prompt
  0%|          | 0/50 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/home/yueyc/Latte/sample/sample_t2v.py", line 120, in <module>
    main(OmegaConf.load(args.config))
  File "/home/yueyc/Latte/sample/sample_t2v.py", line 86, in main
    videos = videogen_pipeline(prompt,
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/yueyc/Latte/sample/pipeline_videogen.py", line 706, in __call__
    noise_pred = self.transformer(
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yueyc/Latte/models/latte_t2v.py", line 773, in forward
    hidden_states = self.pos_embed(hidden_states)  # alrady add positional embeddings
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yueyc/anaconda3/envs/latte/lib/python3.11/site-packages/diffusers/models/embeddings.py", line 187, in forward
    return (latent + pos_embed).to(latent.dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.50 GiB (GPU 0; 47.54 GiB total capacity; 26.67 GiB already allocated; 60.81 MiB free; 26.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  0%|          | 0/50 [00:02<?, ?it/s]
```

The second process fails with the same traceback, ending in:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 9.00 GiB (GPU 0; 47.54 GiB total capacity; 17.67 GiB already allocated; 76.81 MiB free; 17.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 68022) of binary: /home/yueyc/anaconda3/envs/latte/bin/python
```

Second question: when I run `bash sample/t2v.sh`, it always reports insufficient GPU memory, so I tried multiple GPUs. Whether I use 2 cards or all of them, I still hit "CUDA out of memory". Our lab's GPUs should have more than enough memory in total, so why do I get out-of-memory no matter how many cards I use? I spent a whole day debugging, repeatedly checking whether the GPU memory was really sufficient and switching cards, but the problem persists. I would be grateful for the author's help, thank you very much.
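The OOM message itself suggests one allocator-level experiment. As a hedged workaround (not a fix confirmed in this thread), `PYTORCH_CUDA_ALLOC_CONF` can cap the caching allocator's split size to reduce fragmentation before re-running the script:

```shell
# Cap PyTorch's CUDA caching-allocator split size. 128 MiB is a common
# starting point, not a tuned recommendation for Latte specifically.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
echo "$PYTORCH_CUDA_ALLOC_CONF"   # prints: max_split_size_mb:128
# then re-run, e.g.:
# bash sample/t2v.sh
```

This only helps when the failure is fragmentation (reserved memory much larger than allocated); it cannot help if the model genuinely needs more memory than one card has.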
Thanks for your interest.
- Check the storage path to see whether any videos have been saved there; if so, everything is working normally.
- Inference for a single video on an A100 requires 20916 MiB of GPU memory in fp16 precision mode (t2v). Multi-GPU mode cannot reduce the transient per-GPU memory use during inference.
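The arithmetic behind the second bullet can be sketched as follows. The 20916 MiB figure is from the reply above; linear scaling with the per-process batch size is an assumption for illustration, not a measured profile:

```python
# Back-of-envelope check of per-GPU memory for DDP sampling.
MIB_PER_VIDEO_FP16 = 20916  # fp16 t2v footprint on an A100, per the reply above

def required_mib(per_proc_batch_size: int) -> int:
    """Each DDP rank runs its own full batch on its own GPU, so adding
    more GPUs adds throughput but never lowers this per-GPU requirement."""
    return per_proc_batch_size * MIB_PER_VIDEO_FP16

print(required_mib(1))  # 20916 -> comfortably fits a 47.54 GiB card
print(required_mib(2))  # 41832 -> leaves little headroom for activations
```

This is why switching from 2 cards to 8 cards does not change the error: the per-rank batch, not the total GPU count, determines whether a single card runs out of memory.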
Hello, the generated videos are not in the corresponding folder; it seems to be stuck processing forever.
Could you tell me what you set for this parameter: https://github.com/Vchitect/Latte/blob/9fd35d552a450afcf3a177a7fe93d54e359e0cdc/configs/ffs/ffs_sample.yaml#L29
```
per_proc_batch_size: 2
num_fvd_samples: 2048
```

I did not modify the parameters in the config file.
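For reference, those two fields live in `configs/ffs/ffs_sample.yaml`. Lowering `per_proc_batch_size` is one plausible way to shrink the per-GPU footprint (an assumption based on the memory numbers above, not a fix confirmed in this thread):

```yaml
# configs/ffs/ffs_sample.yaml (relevant fields only)
per_proc_batch_size: 1    # default was 2; halves the per-rank batch
num_fvd_samples: 2048     # unchanged; sampling just takes more iterations
```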