Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.
Apache License 2.0
1.45k stars, 147 forks

Some errors when running the LatteT2v #14

Open zgdjcls opened 4 months ago

zgdjcls commented 4 months ago

Hi, I tried to sample from the pre-trained LatteT2V model on CPU, but I ran into several errors while running the code.

Steps to reproduce the error

  1. Modify environment.yml to meet the requirements.
  2. Download t2v.pt and the whole folder from https://huggingface.co/maxin-cn/Latte/tree/main/t2v_required_models, keeping the same structure, and name the folder t2v, so that we have t2v/scheduler .... t2v/model_index.json and t2v/t2v.pt.
  3. In t2v_sample.yaml, set ckpt = "t2v/t2v.pt" and pretrained_model_path = "t2v".
  4. Rename transformer_config.json in t2v/transformer to config.json, because I got RuntimeError: t2v\transformer\config.json does not exist at line 982 of latte_t2v.py, in from_pretrained_2d.
  5. I then got RuntimeError: None does not exist at line 1000 of latte_t2v.py, in from_pretrained_2d. The diffusion_pytorch_model.safetensors file is stored in t2v/vae and there is no .bin file in t2v, so the t2v/transformer folder contains neither a .safetensors nor a .bin file.

Should I move the .safetensors file to t2v/transformer? Could you please review this part?
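
The "None does not exist" message in step 5 is what one would expect if a loader searches the transformer folder for a weight file and then formats an unset variable into the error. The following is a minimal sketch of that kind of lookup, not the code in latte_t2v.py. The helper names find_weight_file and describe are hypothetical, and the .bin candidate name is an assumption (only the .safetensors name appears in the report above).

```python
import os
import tempfile

# Assumed candidate names; only the .safetensors name appears in the report above.
CANDIDATES = ("diffusion_pytorch_model.bin", "diffusion_pytorch_model.safetensors")


def find_weight_file(folder):
    """Return the path of the first candidate weight file in `folder`, or None."""
    for name in CANDIDATES:
        path = os.path.join(folder, name)
        if os.path.exists(path):
            return path
    return None


def describe(folder):
    """Build a readable status line instead of formatting None into the message."""
    found = find_weight_file(folder)
    if found is None:
        return f"no weight file in {folder}; expected one of {', '.join(CANDIDATES)}"
    return f"found {found}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        transformer_dir = os.path.join(root, "t2v", "transformer")
        os.makedirs(transformer_dir)
        # Empty folder: nothing is found, which mirrors the situation in step 5.
        print(describe(transformer_dir))
        # Create an empty placeholder file so the lookup succeeds.
        open(os.path.join(transformer_dir, CANDIDATES[1]), "w").close()
        print(describe(transformer_dir))
```

Running a check like this against t2v/transformer before sampling shows whether the folder holds any weight file at all, which is the condition the error points to.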

maxin-cn commented 4 months ago

Thank you for raising this issue. I did not provide the PixArt-alpha model weights in https://huggingface.co/maxin-cn/Latte/tree/main/t2v_required_models/transformer, so you can follow this link to modify your code.

zgdjcls commented 4 months ago

Thank you for your help. I will post an update with the result tomorrow.

zgdjcls commented 4 months ago

Hi, after commenting out lines 988 to 1015 (from model_files = to model.load_state_dict) and renaming some xxx_config.json files to config.json in these subfolders, I encountered a runtime error:

    File "E:\Latte\sample\sample_t2v.py", line 40, in main
        text_encoder = T5EncoderModel.from_pretrained(args.pretrained_model_path, subfolder="text_encoder", torch_dtype=torch.float16).to(device)
    OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory t2v.

Where should I find those missing files?

Also, if I want to run this model in float32, besides setting use_fp16 = False in t2v_sample.yaml, should I also manually set torch_dtype to float32 for the vae, text_encoder and transformer_model in sample_t2v.py?

maxin-cn commented 4 months ago

Please make sure that args.pretrained_model_path contains the text_encoder folder, which holds the checkpoints. use_fp16 in the yaml is a deprecated option for text2video; if you want to use fp32 for inference, change every torch.float16 in sample_t2v.py to torch.float32.

zgdjcls commented 4 months ago

My question may be too simple, since I'm new to this area. According to your Hugging Face repo, I believe model-00001-of-00004.safetensors to model-00004-of-00004.safetensors are the checkpoints you mentioned above, but they don't follow the naming rules given in the error message. I tried renaming them, but it didn't work. How should I rename the files, or am I on the wrong track? Also, I saw that you mentioned t2v training is not currently supported in train.py. Does that mean training with text prompts is not available?

maxin-cn commented 4 months ago

Please ensure the file structure of pretrained_model_path is as follows:

├── pretrained_model_path
│   ├── scheduler
│   ├── text_encoder
│   ├── tokenizer
│   ├── transformer
│   ├── vae

You can also download the corresponding checkpoints from PixArt-alpha, and the model weights in t2v_required_models are also from PixArt-alpha.

You do not need to rename the text encoder files. If the text encoder loads successfully, the terminal will output the following: [image]

train.py currently only supports training on four datasets: FaceForensics, SkyTimelapse, Taichi-HD, and UCF101.
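
The folder layout listed above can be checked before sampling. The following is a minimal sketch, assuming only the five subfolder names from the tree. The helper missing_subfolders is hypothetical, and treating an empty folder as missing is an added assumption, not something the repository enforces.

```python
from pathlib import Path
import tempfile

# Subfolders named in the layout above.
REQUIRED = ("scheduler", "text_encoder", "tokenizer", "transformer", "vae")


def missing_subfolders(pretrained_model_path):
    """Return the required subfolders that are absent or contain no files."""
    root = Path(pretrained_model_path)
    missing = []
    for name in REQUIRED:
        sub = root / name
        if not sub.is_dir() or not any(sub.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "t2v"
        # Create every subfolder except text_encoder, each with a placeholder file.
        for name in REQUIRED:
            if name != "text_encoder":
                (root / name).mkdir(parents=True)
                (root / name / "config.json").write_text("{}")
        print(missing_subfolders(root))
```

An empty result means every listed subfolder exists and holds at least one file. It does not confirm that the files inside are the right checkpoints.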

zgdjcls commented 4 months ago

Thank you. I found that some files are missing from your Hugging Face repo. After replacing your videogen_pipeline with the PixArt-alpha one:

    from diffusers import PixArtAlphaPipeline
    import torch
    videogen_pipeline = PixArtAlphaPipeline.from_pretrained("PixArt-alpha/PixArt-XL-2-512x512", torch_dtype=torch.float16)

I can now load the checkpoints and pipelines, but my memory appears to be below PixArt's requirements. I will try again later.

Thank you for not being annoyed by my dumb questions and for your meticulous help.

maxin-cn commented 4 months ago

Hi, thank you for your feedback. Please see this issue.

Xls1994 commented 4 months ago

Hi, I also get an OOM error on a T4. Did you find a good solution? Thanks.

maxin-cn commented 4 months ago

Could you please provide more details? Thanks~

Xls1994 commented 4 months ago

Thanks. The OOM error occurs on an NVIDIA T4, and the logs are as follows. Should I modify some of the model config, or use a GPU with more memory such as an A100?

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████| 4/4 [00:02<00:00, 1.73it/s]
Traceback (most recent call last):
  File "/app/alpaca-lora/voice/clip_proj/Latte/sample/sample_t2v.py", line 160, in <module>
    main(OmegaConf.load(args.config))
  File "/app/alpaca-lora/voice/clip_proj/Latte/sample/sample_t2v.py", line 36, in main
    text_encoder = T5EncoderModel.from_pretrained(args.pretrained_model_path, subfolder="text_encoder", torch_dtype=torch.float16).to(device)
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2595, in to
    return super().to(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 80.00 MiB. GPU 0 has a total capacty of 14.58 GiB of which 19.31 MiB is free. Process 14798 has 2.72 GiB memory in use. Process 37381 has 11.83 GiB memory in use. Of the allocated memory 10.90 GiB is allocated by PyTorch, and 293.86 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

maxin-cn commented 4 months ago

Hi, I just tested the GPU memory requirements for t2v inference. Inference on the A100 requires 20916 MiB of GPU memory in fp16 precision mode.
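
To relate the reported figure to the T4 in the log above, the requirement can be converted to GiB and compared with each card's total memory. The following is a minimal sketch using only numbers quoted in this thread. The helpers mib_to_gib and fits are hypothetical, and the A100 capacity is assumed to be 80 GiB for simplicity.

```python
MIB_PER_GIB = 1024


def mib_to_gib(mib):
    """Convert mebibytes to gibibytes."""
    return mib / MIB_PER_GIB


def fits(required_mib, total_gib):
    """Return True if the requirement fits within the card's total memory."""
    return mib_to_gib(required_mib) <= total_gib


if __name__ == "__main__":
    required_mib = 20916   # fp16 figure reported above
    t4_total_gib = 14.58   # total capacity quoted in the OOM log
    a100_total_gib = 80.0  # assumption: the 80G card, treated as 80 GiB

    print(f"required: {mib_to_gib(required_mib):.2f} GiB")
    print(f"fits on T4:   {fits(required_mib, t4_total_gib)}")
    print(f"fits on A100: {fits(required_mib, a100_total_gib)}")
```

Under these figures the requirement is about 20.43 GiB, which exceeds the T4's total capacity of 14.58 GiB even before memory held by other processes is counted.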

nemonameless commented 4 months ago

May I ask how long it takes to run t2v inference to generate a video on 80G A100? @maxin-cn thanks

maxin-cn commented 4 months ago

About 30s to generate one video on 80G A100.

Xls1994 commented 4 months ago

When I use the A100 to generate one video, the quality of the generated video is not as good as the one shown in paper.

maxin-cn commented 4 months ago

The quality of the generated video may depend on the random seed used for initialization. The publicly available t2v model is a very early model of ours; we are working on improving its stability and will release a stable t2v model as soon as possible. Please stay tuned~