Vchitect / Latte

Latte: Latent Diffusion Transformer for Video Generation.

Preprocess of UCF101 #35

Open valencebond opened 4 months ago

valencebond commented 4 months ago

Thanks for your great work! I want to know how to generate /path/to/datasets/UCF101/train_256_list.txt for UCF101 training. After downloading the UCF101 videos, and given the paper's statement that "We extract 16-frame video clips from these datasets", are there any preprocessing scripts we can follow?

maxin-cn commented 4 months ago

> Thanks for your great work! I want to know how to generate /path/to/datasets/UCF101/train_256_list.txt for UCF101 training. […] Are there any preprocessing scripts we can follow?

Hi, thanks for your interest. train_256_list.txt contains lines of the form video-class-name_video-name_frame (screenshot omitted; a concrete example appears later in this thread).

As for the second question, please follow this link.

valencebond commented 4 months ago

@maxin-cn How can I generate train_256_list.txt? Are there any preprocessing scripts, say if I want to train with ucf101_img_train.yaml?

maxin-cn commented 4 months ago

> @maxin-cn How can I generate train_256_list.txt? Are there any preprocessing scripts, say if I want to train with ucf101_img_train.yaml?

Please refer to the following code. (First convert each video to frames, resizing at the same time; you can refer to this. A sketch of that frame-extraction step follows the listing below.)

import os
from tqdm import tqdm

ffs_image_root = '/UCF101/images_256/'
ffs_image_txt = '/UCF101/train_256_list.txt'

def get_filelist(file_path):
    # Recursively collect the full path of every frame image under file_path.
    filelist = []
    for home, dirs, files in os.walk(file_path):
        for filename in files:
            filelist.append(os.path.join(home, filename))
    return filelist

ffs_files = get_filelist(ffs_image_root)

# Open the list file once in write mode (appending on every iteration would
# duplicate entries across runs) and write one relative path per line.
with open(ffs_image_txt, 'w') as f:
    for path in tqdm(ffs_files):
        relative_path = os.path.relpath(path, ffs_image_root)
        f.write(relative_path + '\n')
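
As noted above, the videos must first be decoded into resized frames. Here is a minimal sketch of that step using OpenCV; the short-side-256 resize, the frame numbering, and all paths are illustrative assumptions, not the repository's exact preprocessing:

import os
import cv2  # pip install opencv-python

video_path = '/UCF101/videos/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi'  # hypothetical input
out_dir = '/UCF101/images_256/ApplyEyeMakeup_v_ApplyEyeMakeup_g01_c01'     # hypothetical output
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:  # end of video (or a decode error)
        break
    # Resize so the short side is 256 while keeping the aspect ratio; the paper
    # trains at 256x256, so a center crop may still be needed afterwards.
    h, w = frame.shape[:2]
    scale = 256.0 / min(h, w)
    frame = cv2.resize(frame, (round(w * scale), round(h * scale)))
    idx += 1
    cv2.imwrite(os.path.join(out_dir, f'{idx:06d}.jpg'), frame)
cap.release()
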
xszheng2020 commented 4 months ago

Hi, @maxin-cn

I processed the UCF101 dataset following your suggestions above, and the data is organized as follows (screenshot omitted):

But when I try to start training using train_with_img.py, it freezes with very high CPU usage. Any idea?

[2024-03-01 12:24:03] Experiment directory created at ./results_img/000-LatteIMG-S-2-F16S3-ucf101_img
Starting rank=1, local rank=1, seed=3408, world_size=2.
[2024-03-01 12:24:05] Model Parameters: 32,624,288
[2024-03-01 12:24:07] Dataset contains 2,486,613 videos (./data/UCF-101)
maxin-cn commented 4 months ago

> Hi, @maxin-cn. I processed the UCF101 dataset following your suggestions above […] But when I try to start training using train_with_img.py, it freezes with very high CPU usage. Any idea?

Hi @valencebond, could you share your experience with @xszheng2020? Thank you very much~

xszheng2020 commented 4 months ago

Hi, @maxin-cn

The images should be placed as follows, right?

['./UCF101/images_256/WritingOnBoardv_WritingOnBoard_g02_c04/000148.jpg',
 './UCF101/images_256/WritingOnBoardv_WritingOnBoard_g02_c04/000105.jpg',
 './UCF101/images_256/WritingOnBoardv_WritingOnBoard_g02_c04/000155.jpg',
 './UCF101/images_256/WritingOnBoardv_WritingOnBoard_g02_c04/000037.jpg',
 './UCF101/images_256/WritingOnBoardv_WritingOnBoard_g02_c04/000027.jpg',
 './UCF101/images_256/WritingOnBoardv_WritingOnBoard_g02_c04/000126.jpg',
...]

And the train_256_list.txt contains:

WritingOnBoardv_WritingOnBoard_g02_c04/000148.jpg
WritingOnBoardv_WritingOnBoard_g02_c04/000105.jpg
WritingOnBoardv_WritingOnBoard_g02_c04/000155.jpg
WritingOnBoardv_WritingOnBoard_g02_c04/000037.jpg
WritingOnBoardv_WritingOnBoard_g02_c04/000027.jpg
WritingOnBoardv_WritingOnBoard_g02_c04/000126.jpg
...
maxin-cn commented 4 months ago

> The images should be placed as follows, right?
>
> ['./UCF101/images_256/WritingOnBoardv_WritingOnBoard_g02_c04/000148.jpg', …]
>
> And the train_256_list.txt contains:
>
> WritingOnBoardv_WritingOnBoard_g02_c04/000148.jpg […]

That's right.
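
A quick way to sanity-check that layout is to verify that every entry in train_256_list.txt resolves to an existing frame image. A minimal sketch, assuming the root and list-file paths shown above:

import os

image_root = './UCF101/images_256'
list_file = './UCF101/train_256_list.txt'

# Every relative path in the list file should point at an existing frame image.
with open(list_file) as f:
    missing = [line.strip() for line in f
               if not os.path.isfile(os.path.join(image_root, line.strip()))]

print(f'{len(missing)} missing entries')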

ShunLu91 commented 2 months ago

Dear authors,

I really appreciate your exceptional open-source work. As a newcomer to the video field, I have been exploring your repository and have a couple of questions that I hope you can help clarify.

  1. Training Step vs. Epoch: the training procedure specifies max_train_steps rather than max_train_epochs. My understanding is that with more GPUs the number of steps per epoch decreases, so a fixed step budget covers more epochs, which doesn't seem to leverage the acceleration benefit of multi-GPU training. Could you please explain the rationale behind this setting? Also, why does Fig. 8 use at most 150k training iterations while ffs_train.yaml and ucf101_train.yaml set max_train_steps=1000k?

  2. UCF-101 Dataset Training Details: I would also like to ask about the training resources, time, and performance on the UCF-101 dataset. According to the code, training Latte-S/2 on UCF-101 with two 32G NVIDIA V100 GPUs takes approximately 2.5 minutes per 100 steps, which translates to roughly 18 days for 1 million steps; this seems quite long. Could you provide details on the configuration used for training? Information on the training time and Inception Score (IS) of Latte-S/2 on UCF-101 would also be invaluable. You mentioned in Issue #39 that 8 cards were used for all experiments, and I wonder which type of GPU was used.

Thank you for your time and for maintaining such a fantastic resource for the community.

Best regards,

Shun Lu

maxin-cn commented 2 months ago

> Dear authors, […]
>
> 1. Training Step vs. Epoch: the training procedure specifies max_train_steps rather than max_train_epochs […]
> 2. UCF-101 Dataset Training Details: I would also like to ask about the training resources, time, and performance on the UCF-101 dataset […]
>
> Best regards, Shun Lu

Hi, thanks for your interest.

  1. The max_train_steps parameter in ffs_train.yaml does not necessarily mean that the model needs to be trained for the full 1000k steps; you can consider it a maximum value (see the steps-to-epochs sketch after this reply).
  2. The configuration in this repository is almost the same as the one I used. I used 8 A100 (80G) GPUs for all the experiments shown in the paper (except LatteT2V).

If you have any questions, please let me know.
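
For readers weighing the steps-versus-epochs question above, here is a back-of-the-envelope sketch; the batch size and world size are hypothetical, and only the UCF101 clip count is a known quantity:

num_clips = 13320          # UCF101 contains 13,320 video clips
local_batch_size = 16      # hypothetical per-GPU batch size
world_size = 8             # hypothetical number of GPUs
max_train_steps = 1_000_000

# With data-parallel training, each step consumes local_batch_size * world_size
# clips, so more GPUs means fewer steps per epoch and more epochs per step budget.
steps_per_epoch = num_clips / (local_batch_size * world_size)   # ~104 steps
epochs = max_train_steps / steps_per_epoch                      # ~9,600 epochs
print(f'{steps_per_epoch:.0f} steps/epoch -> {epochs:.0f} epochs at the step cap')

# The wall-clock estimate quoted above: 2.5 minutes per 100 steps on 2x V100.
days = max_train_steps / 100 * 2.5 / 60 / 24
print(f'about {days:.1f} days for {max_train_steps:,} steps')   # ~17.4 days
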

ShunLu91 commented 2 months ago

> Hi, thanks for your interest.
>
> 1. The max_train_steps parameter in ffs_train.yaml does not necessarily mean that the model needs to be trained for the full 1000k steps […]
> 2. The configuration in this repository is almost the same as the one I used. I used 8 A100 (80G) GPUs […]

Many thanks for your prompt and insightful response. I have two more questions about reproducing the results:

  1. Training Steps and Duration for Latte-XL/2 on UCF-101: Could you please specify the exact number of training steps and the training time when training the Latte-XL/2 using eight A100 GPUs on the UCF-101 dataset?

  2. Availability of Latte-S/2 Model on UCF-101: Have you trained the Latte-S/2 model on the UCF-101 dataset? If so, would it be possible for you to share such a model for reference? It would be useful for me to verify my experiments.

Thank you once again for your time and support.

Warm regards

maxin-cn commented 2 months ago

> Many thanks for your prompt and insightful response. I have two more questions about reproducing the results:
>
> 1. Training Steps and Duration for Latte-XL/2 on UCF-101: Could you please specify the exact number of training steps and the training time […]?
> 2. Availability of Latte-S/2 Model on UCF-101: Have you trained the Latte-S/2 model on the UCF-101 dataset? If so, would it be possible for you to share such a model […]?

  1. You can refer to https://github.com/Vchitect/Latte/issues/58. I remember training on 8 A100s for about 2 days, after which the model could generate video.
  2. I have only trained models of different sizes on the FFS dataset, and I can share those models with you if you need them.

ShunLu91 commented 2 months ago

Sincere appreciation for the valuable information.

  1. Could you kindly inform me of the exact/approximate number of training steps required to reach the Inception Score of 68.53 on UCF101 for the model listed in Table 1?
  2. Additionally, I would really like those FFS models and their exact/approximate training steps. I kindly request them at your convenience to my email: lushun901@gmail.com.

Really thanks for your generous support.

maxin-cn commented 2 months ago

> 1. Could you kindly inform me of the exact/approximate number of training steps required to reach the Inception Score of 68.53 on UCF101 […]?
> 2. Additionally, I would really like those FFS models and their exact/approximate training steps […]

  1. It is difficult for me to tell you the exact number of training steps for the model that achieved this Inception Score, because I forgot which checkpoint was used; it did take a long time (several weeks).
  2. I have uploaded these models here (all were trained for about 250k iterations).

ShunLu91 commented 2 months ago

> 1. It is difficult for me to tell you the exact number of training steps for the model that achieved this Inception Score […]
> 2. I have uploaded these models here (all were trained for about 250k iterations).

Got it and thanks a lot.