aigc-apps / CogVideoX-Fun

📹 A more flexible CogVideoX that can generate videos at any resolution and creates videos from images.
Apache License 2.0

Training #12

Open fuchao01 opened 1 month ago

fuchao01 commented 1 month ago

Excellent work! Could you please share some details about the training, including how much training data was used?

bubbliiiing commented 1 month ago

We collected approximately 1.2 million high-quality samples for training CogVideoX-Fun. During training, we resized the videos according to the target token length. The training process is divided into three phases, each corresponding to one token length: 13312 (for 512x512x49 videos), 29952 (for 768x768x49 videos), and 53248 (for 1024x1024x49 videos).
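For readers wondering where these token lengths come from, here is a minimal sketch that reproduces them. It assumes an 8x spatial and 4x temporal VAE compression and a 2x2 spatial patch size in the transformer; these factors are not stated in this thread and are chosen because they match the quoted numbers. The function name `token_length` and its parameters are hypothetical.

```python
def token_length(height, width, num_frames,
                 spatial_factor=8, temporal_factor=4, patch_size=2):
    """Estimate the transformer sequence length for one video.

    Assumed (not confirmed in this thread): the VAE compresses space by
    `spatial_factor` and time by `temporal_factor`, keeping the first frame,
    and the transformer groups latents into patch_size x patch_size patches.
    """
    # 49 input frames -> (49 - 1) // 4 + 1 = 13 latent frames
    latent_frames = (num_frames - 1) // temporal_factor + 1
    # Each latent frame is split into non-overlapping spatial patches
    patches_h = height // (spatial_factor * patch_size)
    patches_w = width // (spatial_factor * patch_size)
    return latent_frames * patches_h * patches_w


for size in (512, 768, 1024):
    print(f"{size}x{size}x49 -> {token_length(size, size, 49)} tokens")
```

Under these assumptions the three resolutions give 13312, 29952, and 53248 tokens, which agrees with the phase names above.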

Taking CogVideoX-Fun-2B as an example: In the 13312 phase, the batch size is 128 with 7k training steps. In the 29952 phase, the batch size is 256 with 6.5k training steps. In the 53248 phase, the batch size is 128 with 5k training steps.
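As a rough sanity check on the scale of each phase, the number of samples seen can be estimated as batch size times training steps. This sketch assumes the quoted batch sizes are global batch sizes and that "7k", "6.5k", and "5k" mean exactly 7000, 6500, and 5000 steps; neither is confirmed here. The names `PHASES` and `samples_seen` are hypothetical.

```python
# (token length, batch size, training steps) for CogVideoX-Fun-2B, as quoted above
PHASES = [
    (13312, 128, 7000),
    (29952, 256, 6500),
    (53248, 128, 5000),
]

DATASET_SIZE = 1_200_000  # "approximately 1.2 million" samples


def samples_seen(batch_size, steps):
    """Samples processed in a phase, assuming batch_size is the global batch."""
    return batch_size * steps


per_phase = [samples_seen(batch, steps) for _, batch, steps in PHASES]
total = sum(per_phase)

for (tokens, _, _), seen in zip(PHASES, per_phase):
    print(f"{tokens} phase: {seen:,} samples (~{seen / DATASET_SIZE:.2f} passes)")
print(f"total: {total:,} samples (~{total / DATASET_SIZE:.2f} passes)")
```

The "passes" figure is only indicative: the thread does not say whether every phase draws from the full dataset.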