Preparing dataset for training CogVideo1.5 I2V

THUDM / CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Apache License 2.0


Preparing dataset for training CogVideo1.5 I2V #486

Open Closertodeath opened 1 week ago

Closertodeath commented 1 week ago

System Info

Linux, otherwise N/A

Information

[X] The official example scripts
[ ] My own modified scripts and tasks

Reproduction

Here it only provides information on how to prepare the dataset for text-to-video (T2V). There is no information for I2V.

Expected behavior

Information on how to prepare a dataset for image to video.

zRzRzRzRzRzRzR commented 1 week ago

Similarly, for I2V the first frame of the video is used as the image. The code will capture it automatically.
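As a rough illustration of what "the first frame is the image" means for a dataset, here is a minimal sketch. It is not the repository's actual loading code: `split_first_frame` and `read_frames` are hypothetical helper names, and using OpenCV to decode the file is an assumption made only for this example.

```python
from typing import Iterable, Iterator, List, Tuple, TypeVar

Frame = TypeVar("Frame")


def split_first_frame(frames: Iterable[Frame]) -> Tuple[Frame, List[Frame]]:
    """Return (conditioning_image, all_frames); the image is simply frame 0."""
    all_frames = list(frames)
    if not all_frames:
        raise ValueError("video has no frames")
    return all_frames[0], all_frames


def read_frames(path: str) -> Iterator:
    """Yield decoded frames from a video file (hypothetical helper, uses OpenCV)."""
    import cv2  # third-party; imported lazily so the helper above has no dependency

    capture = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of stream or unreadable file
                break
            yield frame
    finally:
        capture.release()
```

With helpers like these, `image, video = split_first_frame(read_frames("clip.mp4"))` would give the conditioning image and the full clip, so an I2V dataset needs no separate image files beyond what a T2V dataset already contains.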

Closertodeath commented 1 week ago

> Similarly, I2V means that the first frame of the video is the Image. The code will automatically capture it.

Does any resolution work, as it does with I2V currently, or does it need to be a fixed resolution?

zRzRzRzRzRzRzR commented 1 week ago

The resolution needs to be fixed. For example, CogVideoX1.0 is 720 × 480. CogVideoX1.5 supports 768-1360 on the long edge and 768 on the short edge. However, there is currently no manpower available to write the specific fine-tuning code, and CogVideoX-Factory is expected to remain the fine-tuning framework for the open-source models.
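The limits quoted above can be turned into a quick pre-flight check when filtering clips for a dataset. This is only a sketch of those stated numbers: `check_resolution` and the model keys are hypothetical names, the 768-1360 range is assumed to be inclusive, and the check treats the two edges without regard to orientation, which this thread does not confirm.

```python
def check_resolution(width: int, height: int, model: str) -> bool:
    """Return True if (width, height) fits the limits quoted in this thread."""
    if model == "cogvideox1.0":
        # Single fixed size: 720 wide, 480 high.
        return (width, height) == (720, 480)
    if model == "cogvideox1.5":
        short_edge, long_edge = sorted((width, height))
        # Short edge fixed at 768; long edge within 768-1360 (assumed inclusive).
        return short_edge == 768 and 768 <= long_edge <= 1360
    raise ValueError(f"unknown model: {model!r}")
```

For example, `check_resolution(1360, 768, "cogvideox1.5")` passes while `check_resolution(1280, 720, "cogvideox1.5")` does not, because the short edge is not 768.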

aikitoria commented 1 week ago

> Regarding CogVideoX1.5, it supports 768-1360 (long edge) and 768 short edge

Is vertical video (i.e. 768x1360) meant to be supported? It always becomes blurry when I try.