THUDM / CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Apache License 2.0
8.32k stars 785 forks

SFT doesn't support image joint training #438

Open jsg921019 opened 1 week ago

jsg921019 commented 1 week ago

Feature request

From the code of the SAT SFTDataset, I can only see that it supports video datasets (mp4 extension), which differs from the paper, which says images are used as well. Is there any reason for this?

Motivation

The motivation is curiosity and custom training. Thanks for sharing a great model.

Your contribution

None yet.

zRzRzRzRzRzRzR commented 3 days ago

We only optimized the diffusers version afterwards, because SAT I2V requires too much video memory. In principle the two are the same: both take the first frame of each MP4 in the training set as the image data.
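
For readers who want to see the first-frame idea concretely, here is a minimal sketch. It is not the repository's actual dataset code: the function name `make_i2v_sample` and the dictionary keys are hypothetical, and the clip is assumed to be already decoded into an indexable sequence of frames in temporal order.

```python
from typing import Any, Dict, Sequence


def make_i2v_sample(frames: Sequence[Any], caption: str) -> Dict[str, Any]:
    """Build one image-to-video training sample from decoded video frames.

    `frames` is any indexable sequence of frames in temporal order (a list,
    a NumPy array, or a tensor whose first axis is time). The key names
    below are hypothetical, not the keys used by the CogVideo code.
    """
    if len(frames) == 0:
        raise ValueError("video has no frames")
    return {
        "image": frames[0],  # first frame doubles as the conditioning image
        "video": frames,     # the full clip remains the generation target
        "caption": caption,
    }


# Toy clip: 3 frames of 2x2 grayscale pixels standing in for a decoded MP4.
clip = [[[t, t], [t, t]] for t in range(3)]
sample = make_i2v_sample(clip, "a toy clip")
```

Because the image is derived from the video itself, a sketch like this needs no separate image dataset; whether that matches the joint image training described in the paper is the open question in this issue.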