From the code of the SAT SFTDataset, I can only see that it supports video datasets (.mp4 extension), which differs from the paper, which says images are used as well. Is there a reason for this?
Motivation
My motivation is curiosity and custom training. Thanks for sharing a great model.
We have since optimized only the diffusers version, because SAT I2V requires too much GPU memory.
In principle, the two are the same: both use the first frame of each MP4 in the training set as the image dataset.
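To make the "first frame as image" idea concrete, here is a minimal sketch of a dataset that pairs each clip with its own first frame. It is not the actual SAT or diffusers code: the class name `FirstFrameI2VDataset`, the `decode_fn` parameter, and the `"image"`/`"video"` keys are all assumptions made for illustration, and decoding is left to whatever video reader you already use.

```python
from pathlib import Path
from typing import Any, Callable, Sequence


class FirstFrameI2VDataset:
    """Hypothetical sketch: pairs each MP4 clip with its own first frame."""

    def __init__(self, data_dir: str, decode_fn: Callable[[Path], Sequence[Any]]):
        # decode_fn is an assumed helper that returns the decoded frames of one clip.
        self.paths = sorted(Path(data_dir).rglob("*.mp4"))
        self.decode_fn = decode_fn

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> dict:
        frames = self.decode_fn(self.paths[index])
        if len(frames) == 0:
            raise ValueError(f"no frames decoded from {self.paths[index]}")
        # The conditioning image is simply frame 0 of the same clip.
        return {"image": frames[0], "video": frames}
```

With this layout no separate image dataset is needed: any folder of MP4 files yields (image, video) pairs, which matches the behavior described in the reply above.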
Your contribution
None yet.