How to perturb the temporal modules with only image datasets? (The implementation details in VideoCrafter2)

AILab-CVC / VideoCrafter

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

https://ailab-cvc.github.io/videocrafter2/

How to perturb the temporal modules with only image datasets? (The implementation details in VideoCrafter2) #69

Open Sora-Lite opened 8 months ago

Sora-Lite commented 8 months ago

VideoCrafter2 claims that "We perturb the temporal modules while fixing the spatial modules with the image dataset". With an image dataset, no image sequences are available, so how are the inputs for fine-tuning the temporal modules obtained?