ali-vilab / videocomposer

Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability
https://videocomposer.github.io
MIT License

How was the pretraining dataset Laion-400M used? Does this actually refer to the use of the 'open_clip_pytorch_model' from OpenCLIP? #25

Open BaiqiangGit opened 1 year ago

BaiqiangGit commented 1 year ago

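Hello, I have a question. The paper mentions pretraining with Laion-400M. Does this mean that VideoComposer was additionally pretrained on Laion-400M? If so, could you explain how the pretraining inputs were organized and which modules took part in the training? Thanks!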

PS: In the code I can see two pretrained models related to Laion, but I could not find anything related to Laion-400M. Am I misunderstanding something?

Steven-SWZhang commented 1 year ago

Hello, our model supports both videos and single frames as inputs. When the input is a single frame, the value of F is set to 1. As long as the dimensions within each batch are consistent, we train the model on images and videos simultaneously.
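
For readers following along, here is a minimal sketch of one way to keep "the dimensions within each batch consistent" when mixing images and videos. It is not taken from the VideoComposer code. The `(F, C, H, W)` layout and the helper names `as_video_shape` and `group_into_batches` are assumptions made for illustration. An image is treated as a video with F = 1, and samples are bucketed by shape so that every batch contains only one shape.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumed layout for illustration: (F, C, H, W), where F is the frame count.
Shape = Tuple[int, ...]


def as_video_shape(shape: Sequence[int]) -> Shape:
    """Treat an image shape (C, H, W) as a one-frame video (1, C, H, W)."""
    if len(shape) == 3:
        return (1, *shape)
    if len(shape) == 4:
        return tuple(shape)
    raise ValueError(f"expected 3 or 4 dimensions, got {len(shape)}")


def group_into_batches(
    shapes: Sequence[Sequence[int]], batch_size: int
) -> List[Tuple[Shape, List[int]]]:
    """Bucket sample indices by video shape, then cut each bucket into batches.

    Every returned batch holds indices of samples that share one shape,
    so images (F = 1) and videos (F > 1) never land in the same batch.
    """
    buckets: Dict[Shape, List[int]] = defaultdict(list)
    for index, shape in enumerate(shapes):
        buckets[as_video_shape(shape)].append(index)

    batches: List[Tuple[Shape, List[int]]] = []
    for shape, indices in buckets.items():
        for start in range(0, len(indices), batch_size):
            batches.append((shape, indices[start:start + batch_size]))
    return batches


if __name__ == "__main__":
    # Hypothetical mix: three images and two 16-frame videos.
    sample_shapes = [
        (3, 256, 256),
        (16, 3, 256, 256),
        (3, 256, 256),
        (16, 3, 256, 256),
        (3, 256, 256),
    ]
    for shape, indices in group_into_batches(sample_shapes, batch_size=2):
        print(shape, indices)
```

Under this sketch, image batches and video batches would simply alternate during training. Whether the actual training loop buckets samples this way or uses separate data loaders is not stated in the thread.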