ali-vilab / VGen

Official repo for VGen: a holistic video generation ecosystem built on diffusion models
https://i2vgen-xl.github.io

Question about the motion adapter in DreamVideo #116

Open · Hugo-cell111 opened 1 month ago

Hugo-cell111 commented 1 month ago

Hi! I see that each time a single frame of the guidance video is selected to train the motion adapter. But since selecting only one image breaks the temporal coherence of the video, how can the motion adapter capture temporal motion patterns? Thanks!

weilllllls commented 1 month ago

Hi, thanks for your interest. We train the motion adapter using all frames of the input videos. Meanwhile, a randomly selected frame serves as the appearance guidance.
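
For intuition, here is a minimal sketch of what such a sampling scheme could look like; the function name, tensor shapes, and sampling logic below are illustrative assumptions, not the repository's actual dataloader:

```python
import torch

def sample_training_inputs(video: torch.Tensor):
    """Illustrative sampling for motion-adapter training.

    video: tensor of shape (num_frames, C, H, W) holding all frames
    of one training clip.
    Returns the full clip (motion supervision) plus one randomly
    chosen frame (appearance guidance).
    """
    num_frames = video.shape[0]
    # All frames drive the temporal/motion objective ...
    motion_frames = video
    # ... while a single random frame conditions appearance only.
    rand_idx = torch.randint(0, num_frames, (1,)).item()
    appearance_frame = video[rand_idx]
    return motion_frames, appearance_frame
```

The key point is that the motion objective sees the whole clip, so temporal coherence is preserved; only the appearance-conditioning branch receives a single frame.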

Hugo-cell111 commented 1 month ago

Thanks for your response! I have a few more questions: (1) How long does each stage of DreamVideo take? I tried it on my own server and found that the first step of subject learning alone takes about 2 hours. Is that normal? I am using 4 V100 PCIe GPUs. (2) Could you provide a link to the open_clip_pytorch_model.bin used by FrozenOpenCLIPCustomEmbedder?

weilllllls commented 1 month ago

Hi. (1) We use one A100 80G GPU. It takes about 50 min for step 1 in subject learning and 10–15 min for step 2. I think your timing is normal given the device differences. By the way, you can reduce the number of training iterations to balance performance against time cost. (2) The 'open_clip_pytorch_model.bin' used in DreamVideo is the same as for the other models (I2VGen-XL, HiGen, TF-T2V, etc.) in this repository. You can download the ckpt from this link: https://modelscope.cn/api/v1/models/iic/tf-t2v/repo?Revision=master&FilePath=open_clip_pytorch_model.bin.
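
If helpful, the checkpoint can also be fetched programmatically; a plain streaming-download sketch (the output filename is arbitrary):

```python
import requests

# URL given above for the shared open_clip_pytorch_model.bin checkpoint.
URL = ("https://modelscope.cn/api/v1/models/iic/tf-t2v/repo"
       "?Revision=master&FilePath=open_clip_pytorch_model.bin")

with requests.get(URL, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    with open("open_clip_pytorch_model.bin", "wb") as f:
        # Stream in 1 MiB chunks to avoid loading the whole file in memory.
        for chunk in resp.iter_content(chunk_size=1 << 20):
            f.write(chunk)
```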

Hugo-cell111 commented 1 month ago

Thank you very much! By the way, how long does it take to evaluate on all the datasets mentioned in the DreamVideo paper? Could you provide the evaluation code?