ali-vilab / videocomposer

Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability
https://videocomposer.github.io
MIT License
887 stars, 80 forks

Text + SingleImage not right #10

Open wangxihao opened 1 year ago

wangxihao commented 1 year ago

When I use the text "Smiling woman in cowboy hat with wheat ears" together with the single image below (image: hat_woman) and run:

python run_net.py\
    --cfg configs/exp04_sketch2video_wo_style.yaml\
    --seed 144\
    --sketch_path "demo_video/hat_woman.png"\
    --input_text_desc "Smiling woman in cowboy hat with wheat ears"

I get the GIF below (image: S144), which does not match the result in the paper. How can I reproduce the paper's result? Thank you.

Steven-SWZhang commented 1 year ago

Hi, it seems that you are using a single sketch instead of a single static image as the condition for video generation, which results in a different outcome from what is presented in the paper. Additionally, as we mentioned in this repo, due to the diversity generated by diffusion models, you may try to use different seeds to obtain better results.
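
To illustrate the suggestion about trying different seeds, here is a minimal sketch that builds one command per seed. It reuses only the flags and paths from the command in the original report; the helper name `build_commands` and the seed range are hypothetical. It only prints the commands, and it keeps the sketch condition, so it does not address the sketch-versus-static-image difference described above.

```python
import shlex

# Values copied from the command in the original report.
CFG = "configs/exp04_sketch2video_wo_style.yaml"
SKETCH = "demo_video/hat_woman.png"
PROMPT = "Smiling woman in cowboy hat with wheat ears"


def build_commands(seeds):
    """Return one run_net.py argument list per seed (hypothetical helper)."""
    return [
        ["python", "run_net.py",
         "--cfg", CFG,
         "--seed", str(seed),
         "--sketch_path", SKETCH,
         "--input_text_desc", PROMPT]
        for seed in seeds
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them; the seed range here
    # is an arbitrary illustration, not a recommendation from the repo.
    for cmd in build_commands(range(144, 149)):
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell from the repository root, and the outputs for the different seeds can then be compared by eye.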

prateksha commented 4 months ago

Hi @wangxihao, were you able to do inference using single image+text input?