wangxihao opened 1 year ago
Hi, it seems that you are using a single sketch instead of a single static image as the condition for video generation, which results in a different outcome from what is presented in the paper. Additionally, as we mentioned in this repo, due to the diversity of samples generated by diffusion models, you may want to try different seeds to obtain better results.
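For example, a quick way to try several seeds is a small shell loop over the same command used below (a minimal sketch; the seed values are arbitrary, and it assumes run_net.py writes a distinct output per run):

```bash
#!/usr/bin/env bash
# Sweep a few seeds with otherwise identical arguments; the flags
# mirror the command from the report below (assumed unchanged here).
for seed in 13 42 144 777 2024; do
    # NOTE: inspect each generated result before re-running, in case
    # run_net.py overwrites its output between runs.
    python run_net.py \
        --cfg configs/exp04_sketch2video_wo_style.yaml \
        --seed "$seed" \
        --sketch_path "demo_video/hat_woman.png" \
        --input_text_desc "Smiling woman in cowboy hat with wheat ears"
done
```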
Hi @wangxihao, were you able to run inference using a single image + text input?
When I use the text "Smiling woman in cowboy hat with wheat ears" with the single image below, and then execute
python run_net.py \
    --cfg configs/exp04_sketch2video_wo_style.yaml \
    --seed 144 \
    --sketch_path "demo_video/hat_woman.png" \
    --input_text_desc "Smiling woman in cowboy hat with wheat ears"
then I get the GIF below, which is not the same as in the paper. How can I get the result shown in the paper? Thank you.