Question about the img and txt signal guidance

Doubiiu / DynamiCrafter

[ECCV 2024] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

Apache License 2.0


Issue #25 (Open): HyoKong opened this issue 4 months ago

HyoKong commented 4 months ago

Thank you for the great work!

In Section 4.1 (Implementation Details) of your paper, you state that two guidance scales are used for text-conditioned image animation. However, the released run.sh omits the --multiple_cond_cfg flag. Is there any difference between running with and without --multiple_cond_cfg, and would performance be better without it?

Thank you so much for the help!

Doubiiu commented 4 months ago

Hi, thanks for your question! As mentioned in the Supplementary Document of the paper, the classifier-free guidance (CFG) scales of the two conditions can be set separately to give the model more flexibility/controllability. Passing --multiple_cond_cfg together with --cfg_img lets you change the CFG scale of the image condition. Without --multiple_cond_cfg, cfg=7.5 is used for both conditions (text and image).
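For readers unfamiliar with two-condition guidance, here is a minimal sketch assuming the InstructPix2Pix-style formulation: eps_hat = eps(z_t, ∅, ∅) + s_img·(eps(z_t, c_img, ∅) − eps(z_t, ∅, ∅)) + s_txt·(eps(z_t, c_img, c_txt) − eps(z_t, c_img, ∅)). It is not taken from the DynamiCrafter code. The function names are hypothetical, and the noise predictions are stand-in numbers rather than model outputs. The sketch shows that using the same scale (e.g. 7.5) for both conditions collapses to ordinary single-scale CFG on the joint image+text condition. That is one plausible reading of the behavior described above without --multiple_cond_cfg, assuming the unconditional branch drops both conditions.

```python
import math


def guided_noise(eps_uncond, eps_img, eps_full, s_img, s_txt):
    """Two-scale CFG (assumed InstructPix2Pix-style form, hypothetical helper).

    eps_uncond: prediction with both conditions dropped, eps(z_t, None, None)
    eps_img:    prediction with the image condition only, eps(z_t, c_img, None)
    eps_full:   prediction with image and text, eps(z_t, c_img, c_txt)
    Works on floats or on any array type that supports + and * with scalars.
    """
    return (eps_uncond
            + s_img * (eps_img - eps_uncond)    # push toward the image condition
            + s_txt * (eps_full - eps_img))     # then push toward the text condition


def single_scale_noise(eps_uncond, eps_full, s):
    """Ordinary CFG: one scale applied to the joint (image + text) condition."""
    return eps_uncond + s * (eps_full - eps_uncond)


# Stand-in values for the three noise predictions at one position.
eps_uncond, eps_img, eps_full = 0.1, 0.4, -0.3

# Equal scales: the eps_img terms cancel, leaving single-scale CFG.
a = guided_noise(eps_uncond, eps_img, eps_full, s_img=7.5, s_txt=7.5)
b = single_scale_noise(eps_uncond, eps_full, s=7.5)
print(math.isclose(a, b))  # True

# Different scales: the image and text conditions are weighted separately.
c = guided_noise(eps_uncond, eps_img, eps_full, s_img=1.0, s_txt=7.5)
print(math.isclose(c, b))  # False
```

Setting s_img and s_txt to different values is what provides the extra controllability. How --multiple_cond_cfg and --cfg_img map onto these scales should be confirmed in the repository's sampler code.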