Boese0601 / MagicDance

[ICML 2024] MagicPose (also known as MagicDance): Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
https://boese0601.github.io/magicdance/

Questions about Input Latent for Appearance Control Model #33

Open CrispyFeSo4 opened 1 month ago

CrispyFeSo4 commented 1 month ago

Could you please clarify whether the input to the appearance control model is the latent of the reference image, or the noisy latent produced by DDIM inversion? Thanks!

Boese0601 commented 1 month ago

As mentioned in the paper and the code, it's the latent of the reference image, without any noise added.
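
For illustration, here is a minimal sketch of what this means in practice; it is not the repo's actual code and assumes a diffusers-style `AutoencoderKL` as the VAE. The reference image is encoded to its clean latent, and that latent is fed to the appearance control branch as-is:

```python
import torch
from diffusers import AutoencoderKL

# Hypothetical sketch: the appearance control model receives the clean
# VAE latent of the reference image; no noise is added to it.
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
)

@torch.no_grad()
def reference_latent(ref_image: torch.Tensor) -> torch.Tensor:
    # ref_image: (B, 3, H, W), pixel values scaled to [-1, 1]
    latent = vae.encode(ref_image).latent_dist.sample()
    return latent * vae.config.scaling_factor  # passed on without noise
```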

CrispyFeSo4 commented 1 month ago

Thank you for your response. Since many previous training-free methods use the noisy latent after inversion, I wanted to confirm this point.

By the way, have you tried using the latent after inversion? What differences did you observe in the results compared to directly inputting the clean latent?
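
(For context, "the latent after inversion" here means running the deterministic DDIM update backwards, mapping the clean latent x_0 to a noisy latent x_T. A rough, hypothetical sketch of that procedure, assuming a diffusers-style UNet and `DDIMScheduler`, not code from this repo:)

```python
import torch

@torch.no_grad()
def ddim_invert(unet, scheduler, latent, cond, num_steps=50):
    # Hypothetical sketch of DDIM inversion: step the deterministic DDIM
    # update in reverse, from the clean latent x_0 up to a noisy x_T.
    scheduler.set_timesteps(num_steps)
    timesteps = list(reversed(scheduler.timesteps))  # low noise -> high noise
    for i, t in enumerate(timesteps):
        eps = unet(latent, t, encoder_hidden_states=cond).sample
        alpha_t = scheduler.alphas_cumprod[t]
        alpha_prev = (
            scheduler.alphas_cumprod[timesteps[i - 1]]
            if i > 0 else scheduler.final_alpha_cumprod
        )
        # predict x_0 at the current (lower) noise level, then move up to t
        x0 = (latent - (1 - alpha_prev).sqrt() * eps) / alpha_prev.sqrt()
        latent = alpha_t.sqrt() * x0 + (1 - alpha_t).sqrt() * eps
    return latent
```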

Boese0601 commented 1 month ago

I have tried using the noisy latent as input to the appearance control model before, with the corresponding noise sampled at timestep t, but from the results it makes little difference, or is even worse. I have implemented this variant in the code as well: simply set wonoise to False in the arguments.
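
For illustration, a minimal sketch of what such a switch might look like (assuming a diffusers-style `DDPMScheduler`; the actual repo code may differ):

```python
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)

def appearance_input(ref_latent, t, wonoise=True):
    # wonoise=True  -> clean reference latent (the default, reported to work best)
    # wonoise=False -> reference latent noised at the same timestep t
    #                  (the ablated variant: little difference, or worse)
    if wonoise:
        return ref_latent
    noise = torch.randn_like(ref_latent)
    return scheduler.add_noise(ref_latent, noise, t)
```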

CrispyFeSo4 commented 1 month ago

Thank you for your detailed response! I suspect it might be because a trainable model can handle both noisy and clean latents, while training-free methods can only work with noisy latents (the frozen model expects inputs at the sampled noise level). A trainable model can also extract more information from the clean latent as a reference. Good job!