Boese0601 / MagicDance

[ICML 2024] MagicPose (also known as MagicDance): Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
https://boese0601.github.io/magicdance/

Questions about Input Latent for Appearance Control Model #33

Open CrispyFeSo4 opened 1 month ago

CrispyFeSo4 commented 1 month ago

Could you please clarify whether the input to the appearance control model is the latent of the reference image, or the noisy latent produced by DDIM inversion? Thanks!

Boese0601 commented 1 month ago

As mentioned in the paper and the code, it's the latent of the reference image, without any noise added.
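
For illustration, here is a minimal sketch of what this means in practice; it is not the repo's actual code and assumes a diffusers-style `AutoencoderKL` as the VAE. The reference image is encoded to its clean latent, and that latent is fed to the appearance control branch as-is:

```python
import torch
from diffusers import AutoencoderKL

# Hypothetical sketch: the appearance control model receives the clean
# VAE latent of the reference image; no noise is added to it.
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
)

@torch.no_grad()
def reference_latent(ref_image: torch.Tensor) -> torch.Tensor:
    # ref_image: (B, 3, H, W), pixel values scaled to [-1, 1]
    latent = vae.encode(ref_image).latent_dist.sample()
    return latent * vae.config.scaling_factor  # passed on without noise
```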

CrispyFeSo4 commented 1 month ago

Thank you for your response. Since many previous training-free methods use the noisy latent after inversion, I wanted to confirm this point.

By the way, have you tried using the latent after inversion? What differences did you observe in the results compared to directly inputting the clean latent?
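
(For context, "the latent after inversion" here means running the deterministic DDIM update backwards, mapping the clean latent x_0 to a noisy latent x_T. A rough, hypothetical sketch of that procedure, assuming a diffusers-style UNet and `DDIMScheduler`, not code from this repo:)

```python
import torch

@torch.no_grad()
def ddim_invert(unet, scheduler, latent, cond, num_steps=50):
    # Hypothetical sketch of DDIM inversion: step the deterministic DDIM
    # update in reverse, from the clean latent x_0 up to a noisy x_T.
    scheduler.set_timesteps(num_steps)
    timesteps = list(reversed(scheduler.timesteps))  # low noise -> high noise
    for i, t in enumerate(timesteps):
        eps = unet(latent, t, encoder_hidden_states=cond).sample
        alpha_t = scheduler.alphas_cumprod[t]
        alpha_prev = (
            scheduler.alphas_cumprod[timesteps[i - 1]]
            if i > 0 else scheduler.final_alpha_cumprod
        )
        # predict x_0 at the current (lower) noise level, then move up to t
        x0 = (latent - (1 - alpha_prev).sqrt() * eps) / alpha_prev.sqrt()
        latent = alpha_t.sqrt() * x0 + (1 - alpha_t).sqrt() * eps
    return latent
```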

Boese0601 commented 1 month ago

I have tried using the noisy latent as input to the appearance control model before, with the corresponding noise sampled at timestep t, but from the results it makes little difference, or is even worse. I have implemented this variant in the code as well: simply set wonoise to False in the arguments.
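
For illustration, a minimal sketch of what such a switch might look like (assuming a diffusers-style `DDPMScheduler`; the actual repo code may differ):

```python
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)

def appearance_input(ref_latent, t, wonoise=True):
    # wonoise=True  -> clean reference latent (the default, reported to work best)
    # wonoise=False -> reference latent noised at the same timestep t
    #                  (the ablated variant: little difference, or worse)
    if wonoise:
        return ref_latent
    noise = torch.randn_like(ref_latent)
    return scheduler.add_noise(ref_latent, noise, t)
```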

CrispyFeSo4 commented 1 month ago

Thank you for your detailed response! I suspect it might be because a trainable model can handle both noisy and clean latents, while training-free methods can only work with noisy latents (the frozen model expects inputs at the sampled noise level). A trainable model can also extract more information from the clean latent as a reference. Good job!