BadToBest / EchoMimic

Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
https://badtobest.github.io/echomimic.html
Apache License 2.0
2.26k stars, 263 forks

Why can inference run without landmarks? #55

Closed: renrenzsbbb closed this issue 1 month ago

renrenzsbbb commented 1 month ago

Thanks for your great work. I noticed that training uses random landmark input, but inference can run with audio input only. Could you explain how this works without degrading the results?

JoeFannie commented 1 month ago

Thank you for your interest. During training, the pose input is randomly dropped, so the model also sees audio-only cases. That is why it works with only audio during inference.
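
For readers unfamiliar with the idea, the random dropping described above can be illustrated roughly as follows. This is only a minimal sketch and not the EchoMimic training code: the helper names `maybe_drop_pose` and `audio_only_pose`, the drop probability of 0.5, and the choice to replace a dropped pose with zeros are all assumptions made for illustration.

```python
import random


def maybe_drop_pose(pose, drop_prob, rng):
    """Hypothetical helper: return the pose features unchanged, or an
    all-zero placeholder with probability drop_prob."""
    if rng.random() < drop_prob:
        # Pose dropped: the model must rely on audio alone for this sample.
        return [0.0] * len(pose)
    return list(pose)


def audio_only_pose(pose_dim):
    """Hypothetical helper: the placeholder pose fed at inference when no
    landmarks are supplied. It matches what training used for dropped poses."""
    return [0.0] * pose_dim


rng = random.Random(0)        # fixed seed so the run is repeatable
pose = [0.3, -0.1, 0.8]       # toy landmark features (made-up values)

# Simulate the pose condition seen across many training samples.
batch = [maybe_drop_pose(pose, drop_prob=0.5, rng=rng) for _ in range(1000)]
dropped = sum(1 for p in batch if p == audio_only_pose(len(pose)))
print(f"{dropped} of {len(batch)} training samples were audio-only")
```

Because the placeholder used at inference is the same one the model saw for dropped samples during training, the audio-only case is not out of distribution for the model.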