Can this be used for lip-sync tasks when fusing audio and video?

KwaiVGI / LivePortrait

Bring portraits to life!

https://liveportrait.github.io

Can this be used for lip-sync tasks when fusing audio and video? #95

Closed KDD2018 closed 3 months ago

holasyb commented 3 months ago

Following this thread. I'd also like to know how to use it.

zzzweakman commented 3 months ago

We have discovered that the implicit-keypoint-based framework can be extended to audio-driven control. The experimental results are detailed in Appendix C of our paper. Due to internal limitations, we are sorry that we are unable to provide this model now. However, you can follow the instructions in Appendix C to train an audio-driven model on your own :) @KDD2018 @holasyb

markson14 commented 3 months ago

@zzzweakman Hi, great work! I really appreciate it. I have one question about the audio-driven approach described in Appendix C.

When I use the audio-driven method, I cannot use the retargeting module, since it captures its keypoints from video. Am I correct?

zzzweakman commented 3 months ago

The eye retargeting module is designed to address incomplete eye closure during cross-identity reenactment, especially when a person with small eyes drives a person with larger eyes. The lip retargeting module follows the same design, and it can also normalize the input by ensuring that the lips start in a closed state, which makes the animation easier to drive.

However, even if the driving information comes from audio, you can still use our lip retargeting module to normalize the first few frames. This can help achieve better results :)
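
As a rough illustration of the idea in the reply above, here is one possible reading of "normalize the first few frames": start from a closed-lip state and ramp linearly into the audio-predicted lip motion. This is only a sketch under stated assumptions, not the LivePortrait implementation. The function names, the landmark arguments, and the linear ramp are all hypothetical and do not come from the repository or the paper.

```python
import math


def lip_open_ratio(upper, lower, left, right):
    """Vertical lip gap divided by mouth width (hypothetical measure).

    Each argument is an (x, y) landmark; 0.0 means the lips are closed.
    """
    gap = math.dist(upper, lower)
    width = math.dist(left, right)
    return gap / width if width > 0 else 0.0


def warmup_weights(num_frames, warmup):
    """Linear ramp from 0 to 1 over the first `warmup` frames, then 1."""
    if warmup <= 0:
        return [1.0] * num_frames
    return [min(1.0, i / warmup) for i in range(num_frames)]


def blend_from_closed(predicted, closed, warmup):
    """Blend per-frame lip values from a closed state toward the prediction.

    `predicted` is a list of frames, each a list of floats (assumed to be
    audio-predicted lip offsets); `closed` is the closed-lip reference.
    """
    weights = warmup_weights(len(predicted), warmup)
    return [
        [c + w * (p - c) for p, c in zip(frame, closed)]
        for frame, w in zip(predicted, weights)
    ]
```

With a two-frame warm-up, the first frame equals the closed-lip reference, the second is halfway to the prediction, and later frames follow the prediction unchanged.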

TengliEd commented 2 months ago

@zzzweakman I am quite curious about what quality the audio-driven model can reach. Could you please share some of the results you have produced? That way, we can check whether we have implemented your method correctly.