TMElyralab / MuseTalk

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

How can I run this in real time? #6

wingjoezhou closed this issue 2 months ago

wingjoezhou commented 3 months ago

I'm using an RTX 4090. Driving the model with a 15-second audio clip, a single conversion currently takes about 2 minutes.

itechmusic commented 3 months ago

Before proceeding, please ensure you have reviewed the notes section in our repository: https://github.com/TMElyralab/MuseTalk?tab=readme-ov-file#note. It contains important information that will be beneficial for your understanding.

For online chatting, MuseTalk operates in real-time by utilizing only the UNet and the VAE decoder. These components require 32ms/frame when run on an NVIDIA Tesla V100.
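As a rough sanity check on these numbers, the sketch below converts the quoted 32 ms/frame into a frame rate and into total generation time for the 15-second clip from the question. The 25 fps output rate is an assumption (it is not stated in this thread), and the 32 ms figure is for a V100, not the 4090 mentioned above.

```python
# Back-of-the-envelope check of the figures quoted in this thread.
MS_PER_FRAME = 32.0   # UNet + VAE decoder on a Tesla V100 (quoted above)
VIDEO_FPS = 25        # assumption: output frame rate, not stated in the thread
AUDIO_SECONDS = 15    # clip length from the original question


def max_fps(ms_per_frame: float) -> float:
    """Frames per second the generation step can sustain."""
    return 1000.0 / ms_per_frame


def generation_seconds(audio_seconds: float, fps: int, ms_per_frame: float) -> float:
    """Time spent in the per-frame generation step for a whole clip."""
    frames = round(audio_seconds * fps)
    return frames * ms_per_frame / 1000.0


print(max_fps(MS_PER_FRAME))                                       # 31.25
print(generation_seconds(AUDIO_SECONDS, VIDEO_FPS, MS_PER_FRAME))  # 12.0
```

Under these assumptions the generation step alone would need about 12 s for a 15 s clip, which suggests that most of a 2-minute run is spent in stages other than the UNet and the VAE decoder.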

The VAE encoder latents are pre-saved, so the computation time spent before line 90 of scripts/inference.py can be disregarded: https://github.com/TMElyralab/MuseTalk/blob/main/scripts/inference.py#L90
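The idea of pre-saving latents can be illustrated with a minimal sketch: encode each reference frame once in an offline step, then only look latents up during generation. `LatentCache`, `toy_encode`, and the cycling over frames are hypothetical names and choices made for illustration; they are not taken from the MuseTalk code.

```python
from typing import Callable, Dict, List, Sequence

Latent = List[float]


class LatentCache:
    """Runs a (hypothetical) encoder once per avatar and reuses the result."""

    def __init__(self, encode: Callable[[Sequence[float]], Latent]) -> None:
        self._encode = encode
        self._store: Dict[str, List[Latent]] = {}
        self.encoder_calls = 0

    def prepare(self, avatar_id: str, frames: Sequence[Sequence[float]]) -> None:
        """Offline step: encode every reference frame of an avatar once."""
        latents = []
        for frame in frames:
            latents.append(self._encode(frame))
            self.encoder_calls += 1
        self._store[avatar_id] = latents

    def get(self, avatar_id: str, frame_index: int) -> Latent:
        """Online step: look up a latent, cycling if audio outlasts the video."""
        latents = self._store[avatar_id]
        return latents[frame_index % len(latents)]


def toy_encode(frame: Sequence[float]) -> Latent:
    # Stand-in for the VAE encoder (assumption): halves each value.
    return [v / 2 for v in frame]


cache = LatentCache(toy_encode)
cache.prepare("avatar_a", [[2.0, 4.0], [6.0, 8.0]])
for i in range(5):           # five output frames, only two reference frames
    latent = cache.get("avatar_a", i)
print(cache.encoder_calls)   # 2
```

The encoder runs twice no matter how many output frames are requested, so its cost is paid once per avatar rather than once per generated frame.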

The mask_image depends solely on the original image, so it can be computed in advance to reduce computation time. More details can be found here: https://github.com/TMElyralab/MuseTalk/blob/main/musetalk/utils/blending.py#L54

Please feel free to reach out if you have any further questions or concerns.
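To illustrate why a mask that depends only on the original image can be computed ahead of time, here is a toy sketch. It uses a plain rectangular mask and nested lists; `build_mask` and `blend` are hypothetical helpers and do not reproduce what blending.py actually does.

```python
from typing import List, Tuple

Grid = List[List[float]]


def build_mask(height: int, width: int, box: Tuple[int, int, int, int]) -> Grid:
    """1.0 inside box (x1, y1, x2, y2), 0.0 elsewhere. Needs no generated frame."""
    x1, y1, x2, y2 = box
    return [
        [1.0 if x1 <= x < x2 and y1 <= y < y2 else 0.0 for x in range(width)]
        for y in range(height)
    ]


def blend(original: Grid, generated: Grid, mask: Grid) -> Grid:
    """Per-pixel mix: generated where the mask is 1, original where it is 0."""
    return [
        [m * g + (1 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


# Offline: the mask is built once from the original frame's geometry.
original = [[10.0, 10.0], [10.0, 10.0]]
mask = build_mask(2, 2, box=(0, 1, 2, 2))  # bottom row stands in for the mouth area

# Online: each generated frame reuses the same precomputed mask.
generated_frames = [[[50.0, 50.0], [50.0, 50.0]], [[70.0, 70.0], [70.0, 70.0]]]
outputs = [blend(original, g, mask) for g in generated_frames]
print(outputs[0])  # [[10.0, 10.0], [50.0, 50.0]]
```

Only `blend` runs per generated frame; `build_mask` is called once, which is the saving described in the comment above.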

xunnew commented 3 months ago

Are there plans to release a real-time inference example?

itechmusic commented 2 months ago

> Are there plans to release a real-time inference example?

The real-time inference example code has now been added: https://github.com/TMElyralab/MuseTalk?tab=readme-ov-file#new-real-time-inference