Hello! The work is impressive. I wonder whether it would be feasible to use the model with real-time generated TTS to produce realistic facial animation on a 3D face model in Unity.
Hi, we didn't test the model in a real-time setting. In principle, the model first obtains the global context of the audio (about 4 s in our experiments) and then autoregressively synthesizes the 3D facial animation. That means performance may drop if you only provide a small window of audio in a real-time setting. This is a limitation that needs further exploration. Previous works such as VOCA and MeshTalk may be better suited to real-time applications, as they adopt small audio windows.
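To make the latency implication concrete, here is a minimal buffering sketch (not this repo's API): `synthesize_animation`, the 16 kHz sample rate, the 30 fps output rate, and the mesh size are all illustrative assumptions. It accumulates streaming TTS audio and only invokes the model once a full ~4 s window is available, which shows why the animation would lag the audio by at least that much in a real-time pipeline.

```python
import numpy as np

SAMPLE_RATE = 16000                 # assumed input sample rate
CONTEXT_SECONDS = 4.0               # ~4 s of global context, per the answer above
CONTEXT_SAMPLES = int(SAMPLE_RATE * CONTEXT_SECONDS)

def synthesize_animation(audio: np.ndarray) -> np.ndarray:
    """Stand-in for the model's audio-to-animation call.
    Hypothetical output: (num_frames, num_vertices, 3) vertex positions."""
    num_frames = int(len(audio) / SAMPLE_RATE * 30)   # assume 30 fps animation
    return np.zeros((num_frames, 5023, 3))            # dummy mesh frames

_buffer = np.zeros(0, dtype=np.float32)

def on_tts_chunk(chunk: np.ndarray) -> None:
    """Accumulate streaming TTS audio and run the model only once a full
    ~4 s context is buffered, so the model sees the context it expects."""
    global _buffer
    _buffer = np.concatenate([_buffer, chunk.astype(np.float32)])
    while len(_buffer) >= CONTEXT_SAMPLES:
        frames = synthesize_animation(_buffer[:CONTEXT_SAMPLES])
        _buffer = _buffer[CONTEXT_SAMPLES:]
        # hand `frames` off to the renderer, e.g. stream them to Unity
        print(f"generated {frames.shape[0]} animation frames")

# Example: feed ten 0.5 s chunks of silence from a fake TTS stream.
for _ in range(10):
    on_tts_chunk(np.zeros(SAMPLE_RATE // 2, dtype=np.float32))
```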