TimoBolkart / voca

This codebase demonstrates how to synthesize realistic 3D character animations given an arbitrary speech signal and a static character mesh.
https://voca.is.tue.mpg.de/en
1.15k stars, 273 forks

How can we make VOCA more real-time? #54

Closed: 12345data closed this issue 3 years ago

12345data commented 4 years ago

Producing a response from the input audio takes a long time. Can you suggest ways to make it closer to real time? Thanks in advance!

TimoBolkart commented 4 years ago

There is some overhead when running this for each sequence independently, so doing the whole TF initialization only once and then running multiple sequences in a row should speed things up. Other than that, I guess the main bottleneck is running DeepSpeech. I don't know whether there is a DeepSpeech variant that is real-time capable; if so, replacing it and retraining the model could be an option.
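
The "initialize once, run many sequences" idea can be sketched roughly as follows. This is only an illustration of the pattern, not code from the VOCA repository: `VocaSession`, `fake_loader`, and the toy model it returns are hypothetical stand-ins for whatever builds the TF graph and restores the checkpoint.

```python
class VocaSession:
    """Hypothetical wrapper that pays the expensive setup cost only once."""

    def __init__(self, loader):
        self._loader = loader   # callable that performs the expensive setup
        self._model = None      # filled in on first use
        self.init_count = 0     # how many times setup actually ran

    def _ensure_loaded(self):
        # Run the expensive initialization only if it has not happened yet.
        if self._model is None:
            self._model = self._loader()
            self.init_count += 1
        return self._model

    def run(self, audio):
        # Every call reuses the already-initialized model.
        model = self._ensure_loaded()
        return model(audio)


def fake_loader():
    # Assumption: stands in for TF graph construction and checkpoint restore.
    # The returned "model" just halves every sample.
    return lambda audio: [sample * 0.5 for sample in audio]


session = VocaSession(fake_loader)
sequences = [[1.0, 2.0], [4.0], []]
results = [session.run(seq) for seq in sequences]
```

With this structure the setup cost is paid on the first `run` call only, and later sequences go straight to inference.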

12345data commented 4 years ago

I have tried a single TF initialization. However, one of the main functions that takes time is 'render_mesh_helper' in the file rendering.py. Can the work done in that function be sped up using other methods? The time taken to complete the 'render_sequence_meshes' function is really high. Do you have any suggestions for reducing it?
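
For anyone who wants to confirm where the time goes before optimizing, a per-stage timer along these lines may help. It is a minimal sketch: `fake_inference` and `fake_render` are hypothetical stand-ins that only sleep, not the real VOCA functions, and would be replaced by the actual inference and rendering calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named stage.
timings = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the elapsed time of the enclosed block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] += time.perf_counter() - start


def fake_inference(frame):
    # Assumption: placeholder for the per-frame model evaluation.
    time.sleep(0.002)
    return frame


def fake_render(mesh):
    # Assumption: placeholder for the per-frame mesh rendering.
    time.sleep(0.05)
    return mesh


for frame in range(3):
    with timed("inference"):
        mesh = fake_inference(frame)
    with timed("render"):
        fake_render(mesh)

# Name of the stage with the largest accumulated time.
slowest = max(timings, key=timings.get)
```

Printing `timings` after a run shows how the total splits between stages, which makes it easier to tell whether the rendering step or the model itself dominates.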

TimoBolkart commented 4 years ago

You are right, this is not even part of the method but just the rendering of the meshes to a video sequence. I don't have any publicly available fast rendering method in mind right now.