Official implementation of “GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting” by Kyusun Cho, Joungbin Lee, Heeji Yoon, Yeobin Hong, Jaehoon Ko, Sangjun Ahn and Seungryong Kim
I have it running in real time, but the DeepSpeech feature extraction uses too much GPU memory (about 24 GB), so we have to dedicate a second GPU to it. The audio feature extraction step shouldn't need to be this complicated. Any ideas on how to simplify it and reduce its GPU memory usage?
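One possible direction (just a sketch, not part of this repo's pipeline): since the audio features are an offline preprocessing step, they can be extracted in fixed-length chunks under `torch.no_grad()`, on CPU if needed, or with a lighter encoder such as wav2vec 2.0 base instead of DeepSpeech. The model name, chunk length, file paths, and the 16 kHz assumption below are illustrative choices, not taken from GaussianTalker:

```python
# Sketch: chunked audio feature extraction with a lighter encoder (wav2vec 2.0 base).
# Only one chunk of activations lives on the device at a time, which keeps memory low.
import torch
import librosa
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

device = "cuda" if torch.cuda.is_available() else "cpu"  # also works on CPU, just slower

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h").to(device).eval()

wav, sr = librosa.load("aud.wav", sr=16000)  # hypothetical input path

chunk = int(10.0 * sr)                 # process 10 s of audio at a time to bound memory
feats = []
with torch.no_grad():                  # inference only, no activations kept for backprop
    for start in range(0, len(wav), chunk):
        piece = wav[start:start + chunk]
        if len(piece) < 400:           # too short for the conv front-end; drop the tail
            break
        inputs = extractor(piece, sampling_rate=sr, return_tensors="pt")
        hidden = model(inputs.input_values.to(device)).last_hidden_state
        feats.append(hidden.squeeze(0).cpu())

features = torch.cat(feats, dim=0)     # (~50 feature frames per second, hidden size 768)
torch.save(features, "aud_wav2vec.pt")
```

Even if you stay with DeepSpeech, the same two ideas apply: run the extractor on CPU (it only has to happen once per clip) or window the audio so the whole sequence never sits on the GPU at once. Whether the downstream attention module here accepts wav2vec-style features without retraining is something I haven't verified.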