jylovec closed this issue 1 year ago
It would help if you could give more details about this question. Audio-driven or video-driven talking head? What do you mean by real-time?
In audio-driven talking face generation, I haven't seen papers that aim to solve the 'real-time' problem. For video-driven talking face generation, i.e., face reenactment, there should be some related work.
I have checked, and most audio-driven code produces video files as output. By 'real time' I mean driving a 2D or 3D character directly from microphone input audio. Can you give some advice? Thanks.
I'd say the 'real-time' task you mentioned is more challenging. Popular talking face methods usually take a whole audio sequence as input, extract features, and then drive the face. 'Real-time', however, means taking a single audio frame or a stream as input, which requires both algorithmic and engineering effort. So far I haven't come across papers of this type.
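To make the difference concrete, here is a rough sketch of the engineering side only. It is not taken from any talking face codebase. `StreamingFramer`, `offline_features`, and the RMS "feature" are hypothetical stand-ins for a real audio encoder, and the window and hop sizes are arbitrary. The offline function needs the whole clip up front, while the streaming version buffers microphone chunks and emits a feature as soon as a full window is available.

```python
import math
from collections import deque
from itertools import islice
from typing import Deque, Iterable, List, Sequence


def rms(window: Sequence[float]) -> float:
    """Root-mean-square energy: a toy stand-in for a learned audio feature."""
    return math.sqrt(sum(s * s for s in window) / len(window))


def offline_features(samples: Sequence[float], window_size: int, hop_size: int) -> List[float]:
    """Offline style: the full audio sequence must be available before extraction."""
    last_start = len(samples) - window_size
    return [rms(samples[i:i + window_size]) for i in range(0, last_start + 1, hop_size)]


class StreamingFramer:
    """Streaming style: buffer incoming chunks, emit features as windows complete."""

    def __init__(self, window_size: int, hop_size: int) -> None:
        if not 0 < hop_size <= window_size:
            raise ValueError("this sketch assumes 0 < hop_size <= window_size")
        self.window_size = window_size
        self.hop_size = hop_size
        self.buffer: Deque[float] = deque()

    def push(self, chunk: Iterable[float]) -> List[float]:
        """Add new samples and return a feature for each window that is now complete."""
        self.buffer.extend(chunk)
        features: List[float] = []
        while len(self.buffer) >= self.window_size:
            window = list(islice(self.buffer, self.window_size))
            features.append(rms(window))
            # Slide forward by one hop; the overlap stays buffered for the next window.
            for _ in range(self.hop_size):
                self.buffer.popleft()
        return features


# Simulated microphone: a sine wave delivered in small, oddly sized chunks.
samples = [math.sin(0.1 * i) for i in range(1000)]
framer = StreamingFramer(window_size=160, hop_size=80)
streamed: List[float] = []
for start in range(0, len(samples), 37):
    # In a real system each returned feature would drive the face immediately.
    streamed.extend(framer.push(samples[start:start + 37]))
```

Both paths produce the same features here, but the streaming one can only see past samples. A model that relies on future audio context would have to be redesigned or run with added latency, which is the algorithmic part of the problem.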
Thanks for your explanation.
Thanks a lot.