X-LANCE / AniTalker

[ACM MM 2024] This is the official code for "AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding"
https://x-lance.github.io/AniTalker/
Apache License 2.0

Can we use this for realtime generation? #11

Closed muhammadumair894 closed 1 month ago

muhammadumair894 commented 3 months ago

> Training and Inference Hardware. Specifically, the GPU with 8G VRAM can generate up to 3 minutes of video in one inference.

Hi there,

First of all, splendid work! I really appreciate the generalizability of this model and am looking forward to the code release.

I have a question regarding the paper's mention that a single 8GB GPU can produce a 3-minute video in one inference. Is it possible to make this process real-time, or to stream the generated video as it is produced, so that it feels like a real-time conversation?
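
To make the streaming idea concrete, here is a minimal sketch of what I have in mind: feed frames to ffmpeg over a pipe as soon as they are generated, so playback can begin before the full clip is rendered. The frame size, FPS, and RTMP target below are placeholder assumptions, not anything defined by AniTalker:

```python
# Minimal sketch of streaming generated frames through an ffmpeg pipe.
# W, H, FPS, and the RTMP URL are placeholder assumptions.
import subprocess
import numpy as np

W, H, FPS = 256, 256, 25  # assumed output resolution and frame rate

ffmpeg = subprocess.Popen(
    ["ffmpeg", "-y",
     "-f", "rawvideo", "-pix_fmt", "rgb24",
     "-s", f"{W}x{H}", "-r", str(FPS),
     "-i", "-",                                   # raw frames from stdin
     "-f", "flv", "rtmp://localhost/live/talk"],  # placeholder stream target
    stdin=subprocess.PIPE,
)

def stream_frames(frames):
    """Write each HxWx3 uint8 frame to ffmpeg as soon as it is ready."""
    for frame in frames:
        ffmpeg.stdin.write(np.ascontiguousarray(frame, dtype=np.uint8).tobytes())
    ffmpeg.stdin.close()
    ffmpeg.wait()
```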

muhammadumair894 commented 3 months ago

How much time does one inference take to generate 3 minutes of video?
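
For reference, I would measure it roughly like this once the code is out; `generate_video` is a hypothetical stand-in, not a real AniTalker function:

```python
# Rough wall-clock benchmark of one inference call.
import time

def generate_video(portrait_path, audio_path):
    raise NotImplementedError  # placeholder, not a real AniTalker entry point

def timed_generate(portrait_path, audio_path):
    start = time.perf_counter()
    video = generate_video(portrait_path, audio_path)
    print(f"one inference took {time.perf_counter() - start:.1f} s")
    return video
```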

liutaocode commented 1 month ago

Currently, motion generation is a long-running process (it can take anywhere from several seconds to several minutes), so it cannot achieve real-time performance like VASA-1. Splitting the audio into very short segments does not solve this either, because the transitions between segments lose smoothness; a rough sketch of an overlap-and-crossfade workaround is below.
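
To illustrate, here is a rough sketch (not our actual pipeline) of generating motion in overlapping chunks and crossfading the overlap region; `generate_motion`, the chunk length, and the overlap length are all placeholder assumptions:

```python
# Rough sketch (not the AniTalker pipeline): generate motion in overlapping
# chunks and linearly crossfade the overlap so segment boundaries stay smooth.
import numpy as np

CHUNK, OVERLAP = 100, 20  # frames per chunk / frames to crossfade (assumed)

def generate_motion(audio_chunk):
    """Placeholder per-chunk motion generator returning a (frames, dim) array."""
    raise NotImplementedError

def chunked_motion(audio_frames):
    out = None
    step = CHUNK - OVERLAP
    for start in range(0, len(audio_frames), step):
        seg = generate_motion(audio_frames[start:start + CHUNK])
        if out is None:
            out = seg
            continue
        k = min(OVERLAP, len(seg), len(out))
        w = np.linspace(0.0, 1.0, k)[:, None]  # weight 0 -> old chunk, 1 -> new
        out[-k:] = (1.0 - w) * out[-k:] + w * seg[:k]
        out = np.concatenate([out, seg[k:]])
    return out
```

A crossfade like this only patches the boundary; it does not recover the smoothness of generating the whole sequence in one pass, which is why very short segments remain problematic.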