Closed muhammadumair894 closed 1 month ago
How much time does one inference take for a 3-minute generation?
Currently, motion generation is a long-running process, taking anywhere from several seconds to several minutes, so it cannot achieve real-time performance like VASA-1. Generating in shorter segments is not a workaround either: if the segments are too short, the transitions between them will lack smoothness.
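To illustrate the smoothness issue between segments, here is a minimal sketch of one common way to stitch independently generated motion segments: overlap neighbouring segments by a few frames and linearly cross-fade the overlap. This is not the project's actual method. The function name `crossfade_segments`, the `Frame` representation (one list of floats per video frame), and the `overlap` parameter are all hypothetical and chosen only for illustration.

```python
from typing import List

# Hypothetical representation: one motion vector (list of floats) per video frame.
Frame = List[float]


def crossfade_segments(segments: List[List[Frame]], overlap: int) -> List[Frame]:
    """Stitch motion segments into one sequence.

    Assumption: the last `overlap` frames of each segment cover the same
    time span as the first `overlap` frames of the next segment.
    """
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    if not segments:
        return []

    out = [list(frame) for frame in segments[0]]
    for seg in segments[1:]:
        if overlap > len(seg) or overlap > len(out):
            raise ValueError("overlap is longer than a segment")
        start = len(out) - overlap
        for i in range(overlap):
            # Weight ramps from the previous segment towards the next one.
            w = (i + 1) / (overlap + 1)
            prev, nxt = out[start + i], seg[i]
            out[start + i] = [(1 - w) * p + w * n for p, n in zip(prev, nxt)]
        # Frames after the overlap are taken from the new segment unchanged.
        out.extend(list(frame) for frame in seg[overlap:])
    return out


# Two 4-frame segments with a 2-frame overlap give 6 stitched frames.
a = [[0.0]] * 4
b = [[1.0]] * 4
stitched = crossfade_segments([a, b], overlap=2)
print(len(stitched))
```

Under these assumptions, every seam costs `overlap` frames that are generated twice, so shorter segments mean more seams and a larger share of redundant work, while blending alone only hides a discontinuity rather than making the motion on both sides consistent.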
Training and Inference Hardware. Specifically, the GPU with 8G VRAM can generate up to 3 minutes of video in one inference.
Hi there,
First of all, splendid work! I really appreciate the generalizability of this model and am looking forward to the code release.
I have a question regarding the paper's mention that a single 8GB GPU can produce a 3-minute video. Is it possible to make this process real-time or to stream the generated video so it appears as a real-time conversation?