KU-CVLAB / GaussianTalker

Official implementation of “GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting” by Kyusun Cho, Joungbin Lee, Heeji Yoon, Yeobin Hong, Jaehoon Ko, Sangjun Ahn and Seungryong Kim

custom audio generated video cannot be over 29s #43

Open minervazz opened 3 weeks ago

minervazz commented 3 weeks ago

Hi,

First of all thank you guys for your wonderful model.

Here is a problem I encountered when trying to run your model. When I used a custom audio of 2min30s to render with the trained Obama model, the output .mov plays normally for the first 29s with good lip sync, but after 29s the video gets stuck on one frame while the voice continues. If I use an audio shorter than 29s, everything works well.

Could you explain this to me? I haven't looked closely at the source code yet; if you have set a limit on the video length, please let me know.

Cheers,

Minerva

https://github.com/user-attachments/assets/92b4d612-959b-46e2-a5a8-d37acb449efa

https://github.com/user-attachments/assets/f1a005a2-02d7-43f4-b9e6-1bb506f01d2d

joungbinlee commented 3 weeks ago

Hello,

Thank you for using our model. We currently use the first 10/11 of the entire dataset as the training set and the remaining 1/11 as the test set. The eye and camera features required for rendering cover only 29 seconds, which corresponds to the entire test set. As a result, when a longer audio is provided, only 29 seconds can be rendered. If you need a longer rendering, you will have to supply additional eye and camera features (e.g., from the train dataset or another test dataset), which will allow a longer video to be generated.
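For reference, one simple workaround is to loop the available per-frame features until they cover the custom audio. Below is a minimal sketch of that idea; it is not part of the GaussianTalker codebase, and the function names, array shapes, and the 25 fps assumption are illustrative only.

```python
# Minimal sketch (not from the official codebase): repeat the ~29 s of
# test-split eye/camera features so they span a longer custom audio.
import numpy as np

FPS = 25  # assumed frame rate of the preprocessed dataset

def frames_for_audio(audio_seconds: float, fps: int = FPS) -> int:
    """Number of video frames needed to cover the custom audio."""
    return int(round(audio_seconds * fps))

def extend_features(features: np.ndarray, num_frames: int) -> np.ndarray:
    """Tile a (T, D) array of per-frame features along the time axis
    until it reaches `num_frames` frames, then truncate."""
    reps = int(np.ceil(num_frames / len(features)))
    return np.tile(features, (reps, 1))[:num_frames]

if __name__ == "__main__":
    # Hypothetical inputs: 29 s of test-split features, a 150 s custom audio.
    eye_feats = np.random.rand(29 * FPS, 4)    # e.g. per-frame blink values
    cam_poses = np.random.rand(29 * FPS, 16)   # e.g. flattened 4x4 extrinsics
    n = frames_for_audio(150.0)

    eye_long = extend_features(eye_feats, n)
    cam_long = extend_features(cam_poses, n)
    print(eye_long.shape, cam_long.shape)      # (3750, 4) (3750, 16)
```

Looping features this way may produce visible repetition in head pose and blinking; supplying real features from the train split generally gives more natural results.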

Thank you!

minervazz commented 3 weeks ago

Thanks!

wwqy commented 1 week ago

How should I modify this? Do I just swap in a different test video, or do I change the length of the test video?