JeremyCJM / DiffSHEG

[CVPR'24] DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
https://jeremycjm.github.io/proj/DiffSHEG/
BSD 3-Clause "New" or "Revised" License

About the number of generated motions and the length of the audio #16

Closed. Mumuwei closed this issue 2 months ago.

Mumuwei commented 2 months ago

Thanks for your work! I ran into some confusion during testing.

I used the provided audio `Forrest_tts.wav` and ran `inference_custom_audio_show.sh`, which produced 1778 frames of motion. When I rendered the generated motion, I had to set the video to fps=15 for it to match the audio length, which seemed like a problem.

JeremyCJM commented 2 months ago

That is right. The FPS of BEAT is set to 15 in the processed dataset, following the setting of the original BEAT paper.
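The frame count and FPS relation can be sanity-checked with a short script. This is a generic sketch (the frame count and FPS values come from this thread; the helper names are made up for illustration):

```python
def motion_duration_seconds(n_frames: int, fps: float) -> float:
    """Duration in seconds covered by n_frames at the given FPS."""
    return n_frames / fps

def expected_frames(audio_seconds: float, fps: float) -> int:
    """Number of motion frames needed to cover the audio at the given FPS."""
    return round(audio_seconds * fps)

if __name__ == "__main__":
    n_frames = 1778   # frames reported in this issue
    fps = 15          # BEAT processed-dataset FPS
    # 1778 frames at 15 FPS cover roughly 118.5 s of audio
    print(f"{motion_duration_seconds(n_frames, fps):.2f} s")
```

So if the rendered video only lines up with the audio at fps=15, the output length is consistent with the 15 FPS BEAT setting rather than a bug.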

Mumuwei commented 2 months ago

Thanks for your reply. If I want to test on SHOW (fps=30) and load the provided pre-trained SHOW model, how should I modify the code?
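One generic option, independent of the repo's own pipeline, is to resample the generated motion to the target frame rate by interpolating along the time axis. This is only a sketch under the assumption that the motion is a `(T, D)` float array of per-frame features; rotation channels would ideally be interpolated on the quaternion manifold instead of linearly:

```python
import numpy as np

def resample_motion(motion: np.ndarray, src_fps: float, dst_fps: float) -> np.ndarray:
    """Linearly interpolate a (T, D) motion array from src_fps to dst_fps.

    Generic resampling sketch, not DiffSHEG's own code. Values past the last
    source timestamp are clamped to the final frame by np.interp.
    """
    n_src = motion.shape[0]
    n_dst = int(round(n_src * dst_fps / src_fps))
    src_t = np.arange(n_src) / src_fps   # source frame timestamps (seconds)
    dst_t = np.arange(n_dst) / dst_fps   # target frame timestamps (seconds)
    out = np.empty((n_dst, motion.shape[1]), dtype=motion.dtype)
    for d in range(motion.shape[1]):
        out[:, d] = np.interp(dst_t, src_t, motion[:, d])
    return out
```

For example, upsampling 15 FPS output to 30 FPS doubles the frame count, so a 1778-frame sequence would become about 3556 frames covering the same audio duration.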