JeremyCJM / DiffSHEG

[CVPR'24] DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
https://jeremycjm.github.io/proj/DiffSHEG/
BSD 3-Clause "New" or "Revised" License

the mouth opens too wide and cannot close #12

yangleituitui closed 2 months ago

yangleituitui commented 3 months ago

Hello, I'm working with the BEAT dataset to infer facial expressions and gestures from speech. After visualizing the result in Blender, the mouth opens too wide and cannot close. What could be the reason for this issue? [Attached image]

JeremyCJM commented 2 months ago

Hi Yanglei, which audio did you use for inference? If it differs a lot from the training audio, this issue can occur because of the model's limited generalization ability.

yangleituitui commented 2 months ago

Thank you very much for your answer. I have already solved the problem; it was an error with the downloaded voice model. I would also like to ask: where is the facial frame rate, originally 60 FPS, converted to 15 FPS?

JeremyCJM commented 2 months ago

Hi Yanglei, you can downsample the facial sequences to 15 FPS. This is the original setting in the BEAT paper.
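For illustration, downsampling from 60 FPS to 15 FPS can be done by keeping every fourth frame. The sketch below is only an assumption about how one might do this, not the repository's or BEAT's actual preprocessing code; the function name `downsample_frames` and the dummy data are hypothetical. It works on any per-frame sequence that supports slicing, such as a Python list or a NumPy array with time on the first axis.

```python
def downsample_frames(frames, src_fps=60, dst_fps=15):
    """Keep every (src_fps // dst_fps)-th frame of a per-frame sequence.

    Hypothetical helper: plain decimation without any smoothing or filtering.
    """
    if src_fps <= 0 or dst_fps <= 0:
        raise ValueError("frame rates must be positive")
    if src_fps % dst_fps != 0:
        raise ValueError("src_fps must be an integer multiple of dst_fps")
    step = src_fps // dst_fps  # 60 // 15 == 4
    return frames[::step]


# One second of dummy 60 FPS data; each frame holds three made-up weights.
one_second = [[i / 60.0] * 3 for i in range(60)]
reduced = downsample_frames(one_second)
print(len(reduced))  # 15
```

Plain decimation like this keeps frames 0, 4, 8, and so on. If the audio features are extracted at a different rate, they need to stay aligned with the reduced facial sequence.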