Open newbinxxx opened 1 year ago
Yes, it is the distribution of head poses across the different people in the training set.
Pose style: adding this parameter doesn't seem to change anything.
Is there a reference image of the poses, so I can tell how to choose the right number?
I really need this too. Where can I see the corresponding data?
Yes, can somebody please explain in English what the heck the pose style is? I asked that before already but got no response.
Exactly. Pose style is really confusing when it is presented as just numbers. We really need someone to explain the differences among all these poses.
You can see how this is used in src/test_audio2coeff.py. Don't let the test_ prefix fool you: that's where the Audio2Coeff class is defined.
It is further implemented in src/audio2pose_models/audio2pose.py
It serves two purposes.
Latent Space Conditioning
The pose style serves as a conditioning input for the Conditional Variational Autoencoder (CVAE). This helps the model generate different styles of poses based on the given style index.
Pose Motion Prediction
The pose style influences the generated pose motions by determining which latent space components are activated. This results in different pose dynamics and styles in the final output.
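To make the conditioning idea concrete, here is a minimal sketch of one common way a style index can condition a CVAE latent: each style owns a bias vector that is added to the sampled noise. This is an illustration only, not the code in src/audio2pose_models/audio2pose.py. The names `make_style_biases` and `conditioned_latent`, the latent size of 4, and the count of 46 styles are assumptions made for the example, and a real model would learn the biases during training instead of drawing them at random.

```python
import random

NUM_STYLES = 46  # assumption: number of style classes, for illustration
LATENT_DIM = 4   # assumption: tiny latent size so the example is easy to read


def make_style_biases(num_styles, latent_dim, seed=0):
    """Stand-in for a learned table: one bias vector per style index."""
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        for _ in range(num_styles)
    ]


def conditioned_latent(style_biases, pose_style, rng):
    """Sample noise z and shift it by the bias of the chosen style."""
    if not 0 <= pose_style < len(style_biases):
        raise ValueError(f"pose_style must be in [0, {len(style_biases)})")
    bias = style_biases[pose_style]
    z = [rng.gauss(0.0, 1.0) for _ in bias]
    return [zi + bi for zi, bi in zip(z, bias)]


STYLE_BIASES = make_style_biases(NUM_STYLES, LATENT_DIM)

# Same noise, two different styles: the latents differ only by the style bias.
latent_a = conditioned_latent(STYLE_BIASES, 5, random.Random(1))
latent_b = conditioned_latent(STYLE_BIASES, 37, random.Random(1))
```

In a setup like this the style index only shifts where the latent sits, while the noise (and, in a real model, the audio) still drives the result. That fits the observation that a style number changes the overall character of the motion without naming a specific pose.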
In my practical experience, these are not really specific poses - the same pose style applied to different speech inputs won't necessarily resemble one another in specific ways, but they will exhibit similar overall movement characteristics. I don't believe there is ANY reason why anyone would select "pose style 5" over "pose style 37" for example. It's not that one pose style is "cute", another "stern", another "frightened" or whatever - they're just different. (There may be subjective emotional assessments you could make of them but that's not what it means.)
But if you're generating lots and lots of video from audio sources, and you don't vary the pose style, you may end up with monotonous videos with the same "poses" or movements. Randomly picking a pose style or incrementing the pose style number can reduce that. Otherwise, you could always render a dozen different pose styles from the same audio source and pick one that you think looks natural and engaging.
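If you want to automate that, the sketch below assigns a pose style to each audio file, either by incrementing or at random. It is a hedged example: `assign_pose_styles` and `build_command` are hypothetical helpers, the count of 46 styles with valid values 0 to 45 is an assumption taken from the numbers mentioned in this thread, and the `inference.py` flags (`--driven_audio`, `--source_image`, `--pose_style`) are written from memory of the project README, so check them against your checkout before running anything.

```python
import random

NUM_POSE_STYLES = 46  # assumption: valid styles are 0..45


def assign_pose_styles(audio_files, mode="cycle", start=0, seed=None):
    """Pair each audio file with a pose style.

    mode="cycle" increments the style per file and wraps around;
    mode="random" draws a style uniformly for each file.
    """
    rng = random.Random(seed)
    assignments = []
    for i, audio in enumerate(audio_files):
        if mode == "cycle":
            style = (start + i) % NUM_POSE_STYLES
        elif mode == "random":
            style = rng.randrange(NUM_POSE_STYLES)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        assignments.append((audio, style))
    return assignments


def build_command(audio, source_image, pose_style):
    """Build (but do not run) an inference command; flag names are assumed."""
    return [
        "python", "inference.py",
        "--driven_audio", audio,
        "--source_image", source_image,
        "--pose_style", str(pose_style),
    ]


# Example: three clips, incrementing from style 45 so the wrap-around shows.
jobs = assign_pose_styles(["a.wav", "b.wav", "c.wav"], mode="cycle", start=45)
commands = [build_command(audio, "face.png", style) for audio, style in jobs]
```

To compare a dozen styles for a single clip, pass the same audio path twelve times with `mode="cycle"` and pick the render you like best.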
If you actually want something more specific, you'd be better served to use a reference video. Television anchors, news personalities, talk show hosts, etc., can be a good resource for talking head videos in a variety of more identifiable styles: you could easily pull clips of people excitedly talking about something, reporting on a somber news story, delivering a critical tirade, etc., and use those for reference to get something that'll feel more "tunable". Pose style will generally just give you variation on top of whatever would be generated anyway.
Are pose styles 1-46 different head poses? After a git pull today, everything works fine now~~