jiaqiAA closed this issue 1 year ago
Good luck.
Thank you for your reply. I just tried a male voice, and the results are not very good.
Hi,
I want to train the model with my own data, which has 23 bones. ZEGGS has 75 bones, and MDM uses `njoints=1141`. How is `njoints` calculated?
When I used my own data, MDM with `njoints=361` worked.
Thanks!
It depends on which gesture features you use; see https://github.com/YoungSeng/DiffuseStyleGesture/blob/d796b3910d5e6bae9918b0b564d94f6110ffff5b/main/process/process_zeggs_bvh.py#L214
If you use your own motion data, use the dimensionality of your own motion features.
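As a rough sanity check, both sizes quoted in this thread (75 bones with `njoints=1141`, 23 bones with `njoints=361`) fit a layout of 15 values per bone plus 16 additional values. That breakdown is inferred from these two numbers only and is not confirmed by the repository, so check it against the linked feature extraction code. A minimal sketch, where `feature_dim`, `per_bone` and `extra` are hypothetical names:

```python
def feature_dim(n_bones: int, per_bone: int = 15, extra: int = 16) -> int:
    """Per-frame feature size, ASSUMING 15 values per bone plus 16 extra values.

    The defaults are inferred from the two (bones, njoints) pairs in this
    thread; they are not taken from the repository.
    """
    return n_bones * per_bone + extra


zeggs_dim = feature_dim(75)   # ZEGGS skeleton from the question
custom_dim = feature_dim(23)  # the asker's 23-bone skeleton
print(zeggs_dim, custom_dim)
```

If your own pipeline extracts a different set of features, count the columns of one processed frame instead of relying on this formula.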
Thanks for your reply.
I have a question: the number of new frames per inference is `stride_poses = n_poses - n_seed`. For example, if `stride_poses=80` and the input data has 200 frames in total, the last 40 frames are lost. Could I pad the end of the data with zeros to make it 240 frames long, and delete the padding after inference? Would this affect the earlier results? Or is there another way to solve this problem?
Thanks!
That's strange; this shouldn't be a problem. The GENEA Challenge submissions certainly have audio and gestures of the same length; see the code for DiffuseStyleGesture+.
As you said, assume each segment is 100 frames long (40 seed frames + 60 new frames) and the speech is 200 frames long. Inference starts with 40 initial frames, either all zeros or 40 gesture frames picked from the dataset, which makes 240 frames in total. Each inference step generates 100 frames, using the last 40 frames of the previous segment as input, i.e., it predicts the last 60 frames. Finally, delete the first 40 frames (the zero padding or the frames picked from the dataset), and the result is as long as the audio. The last segment (e.g., shorter than 60 frames) can be handled in any way (discarded, padded with zeros, etc.) and has no effect on the result.
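The bookkeeping described above can be sketched as follows. This is not the repository's inference code: `generate_long` is a hypothetical helper, and `generate_segment` is a hypothetical stand-in for the model call (here replaced by a dummy that fills each new chunk with the segment number). It assumes a zero seed and that each call returns `n_poses` frames whose first `n_seed` frames correspond to the seed.

```python
import math
import numpy as np


def generate_long(n_frames, n_poses, n_seed, dim, generate_segment):
    """Generate n_frames of motion segment by segment, seeding each step
    with the last n_seed frames produced so far."""
    stride = n_poses - n_seed                  # new frames per inference step
    n_segments = math.ceil(n_frames / stride)  # last segment may overshoot
    seed = np.zeros((n_seed, dim))             # assumption: zero initial seed
    chunks = [seed]
    for i in range(n_segments):
        segment = generate_segment(seed, i)    # shape (n_poses, dim)
        chunks.append(segment[n_seed:])        # keep only the new frames
        seed = segment[-n_seed:]               # seed for the next step
    motion = np.concatenate(chunks)
    # Drop the initial seed and any overshoot from the last segment.
    return motion[n_seed:n_seed + n_frames]


def dummy_segment(seed, i, n_poses=100):
    """Stand-in for the model: new frames are filled with i + 1."""
    new = np.full((n_poses - len(seed), seed.shape[1]), float(i + 1))
    return np.concatenate([seed, new])


motion = generate_long(200, n_poses=100, n_seed=40, dim=3,
                       generate_segment=dummy_segment)
print(motion.shape)
```

With a real model, the audio features for the final segment would also need padding to a full segment before the call; the trailing frames are trimmed afterwards in the same way.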
Hi,
When I input Chinese audio, the beats do not correspond well to the body movements. Do you know why? Do I need to train the model on Chinese datasets?