Aubrey-ao / HumanBehaviorAnimation


Questions about GestureDiffuCLIP #14

Open chenmingTHU opened 3 months ago

chenmingTHU commented 3 months ago

Thanks for your impressive work! I have a few questions after reading your paper GestureDiffuCLIP.

  1. The MotionCLIP model uses SMPL parameters as its motion representation, while the BEAT and ZeroEGGs datasets come in different formats. How do you apply MotionCLIP to these motion capture datasets?
  2. The VQ-VAE used in this paper follows the structure of Jukebox; what codebook size do you choose for BEAT and ZeroEGGs?
Aubrey-ao commented 3 months ago

Hi,

Thx : )

  1. Yep, we re-train a new MotionCLIP mainly on HumanML3D. Some gesture data is also involved, which brings the motion distribution closer to gestures. P.S.: retargeting all data onto the same skeleton is needed.
  2. 512 for both. For better reconstruction quality, I recommend trying a Residual VQ-VAE (rough sketch below).
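
Not the paper's code, just a minimal PyTorch sketch of the residual quantization idea, assuming a 512-entry codebook per level and a hypothetical latent dimension of 256 (codebook/commitment losses omitted):

```python
import torch
import torch.nn as nn


class ResidualVQ(nn.Module):
    """Quantize a latent in stages: each stage codes the residual left over
    from the previous stage with its own codebook."""

    def __init__(self, num_levels=2, codebook_size=512, dim=256):
        super().__init__()
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(num_levels)
        )

    def forward(self, z):
        # z: (batch, frames, dim) latents from the VQ-VAE encoder
        residual = z
        quantized = torch.zeros_like(z)
        indices = []
        for codebook in self.codebooks:
            # Nearest codebook entry (Euclidean) for the current residual.
            flat = residual.reshape(-1, residual.shape[-1])         # (B*T, dim)
            dist = torch.cdist(flat, codebook.weight)               # (B*T, K)
            idx = dist.argmin(dim=-1).reshape(residual.shape[:-1])  # (B, T)
            selected = codebook(idx)                                 # (B, T, dim)
            quantized = quantized + selected
            residual = residual - selected
            indices.append(idx)
        # Straight-through estimator so gradients reach the encoder.
        quantized = z + (quantized - z).detach()
        return quantized, indices


# Usage: quantize a small batch of dummy encoder outputs.
rvq = ResidualVQ(num_levels=2, codebook_size=512, dim=256)
z = torch.randn(4, 30, 256)
z_q, codes = rvq(z)  # z_q: (4, 30, 256); codes: list of (4, 30) index tensors
```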

Hope it is helpful for u.

chenmingTHU commented 1 month ago

Thanks for your reply! I have a few more questions while trying to reproduce this paper.

  1. For the VQ-VAE, do you jointly train it on the gesture and HumanML3D datasets?
  2. For CLIP-based style control, self-supervision is used during training on the gesture dataset. Did you randomly select a gesture clip from the dataset as the style prompt, or just use the gesture clip corresponding to the audio clip? (Both options are sketched below for concreteness.)
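
A tiny illustrative sketch of the two alternatives (hypothetical helper names, not from the paper), assuming `gesture_clips[sample_idx]` is the clip paired with the current audio:

```python
import random


def pick_style_prompt(sample_idx, gesture_clips, random_prompt=True):
    """Pick the gesture clip used as the style prompt for one training sample."""
    if random_prompt:
        # Option 1: draw any clip from the dataset as the style prompt.
        return random.choice(gesture_clips)
    # Option 2: reuse the clip corresponding to the current audio clip.
    return gesture_clips[sample_idx]
```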