IDEA-Research / HumanTOMATO

[ICML 2024] 🍅HumanTOMATO: Text-aligned Whole-body Motion Generation
https://lhchen.top/HumanTOMATO

Questions about the Pretrained Checkpoints Used in the Evaluation of OpenTMA #18

Closed yxbian23 closed 3 weeks ago

yxbian23 commented 4 weeks ago

Hi, I have some questions about the pretrained evaluation checkpoints of OpenTMA. I would be very grateful for any help.

  • OpenTMA itself should consist of the motion encoder and text encoder optimized by contrastive learning. Why are these pretrained weights loaded for OpenTMA evaluation? What role do they play in the evaluation?
  • Additionally, I would like to confirm: is the "smpl212" format in OpenTMA the 322-dimensional SMPLX format of MotionX with the 100-dimensional face_shape and 10-dimensional betas removed?
  • Lastly, the currently provided pretrained evaluation checkpoints of OpenTMA do not seem to include weights for the smpl212 format. Will they be provided later, or will the corresponding training code be released? Looking forward to your reply.

LinghaoChan commented 3 weeks ago

@yxbian23 Thanks for your interest in this project.

After reading your questions, I think some background may be missing.

The motion and text encoders aim to learn text-motion-aligned representations, and this "alignment" is measured by retrieval. Please refer to the HumanTOMATO appendix (we might update the arXiv version in the coming days). We do not use the "smpl212" representation; the 322-dim SMPLX format is the default. The training code has been made public. BTW, I still suggest training your model on HumanML3D, because MotionX is quite noisy.
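For readers unfamiliar with retrieval as an alignment measure, here is a minimal sketch of text-to-motion recall@k. It is not the OpenTMA evaluation code: the function name `recall_at_k` and the assumption that row i of each embedding array belongs to the same text-motion pair are illustrative choices only. It ranks all motions for each text by cosine similarity and checks whether the paired motion lands in the top k.

```python
import numpy as np

def recall_at_k(text_emb, motion_emb, ks=(1, 5, 10)):
    """Text-to-motion recall@k, assuming row i of both arrays is a matched pair.

    Hypothetical helper for illustration; not part of OpenTMA.
    """
    # L2-normalize so the dot product equals cosine similarity.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    m = motion_emb / np.linalg.norm(motion_emb, axis=1, keepdims=True)
    sim = t @ m.T  # shape (N, N): sim[i, j] = similarity of text i and motion j

    # Rank of the paired motion = number of motions scoring strictly higher.
    paired = np.diag(sim)[:, None]
    ranks = (sim > paired).sum(axis=1)

    # Fraction of texts whose paired motion is within the top k.
    return {k: float((ranks < k).mean()) for k in ks}

# Toy check: perfectly aligned embeddings versus a partly shuffled set.
aligned = recall_at_k(np.eye(4), np.eye(4), ks=(1, 2))
shuffled = recall_at_k(np.eye(4), np.eye(4)[[1, 0, 2, 3]], ks=(1, 2))
```

Higher recall means the two encoders place matching texts and motions closer together in the shared embedding space.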

Happy to discuss further if you want.

yxbian23 commented 3 weeks ago

Thanks for your kind help. It solved my problem.

LinghaoChan commented 3 weeks ago

Thanks for your question, which might help other follow-up community members.