Mael-zys / T2M-GPT

(CVPR 2023) Pytorch implementation of “T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations”
https://mael-zys.github.io/T2M-GPT/
Apache License 2.0

Question about Table 1 comparing various state-of-the-art methods on the t2m dataset in the paper #20

Closed · sohananisetty closed this issue 1 year ago

sohananisetty commented 1 year ago

In Table 1 of the paper, you mention that for MDM and MotionDiffuse, § marks results reported using the ground-truth motion length. What does this mean? Looking through the evaluation code, it seems that you also use the ground-truth motion lengths during evaluation. Or are you truncating the motions to the length used during training, i.e., the window size?

Jiro-zhang commented 1 year ago

In MDM and MotionDiffuse, motion is generated based on the ground-truth length. In our work, we directly predict the 'End' token, so the ground-truth length is not needed:
https://github.com/Mael-zys/T2M-GPT/blob/92ffedf00df5e142515f5c1677bdb2375ce8a58e/models/t2m_trans.py#L45
https://github.com/Mael-zys/T2M-GPT/blob/92ffedf00df5e142515f5c1677bdb2375ce8a58e/models/t2m_trans.py#L50

Mael-zys commented 1 year ago

Hello, MDM and MotionDiffuse use the ground-truth motion length for generation, so their generated motion length during evaluation is always the same as the ground-truth length. For T2M-GPT, the ground-truth motion length in our evaluation code is only used to extract the ground-truth motion features. For generation, we use an 'End' token to predict the end of a motion, so the ground-truth motion length is not needed as an input, and our generated motion length during evaluation can differ from the ground-truth length.
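
For intuition, here is a minimal sketch of that idea, not the repo's actual implementation: the function name, the sampling strategy, and the `transformer(text_emb, tokens)` interface are all illustrative assumptions, but it shows how a generation loop can decide its own length by stopping at a predicted 'End' token.

```python
import torch

def sample_until_end(transformer, text_emb, end_token_id, max_len=196):
    """Autoregressively sample motion-code tokens until the 'End' token.

    `transformer` is assumed to map (text_emb, tokens_so_far) to logits over
    the codebook plus one extra 'End' class; these names are hypothetical.
    """
    tokens = torch.empty(0, dtype=torch.long)
    for _ in range(max_len):
        logits = transformer(text_emb, tokens)   # shape: (codebook_size + 1,)
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, 1)
        if next_token.item() == end_token_id:
            break                                # model decides the length here
        tokens = torch.cat([tokens, next_token])
    # Variable-length sequence, produced without the ground-truth motion length.
    return tokens
```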

sohananisetty commented 1 year ago

Thank you so much for the clarification. I was working on something very similar, but you guys beat me to it! I also noticed that the metrics change depending on the mean and std files referenced by the meta dir:
https://github.com/Mael-zys/T2M-GPT/blob/92ffedf00df5e142515f5c1677bdb2375ce8a58e/dataset/dataset_TM_eval.py#L39

Are these the mean and std obtained after running the HumanML3D processing steps? If not, how did you obtain these mean and std npy files? I wanted to compare my method to yours, but using the mean and std computed from HumanML3D gave me different metrics on the real motions.

Mael-zys commented 1 year ago

Hello, the mean and std originally come from the CVPR 2022 paper text-to-motion:
https://github.com/EricGuo5513/text-to-motion/blob/main/train_comp_v6.py#L128-L129
https://github.com/EricGuo5513/text-to-motion/blob/main/data/dataset.py#L93-L115

In our code, we directly load the mean and std provided in their repo.
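
As a rough sketch of how such statistics are typically applied during evaluation (the file paths and function names below are placeholders, not the repo's exact code), the motion features are z-normalized with the evaluator's own mean/std rather than statistics re-computed from the HumanML3D processing pipeline:

```python
import numpy as np

# Placeholder paths: in practice these are the mean.npy / std.npy shipped with
# the text-to-motion evaluator's meta directory.
mean = np.load('meta/mean.npy')
std = np.load('meta/std.npy')

def normalize(motion):
    # Z-normalize motion features with the evaluator's statistics so that
    # the "real" metrics match those reported by text-to-motion.
    return (motion - mean) / std

def denormalize(motion):
    # Invert the normalization before visualization or further processing.
    return motion * std + mean
```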