Good question.
It probably results from the distribution difference between the text-motion training set and the testing set. The evaluator is itself a network trained on the training set to fit that distribution, so it may carry bias/variance error on the testing set. If the generated text-motion data aligns better with the training-set distribution, the evaluation metrics can even come out better than those of the GT testing set.
The quantitative evaluation of motion generation performance would be an interesting topic to discuss and explore.
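To make the mechanism concrete, here is a minimal sketch (not the exact code in this repo) of how R-Precision and MMDist are typically computed from the evaluator's embeddings; the function name, the `text_emb`/`motion_emb` arrays, and the pool size of 32 follow the common HumanML3D protocol but are assumptions for illustration:

```python
# Sketch only: computes R-Precision (top-1..top-k) and MMDist from
# precomputed evaluator embeddings. `text_emb` and `motion_emb` are assumed
# to be (N, D) arrays from the pretrained text/motion encoders for N
# matched text-motion pairs (GT motions or generated motions).
import numpy as np

def r_precision_and_mmdist(text_emb, motion_emb, pool_size=32, top_k=3, seed=0):
    rng = np.random.default_rng(seed)
    n = len(text_emb)
    hits = np.zeros(top_k)
    mm_dist = 0.0
    for i in range(n):
        # Pool: the matched motion plus (pool_size - 1) mismatched ones.
        negatives = rng.choice(np.delete(np.arange(n), i), pool_size - 1, replace=False)
        pool = np.concatenate(([i], negatives))
        dists = np.linalg.norm(motion_emb[pool] - text_emb[i], axis=1)
        rank = int(np.argsort(dists).tolist().index(0))  # rank of the matched motion
        for k in range(top_k):
            hits[k] += rank <= k
        mm_dist += dists[0]  # distance between the text and its matched motion
    return hits / n, mm_dist / n  # R-Precision top-1..top-k, MMDist
```

Since both encoders are fitted only on the training split, generated motions whose distribution sits closer to that split can produce smaller embedding distances, and hence better numbers, than the real test motions do.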
Thanks for your reply. Another question: the evaluator is used on the test set, so why was it trained on the training set instead of the test set? If the evaluator were trained on the test set, maybe we could get a more credible text-motion matching metric.
I notice that the GT R-Precision (Top-1 to Top-3) is about 51%/70%/79% on the HumanML3D dataset, but MoMask gets 52%/71%/81%. The same goes for MMDist. Could you explain that? Thanks. @EricGuo5513 @Murrol