Garfield-kh / TM2D

[ICCV 2023] TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration

Size mismatch between gt and pred #19

Open u7079256 opened 2 weeks ago

u7079256 commented 2 weeks ago

I encountered an issue while evaluating with the script at /home/mluo/aigeeks/mmv2/tm2d_60fps/eval4bailando/metrics_new_axm.py. Specifically, there is a size mismatch between pred_features_k and gt_features_k at lines 58 and 59: pred_features_k has shape (40, 72), whereas gt_features_k has shape (1363, 72). The first dimension of gt_features_k is close to the total number of entries in the AIST++ dataset, which suggests the ground truth features cover the whole dataset, whereas I would expect them to match the predictions in size.

Additionally, the prediction features are extracted from motions whose length matches the raw audio, typically around 2000 frames. Could you clarify whether the ground truth features provided on Google Drive were extracted from motions of similar length, or whether they are derived from the entries under new_joint_vecs? If it is the latter, that could explain the temporal mismatch between prediction and ground truth.

Upon reviewing the raw JSON file, I noticed that the dance_array length aligns with new_joint_vecs. Given this, could you provide access to the raw motion dataset that matches the length of the raw audio from music_array? This would help in ensuring that the evaluations are performed under consistent conditions.

Thank you for looking into this matter.

Garfield-kh commented 1 week ago

Hi, for metrics_new_axm.py I followed the evaluation in lisiyao21/Bailando/blob/main/utils/metrics_new.py. Could you check this against Bailando to see whether the same issue appears there?
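
For context, here is a minimal sketch of why the two row counts may not need to match. It assumes the script compares the two feature sets with a Fréchet distance (FID-style) over the stacked features, as the referenced Bailando script appears to do. That assumption is not confirmed in this thread. The function name frechet_distance and the random features are hypothetical stand-ins, not the repository's code or data. Under that assumption only the feature dimension (72) has to agree, because each set is summarized by its mean and covariance.

```python
import numpy as np

def frechet_distance(pred, gt):
    """Frechet distance between Gaussians fitted to two feature sets.

    pred has shape (n_pred, d) and gt has shape (n_gt, d).
    The row counts may differ; only d must match.
    """
    if pred.shape[1] != gt.shape[1]:
        raise ValueError("feature dimensions differ")
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    cov_p = np.cov(pred, rowvar=False)  # (d, d)
    cov_g = np.cov(gt, rowvar=False)    # (d, d)
    # Tr(sqrt(cov_p @ cov_g)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative in exact
    # arithmetic; clip small negative values caused by round-off.
    eig = np.linalg.eigvals(cov_p @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    diff = mu_p - mu_g
    return float(diff @ diff + np.trace(cov_p) + np.trace(cov_g) - 2.0 * tr_sqrt)

# Hypothetical random features with the shapes reported in this issue.
rng = np.random.default_rng(0)
pred_features_k = rng.normal(size=(40, 72))
gt_features_k = rng.normal(size=(1363, 72))

fid_k = frechet_distance(pred_features_k, gt_features_k)
print(pred_features_k.shape, gt_features_k.shape, fid_k)
```

Note that with 40 samples in 72 dimensions the predicted covariance is rank-deficient, so a score computed this way is likely to be a noisy estimate even though the computation itself runs.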