XingliangJin / MCM-LDM

[CVPR 2024] Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model

Evaluation data #9

Boeun-Kim opened this issue 1 month ago

Boeun-Kim commented 1 month ago

Thanks for the great work!

In the paper, for the CRA metric, the authors mentioned that "We train a content classifier as a feature extractor using [49] on a subset of the HumanML3D test set with annotated content labels". The same process is applied to the SRA with another subset.

Could you specify the "subsets" and how you got content and style labels?

Many thanks.

XingliangJin commented 1 month ago

Due to the length of the main manuscript, we have included this information in the supplementary materials (https://openaccess.thecvf.com/content/CVPR2024/supplemental/Song_Arbitrary_Motion_Style_CVPR_2024_supplemental.zip). In brief, the style classifier is trained on a style motion dataset obtained by segmenting several long motions (each with its own style label) from the CMU dataset, while the content classifier is trained on a content motion dataset drawn from a portion of the CMU dataset that has been annotated with content labels.
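
For reference, a minimal sketch of how such a classifier dataset can be assembled by slicing long labeled motions into fixed-length clips, with each clip inheriting its source motion's label. The window length, stride, feature layout, and file names are illustrative assumptions, not the exact settings used in the paper.

```python
import numpy as np

def segment_labeled_motions(motions, labels, win_len=60, stride=30):
    """Slice long motions (T x D feature arrays) into fixed-length clips.

    Each clip inherits the style/content label of its source motion.
    win_len and stride are illustrative values, not the paper's settings.
    """
    clips, clip_labels = [], []
    for motion, label in zip(motions, labels):
        for start in range(0, len(motion) - win_len + 1, stride):
            clips.append(motion[start:start + win_len])
            clip_labels.append(label)
    return np.stack(clips), np.array(clip_labels)

# Hypothetical usage: two long CMU-style motions with their style labels.
# long_motions = [np.load(p) for p in ["walk_proud.npy", "walk_old.npy"]]
# X, y = segment_labeled_motions(long_motions, ["proud", "old"])
```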

Boeun-Kim commented 1 month ago

Thank you Xingliang!

Your study is very impressive and I'd like to cite it in my next paper, but I'm struggling with the evaluation. If you don't mind, could you provide the code for SRA? Or, if you can't, could you point me to the source (another repository) that you referred to?

I read the attached supplementary materials, thank you! I'm a little confused about the datasets in Table 1. Did you train with the full HumanML3D and evaluate with the 30 chosen motions from the CMU dataset? Or did you train with the CMU dataset for the style motions?

Many thanks! Best wishes, Boeun Kim

XingliangJin commented 1 month ago
  1. You can refer to ACTOR for evaluating SRA (ACTOR computes the accuracy of action labels).
  2. We train with the full HumanML3D and evaluate with chosen motions from the CMU dataset. In fact, the CMU dataset is a subset of HumanML3D.

I guess your experiment is also trained on HumanML3D. I recommend using the evaluation code in SMooDi, whose evaluation is performed on the standard style motion dataset 100STYLE.
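
For concreteness, here is a minimal sketch of ACTOR-style recognition accuracy (SRA): a pretrained style classifier is run over the stylized outputs, and SRA is the fraction whose predicted style matches the target style. The `style_classifier` interface and tensor shapes are assumptions for illustration, not MCM-LDM's actual evaluation code.

```python
import torch

@torch.no_grad()
def style_recognition_accuracy(style_classifier, generated_motions, target_styles):
    """SRA: fraction of generated motions whose predicted style matches the target.

    generated_motions: (N, T, D) tensor of stylized outputs.
    target_styles:     (N,) tensor of target style label indices.
    style_classifier:  model mapping (N, T, D) -> (N, num_styles) logits (assumed interface).
    """
    style_classifier.eval()
    logits = style_classifier(generated_motions)
    predictions = logits.argmax(dim=-1)
    return (predictions == target_styles).float().mean().item()
```

CRA can be computed the same way, swapping in the content classifier and content labels.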

XingliangJin commented 1 month ago

We have just open-sourced an initial version of the evaluation code here. You are welcome to give it a try!