YoungSeng / QPGesture

QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation (CVPR 2023 Highlight)

About the Quantitative results #16

Closed huifu99 closed 1 year ago

huifu99 commented 1 year ago

You mentioned that only two speakers are used for training and testing, but how do you calculate the metrics? All the baselines compute the metrics across all speakers, while you only use two?

YoungSeng commented 1 year ago

The proposed model and all baseline models are evaluated on the same training and test sets, which consist of two speakers. See this issue: https://github.com/YoungSeng/QPGesture/issues/13
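For concreteness, here is a minimal sketch of what "computing the metrics on two speakers" can look like: both the ground-truth and the generated features are restricted to the same speaker subset, and an FGD-style Fréchet distance is computed on that subset only. The directory names, speaker IDs, and feature layout below are assumptions for illustration, not the repo's actual evaluation code.

```python
# Minimal sketch (not the repo's evaluation code) of an FGD-style metric
# computed on a two-speaker subset of BEAT. It assumes per-sequence gesture
# features are saved as .npy files whose names start with the BEAT speaker ID
# (e.g. "2_xxx_....npy"); the speaker IDs and paths below are placeholders.
from pathlib import Path
import numpy as np
from scipy import linalg

KEPT_SPEAKERS = {"2", "4"}  # hypothetical two-speaker subset


def load_subset_features(feature_dir, kept_speakers=KEPT_SPEAKERS):
    """Stack per-sequence feature arrays, keeping only the chosen speakers."""
    feats = []
    for path in sorted(Path(feature_dir).glob("*.npy")):
        speaker_id = path.name.split("_")[0]
        if speaker_id in kept_speakers:
            feats.append(np.load(path))
    return np.concatenate(feats, axis=0)  # (num_clips, feat_dim)


def frechet_distance(real_feats, gen_feats, eps=1e-6):
    """Frechet distance between Gaussians fitted to the two feature sets."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False) + eps * np.eye(real_feats.shape[1])
    cov_g = np.cov(gen_feats, rowvar=False) + eps * np.eye(gen_feats.shape[1])
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from sqrtm
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))


if __name__ == "__main__":
    real = load_subset_features("features/ground_truth")
    fake = load_subset_features("features/generated")
    print("FGD on the two-speaker test subset:", frechet_distance(real, fake))
```

The key point is that every method being compared is fed exactly the same speaker-filtered test set, so the numbers are comparable even though they are not computed over all of BEAT.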

huifu99 commented 1 year ago

Thanks for the reply, but the dataset becomes quite small when only two speakers are used...

YoungSeng commented 1 year ago

Yes. As far as I know, no work has yet been done on all of BEAT because the amount of data is quite large, and CaMN is also trained on only four English speakers. In addition, if all speakers in BEAT were used, how to extract audio and text representations for the different languages (Chinese, English, Japanese, etc.) could be a potential problem.

huifu99 commented 1 year ago

OK, thanks.