exitudio / MMM

Official repository for "MMM: Generative Masked Motion Model"
https://exitudio.github.io/MMM-page/

Motion Length Difference #21

Closed: Githubhgh closed this issue 3 months ago

Githubhgh commented 3 months ago

Hi, thanks for this wonderful work. May I ask why the max length in t2m_trans training is only 50, which differs from the 64 used in the VQ training stage? Will this affect the final result?

exitudio commented 3 months ago

Hi, thank you for your interest in our work. We adopted this setting from T2M-GPT. I have asked the same question here.

First, let me clarify the details. The two numbers are in different units: the VQ-VAE is trained on 64-frame motion windows, which it downsamples to 16 tokens, while the Transformer works in token space only. Its maximum length is 49 motion tokens + 1 end token = 50 tokens (49 tokens x 4 = 196 frames).
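As a quick sanity check of the arithmetic above, here is a minimal sketch. The constant and function names are illustrative assumptions and are not taken from the MMM or T2M-GPT code; only the numbers (64, 16, 49, 50, 196) come from the reply.

```python
# Hypothetical names for illustration; only the numbers come from the reply above.
DOWNSAMPLE_RATE = 4  # implied by 64 frames -> 16 tokens


def frames_to_tokens(num_frames: int, rate: int = DOWNSAMPLE_RATE) -> int:
    """Number of motion tokens produced for a clip of num_frames frames."""
    return num_frames // rate


# VQ-VAE training stage: fixed 64-frame windows.
vq_window_tokens = frames_to_tokens(64)

# Transformer stage: the longest motion is 196 frames, plus one end token.
max_motion_tokens = frames_to_tokens(196)
max_sequence_length = max_motion_tokens + 1

print(vq_window_tokens, max_motion_tokens, max_sequence_length)
```

So 64 counts frames in a VQ-VAE training window, while 50 counts tokens in the Transformer's sequence, and the two settings do not conflict.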

Here is the explanation:

Githubhgh commented 3 months ago

Great, thanks for clarifying.