Mael-zys / T2M-GPT

(CVPR 2023) Pytorch implementation of “T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations”
https://mael-zys.github.io/T2M-GPT/
Apache License 2.0

How many motion indexes do you have? #53

Closed gwyong closed 1 year ago

gwyong commented 1 year ago

Hello, thank you for this great work. I have a question about the following line of code in the Colab notebook:

index_motion = trans_encoder.sample(feat_clip_text[0:1], False)

How many indexes does T2M-GPT have (that is, what is the size of the motion vocabulary)?

Thanks,

Jiro-zhang commented 1 year ago

Thank you for your interest in our work.

The codebook size (the size of the motion vocabulary) is 512 for both HumanML3D and KIT.

gwyong commented 1 year ago

Thank you for the reply.

Then does that mean the authors assume that all motions can be generated from 512 poses? Here, a pose refers to the 3D body joint locations for a single frame.

Also, I found that you use the "top-1" method when predicting motion indexes. I want to generate diverse motions from a single text prompt, so I am planning to change it to "top-K" (e.g., top-10). What do you think of this plan? Are there other ways to generate diverse motions? For example, can a single motion index be dequantized into several different sets of 3D body joint locations?
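To make the top-K idea concrete, here is a minimal sketch in plain Python. It is not code from the T2M-GPT repository: the function name top_k_sample and the logits values are hypothetical, and the real model works on PyTorch tensors rather than Python lists.

```python
import math
import random

def top_k_sample(logits, k, rng):
    """Sample one index from the k highest-scoring entries of `logits`."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights restricted to those k entries (max subtracted for stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # random.choices normalizes the weights internally.
    return rng.choices(top, weights=weights, k=1)[0]

rng = random.Random(0)
logits = [2.0, 0.5, 1.5, -1.0, 0.0]  # hypothetical scores over 5 motion codes
samples = [top_k_sample(logits, k=2, rng=rng) for _ in range(100)]
```

With k=1 this reduces to the "top-1" behavior, and with k equal to the vocabulary size it becomes plain sampling from the full distribution.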

Much appreciated,

Jiro-zhang commented 1 year ago

In fact, each "code" in the VQ-VAE represents a motion segment (4 frames) rather than a single pose, and the GPT generates code sequences that are then decoded into complete motions.

For diverse motions, you can set the hyper-parameter "if_categorial=True" and change the random seed:
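As a rough illustration of why 512 codes do not limit the model to 512 poses, the arithmetic can be sketched as follows. The two constants come from this thread. The helper names are hypothetical, and the sketch assumes each code maps to exactly 4 output frames.

```python
FRAMES_PER_CODE = 4   # from this thread: one code covers a 4-frame segment
CODEBOOK_SIZE = 512   # from this thread: size of the motion vocabulary

def frames_for(num_codes):
    """Number of motion frames decoded from a sequence of `num_codes` codes."""
    return num_codes * FRAMES_PER_CODE

def num_code_sequences(num_codes):
    """Number of distinct code sequences of length `num_codes`."""
    return CODEBOOK_SIZE ** num_codes

print(frames_for(10))          # 40
print(num_code_sequences(2))   # 262144
```

Even a sequence of two codes already has 262144 possible combinations, so the variety of motions comes from the sequence of codes, not from the size of the codebook alone.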

https://github.com/Mael-zys/T2M-GPT/blob/6377b062b45d5d6aa45b2a259b3d0e91bb198bec/models/t2m_trans.py#L33

https://github.com/Mael-zys/T2M-GPT/blob/6377b062b45d5d6aa45b2a259b3d0e91bb198bec/options/option_transformer.py#L63

Furthermore, the "codebook" consists of discrete motion features, and the output during decoding includes 3D body joint locations.

gwyong commented 1 year ago

Thank you very much. If I have any updates, I will let you know.

gwyong commented 1 year ago

Good, it works. With if_categorial=True, T2M-GPT samples the next motion index from the predicted probability distribution instead of always picking the most probable one. Thank you, Jiro-zhang.
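The difference between the two modes can be sketched in plain Python as below. This is only an illustration, not the repository's implementation: next_index is a hypothetical helper, the logits are made up, and the flag name simply mirrors the one mentioned in this thread.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_index(logits, if_categorial, rng):
    """Pick the next motion index, either by sampling or by taking the argmax."""
    probs = softmax(logits)
    if if_categorial:
        # Draw an index in proportion to its probability; depends on the seed.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Greedy choice: always the most probable index, regardless of the seed.
    return max(range(len(probs)), key=lambda i: probs[i])

logits = [1.0, 3.0, 2.0, 0.5]  # hypothetical scores over 4 motion codes
greedy = [next_index(logits, False, random.Random(s)) for s in range(5)]
sampled = [next_index(logits, True, random.Random(s)) for s in range(50)]
```

Here greedy contains the same index for every seed, while sampled varies with the seed, which is why changing the random seed yields different motions only when sampling is enabled.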