Closed gwyong closed 1 year ago
Thank you for your interest in our work.
The codebook size (the size of motion vocabulary) is 512 for both HumanML3D and KIT
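For intuition, vector quantization with a 512-entry codebook can be sketched like this (a toy illustration only; the code dimension and values here are made up and are not the actual implementation):

```python
import random

# Toy stand-in for the VQ-VAE codebook: 512 learned code vectors.
# The real feature dimension in T2M-GPT differs; 8 here is illustrative.
NUM_CODES, CODE_DIM = 512, 8
rng = random.Random(0)
codebook = [[rng.gauss(0, 1) for _ in range(CODE_DIM)] for _ in range(NUM_CODES)]

def quantize(feature):
    """Map an encoder feature to the index of its nearest code vector."""
    def dist2(code):
        return sum((a - b) ** 2 for a, b in zip(feature, code))
    return min(range(NUM_CODES), key=lambda i: dist2(codebook[i]))

idx = quantize(codebook[42])  # a code vector quantizes to itself -> 42
```

So "codebook size 512" means the encoder output is snapped to one of 512 discrete codes, and each code index stands for a learned motion feature, not a raw pose.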
Thank you for the reply.
Then, does it mean that the authors assume all motions can be generated from 512 poses? Here, a pose refers to the 3D body joint locations of a single frame.
Also, I found that when predicting motion indexes, you use a "top-1" (greedy) method. I want to generate diverse motions from a single text prompt, so I am planning to change it to "top-K" (e.g., top-10). Could you give me your thoughts on this plan? Are there other ways to generate diverse motions? For example, can we dequantize a motion index into several 3D body joint locations?
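Concretely, the top-K change I have in mind would look something like this (just a sketch; the function and variable names are my own, not from the repo):

```python
import math
import random

def top_k_sample(logits, k, rng):
    """Sample the next motion index from only the k most probable codes."""
    # indices of the k largest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # softmax restricted to those k candidates (shifted for stability)
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return top[rng.choices(range(k), weights=weights, k=1)[0]]

logits = [3.0, 2.5, 0.2, -1.0, -2.0]
idx = top_k_sample(logits, k=2, rng=random.Random(0))  # always 0 or 1
```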
Much appreciated,
In fact, the "code" in VQVAE represents a motion segment (4 frames), and GPT is used to generate code sequences to decode complete motions.
For diverse motions, you can set the hyper-parameter "if_categorial=True" and change the random seed:
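The effect of that flag can be sketched as follows (a toy illustration of the sampling logic, not the repo's exact code; the flag name follows the repo's `if_categorial`):

```python
import math
import random

def softmax(logits):
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def next_index(logits, if_categorial=False, rng=None):
    """Pick the next motion-code index from the GPT logits.

    if_categorial=False -> greedy top-1 (deterministic, same motion every run)
    if_categorial=True  -> sample from the full categorical distribution,
                           so different random seeds give diverse motions.
    """
    probs = softmax(logits)
    if if_categorial:
        rng = rng or random.Random()
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=lambda i: probs[i])

logits = [2.0, 1.5, 0.1, -1.0]
greedy = next_index(logits)  # always 0
sampled = next_index(logits, if_categorial=True, rng=random.Random(0))
```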
Furthermore, the "codebook" consists of discrete motion features, and the output during decoding includes 3D body joint locations.
Thank you very much. If I have any updates, I will let you know.
Good, it works. By changing to if_categorial=True, T2M-GPT samples the next motion index from the probability distribution instead of always predicting the most probable one. Thank you, Jiro-zhang.
Hello, thank you for this great work. I have a question about the code below in Colab: index_motion = trans_encoder.sample(feat_clip_text[0:1], False)
How many indexes does T2M-GPT have (i.e., what is the size of the motion vocabulary)?
Thanks,