Mael-zys / T2M-GPT

(CVPR 2023) PyTorch implementation of "T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations"
https://mael-zys.github.io/T2M-GPT/
Apache License 2.0

Sequences duration #21

Closed IlyesTal closed 1 year ago

IlyesTal commented 1 year ago

Hi,

Thanks for this work, it's impressive! I have two questions. First, can I choose the sequence length before generating it? Second, can I fine-tune it on a certain type of actions (for example, football actions)?

Thank you in advance.

Jiro-zhang commented 1 year ago

Hi,

First, we cannot choose the generated sequence length under the current training strategy. However, you can remove the End token during training and replace the block_size with the GT length here: https://github.com/Mael-zys/T2M-GPT/blob/92ffedf00df5e142515f5c1677bdb2375ce8a58e/models/t2m_trans.py#L34
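
For illustration, here is a minimal sketch of that change, assuming sequences are packed to a fixed block_size with an End token and padding during training; the function and variable names are placeholders, not the repository's actual code:

```python
# Sketch only: how one might drop the End token and train on the GT length
# instead of a fixed block_size. Names are illustrative, not the repo's code.
import torch
import torch.nn.functional as F

def build_training_sequence(code_idx, gt_len, block_size,
                            end_token=None, pad_token=0):
    """code_idx: (T,) VQ code indices of one motion; gt_len: its true length."""
    if end_token is not None:
        # current strategy: append an End token, then pad up to block_size
        seq = torch.cat([code_idx[:gt_len], code_idx.new_tensor([end_token])])
        seq = F.pad(seq, (0, block_size - seq.numel()), value=pad_token)
    else:
        # suggested variant: no End token, train only on the GT-length prefix
        seq = code_idx[:gt_len]
    return seq
```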

Furthermore, we have not tried to fine-tune our model on other datasets, but this might work.

IlyesTal commented 1 year ago

Okay, thanks for your fast reply! I'll try it, and if I get a result I'll get back to you. Thanks again for this amazing project!

ZhengdiYu commented 1 year ago

Hi,

> First, we cannot choose the generated sequence length under the current training strategy. However, you can remove the End token during training and replace the block_size with the GT length here:
>
> https://github.com/Mael-zys/T2M-GPT/blob/92ffedf00df5e142515f5c1677bdb2375ce8a58e/models/t2m_trans.py#L34
>
> Furthermore, we have not tried to fine-tune our model on other datasets, but this might work.

May I ask where the End token is added during training?

Jiro-zhang commented 1 year ago

We add end tokens and pad tokens during dataset processing.

https://github.com/Mael-zys/T2M-GPT/blob/6377b062b45d5d6aa45b2a259b3d0e91bb198bec/dataset/dataset_TM_train.py#L131

https://github.com/Mael-zys/T2M-GPT/blob/6377b062b45d5d6aa45b2a259b3d0e91bb198bec/models/t2m_trans.py#L130
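
For readers following along, a minimal sketch of what appending End and Pad tokens during dataset processing typically looks like; the token ids and names below are placeholders, not the actual constants used in dataset_TM_train.py:

```python
# Sketch only: typical End/Pad handling when packing tokenized motions to a
# fixed block_size. Token ids (end_id, pad_id) are placeholders here.
import numpy as np

def pack_motion_tokens(m_tokens, block_size, end_id, pad_id):
    """m_tokens: 1D array of VQ indices for one motion clip."""
    tokens = np.concatenate([m_tokens, [end_id]])        # mark the end of the motion
    if len(tokens) < block_size:                         # pad to the fixed length
        tokens = np.concatenate([tokens, [pad_id] * (block_size - len(tokens))])
    return tokens[:block_size]
```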

ZhengdiYu commented 1 year ago

> We add end tokens and pad tokens during dataset processing.
>
> https://github.com/Mael-zys/T2M-GPT/blob/6377b062b45d5d6aa45b2a259b3d0e91bb198bec/dataset/dataset_TM_train.py#L131
>
> https://github.com/Mael-zys/T2M-GPT/blob/6377b062b45d5d6aa45b2a259b3d0e91bb198bec/models/t2m_trans.py#L130

Thank you for your reply. May I also ask why we need the GT code indices during training? What if I directly use sample() to get the logits during training instead of using forward()? Is this feasible? I was expecting this could boost the ability to predict the motion length and reduce the reliance on the GT.
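
For clarity, here is a generic sketch of the two training modes being contrasted, i.e. teacher forcing with GT indices (forward()) versus free-running sampling (sample()); the call signature `model(idx, text_emb) -> logits` is an assumption, not the repository's API:

```python
# Generic sketch of the two training modes; `model(idx, text_emb) -> logits`
# is an assumed signature, not the repository's API.
import torch
import torch.nn.functional as F

def teacher_forced_step(model, text_emb, gt_idx):
    # forward(): the model is fed the ground-truth indices, shifted by one step
    logits = model(gt_idx[:, :-1], text_emb)                  # (B, T-1, vocab)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           gt_idx[:, 1:].reshape(-1))

@torch.no_grad()
def free_running_rollout(model, text_emb, start_idx, max_len, end_id):
    # sample(): the model consumes its own predictions, so no GT indices are needed
    idx = start_idx                                           # (1, 1) seed token
    for _ in range(max_len):
        logits = model(idx, text_emb)                         # (1, T, vocab)
        probs = torch.softmax(logits[:, -1], dim=-1)
        next_idx = torch.multinomial(probs, num_samples=1)    # (1, 1)
        idx = torch.cat([idx, next_idx], dim=1)
        if next_idx.item() == end_id:                         # stop at the End token
            break
    return idx
```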

Actually, I have been training your network on my own dataset with detailed and complex text annotations. The generation results are pretty good if I select a text prompt from the training dataset. However, the performance drops dramatically when I use a text prompt from the test set. It seems the network can memorize the training data but has relatively weak generalization ability, so I was trying to reduce the reliance on the GT. I am not sure if my analysis is correct.