OpenMotionLab / MotionGPT

[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs
https://motion-gpt.github.io
MIT License

Why T5 is used instead of GPT? #10

Open zhuxy12022 opened 11 months ago

zhuxy12022 commented 11 months ago

It seems GPT-style models like LLaMA-2 are more popular, but the paper still uses T5. Compared to GPT, does T5 have any special advantages?

ChenFengYe commented 11 months ago

Hi, the first language model that we used to build MotionGPTs was LLaMA-13B. However, it showed insufficient performance and low training efficiency. We assume the reason is the limited dataset size compared to LLaMA's large parameter count and language-data scale.

We thus chose T5-770M, a small but widely used language model, as our final backbone. Many previous vision-language multimodal works, like Unified-IO and BLIP, have chosen T5's encoder-decoder architecture, and it shows strong performance on multi-modal tasks. In addition, the main advantage of a decoder-only model is self-supervised training without paired data; since we have paired data, this advantage is greatly weakened. We are still working on collecting a large motion dataset for larger motion-language models.

We have evaluated MotionGPT on GPT-2 and are working on LLaMA-2 + LoRA. Please refer to the image below.

[image]
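As a rough illustration of the "motion as a foreign language" idea discussed in this thread, the sketch below maps discrete motion codes to vocabulary-style tokens that a text backbone such as T5 could consume alongside ordinary words. The codebook size, token format, and helper names are assumptions for illustration, not the repository's actual implementation.

```python
# Sketch (assumptions, not the authors' code): treating motion as a "foreign
# language" by mapping VQ codebook indices to token strings that can be mixed
# into a text sequence for an encoder-decoder backbone like T5.

CODEBOOK_SIZE = 512  # assumed size of the motion VQ codebook


def motion_to_tokens(indices):
    """Turn a sequence of codebook indices into motion-token strings."""
    return [f"<motion_{i}>" for i in indices]


def tokens_to_motion(tokens):
    """Invert the mapping, keeping only motion tokens from a mixed sequence."""
    return [
        int(t[len("<motion_"):-1])
        for t in tokens
        if t.startswith("<motion_") and t.endswith(">")
    ]


# A mixed text/motion sequence, as a unified model would see it:
prompt = ["Generate", "a", "walking", "motion", ":"] + motion_to_tokens([17, 3, 480])
assert tokens_to_motion(prompt) == [17, 3, 480]
```

In practice these motion tokens would be added to the backbone's tokenizer vocabulary and the embedding matrix resized accordingly, so both text and motion share one token space.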

ChangeNext commented 4 months ago

Did you only do fine-tuning, or did you also perform pre-training?

billl-jiang commented 3 months ago

> Did you only do fine-tuning, or did you also perform pre-training?

Hello @ChangeNext

We employ both pre-training and fine-tuning processes for the T5 and GPT-2 models to ensure they are optimally adapted for our specific tasks.
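The two-stage recipe mentioned above (pre-training followed by task fine-tuning) can be sketched as a simple schedule over stages. The stage names, epoch counts, and function names here are illustrative assumptions, not the repository's training code.

```python
# Sketch under assumptions: a staged training schedule, pre-training first,
# then fine-tuning, as described in the reply above.

def run_schedule(stages, train_step):
    """Run each (name, epochs) stage in order, calling train_step per epoch."""
    history = []
    for name, epochs in stages:
        for epoch in range(epochs):
            # In a real run: iterate batches, compute the language-modeling
            # loss on mixed text/motion tokens, and step the optimizer.
            train_step(name, epoch)
            history.append((name, epoch))
    return history


schedule = [("pretrain", 2), ("finetune", 1)]  # hypothetical epoch counts
history = run_schedule(schedule, lambda *args: None)
assert history == [("pretrain", 0), ("pretrain", 1), ("finetune", 0)]
```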