Hi, the first language model we used to build MotionGPT was LLaMA-13B. However, it showed insufficient performance and low training efficiency. We assume the reason is the limited size of our motion dataset compared to LLaMA's large parameter count and language training data.
We therefore chose T5-770M, a smaller but widely used language model, as our final backbone. Many previous vision-language multimodal works, such as Unified-IO and BLIP, adopted T5's encoder-decoder architecture, which has shown strong capability on multimodal tasks. In addition, decoder-only models mainly shine in self-supervised training without paired data; since we train on paired motion-text data, that advantage is greatly weakened. We are still collecting a larger motion dataset for larger motion-language models.
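To make the encoder-decoder choice concrete, here is a minimal sketch of the general idea, assuming discrete motion codes from a VQ-VAE are added to the T5 vocabulary so motion and text share one token space. The codebook size and token names are illustrative, not the exact repository code:

```python
# Minimal sketch: extend a T5 backbone with discrete motion tokens so that
# motion and text share one vocabulary (illustrative, not the repo code).
from transformers import T5Tokenizer, T5ForConditionalGeneration

NUM_MOTION_TOKENS = 512  # hypothetical VQ-VAE codebook size

tokenizer = T5Tokenizer.from_pretrained("t5-large")           # ~770M parameters
model = T5ForConditionalGeneration.from_pretrained("t5-large")

# Register motion codes as extra tokens, e.g. "<motion_id_0>" ... "<motion_id_511>".
motion_tokens = [f"<motion_id_{i}>" for i in range(NUM_MOTION_TOKENS)]
tokenizer.add_tokens(motion_tokens)
model.resize_token_embeddings(len(tokenizer))

# A text-to-motion example then becomes an ordinary seq2seq pair:
prompt = "Generate motion: a person walks forward and waves."
target = " ".join(f"<motion_id_{i}>" for i in [12, 7, 301, 44])  # dummy codes

inputs = tokenizer(prompt, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss
```

With this setup, the encoder reads the text condition while the decoder generates motion codes, which is where the encoder-decoder design fits paired data naturally.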
We have evaluated MotionGPT with GPT-2 and are working on LLaMA-2 + LoRA. Please refer to the details below.
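This is not the referenced result, but as a rough sketch of what a LLaMA-2 + LoRA adaptation can look like with the PEFT library; the checkpoint name, rank, and target modules here are assumptions, not the configuration used in the paper:

```python
# Rough sketch of adapting LLaMA-2 with LoRA via PEFT (illustrative only).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; requires access approval
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_config = LoraConfig(
    r=16,                                 # low-rank dimension (hypothetical)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections in LLaMA
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trained
```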
Did you only do fine-tuning, or did you also perform pre-training?
Hello @ChangeNext
We employ both pre-training and fine-tuning for the T5 and GPT-2 models to ensure they are well adapted to our specific tasks.
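For reference, here is a minimal sketch of what such a two-stage schedule can look like with a T5 backbone; the data, hyperparameters, and motion codes are placeholders, not the released training pipeline:

```python
# Two-stage sketch: stage 1 pre-trains on raw motion-text pairs, stage 2
# fine-tunes with instruction-style prompts, both with the same seq2seq loss.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large")
# Assume motion codes were added to the vocabulary as in the earlier sketch.
tokenizer.add_tokens([f"<motion_id_{i}>" for i in range(512)])
model.resize_token_embeddings(len(tokenizer))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def run_stage(pairs, epochs):
    """Plain seq2seq loop over (source text, target text) pairs."""
    model.train()
    for _ in range(epochs):
        for src, tgt in pairs:
            batch = tokenizer(src, return_tensors="pt")
            labels = tokenizer(tgt, return_tensors="pt").input_ids
            loss = model(**batch, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

# Stage 1: motion-language pre-training on paired data (dummy example).
run_stage([("a person jumps in place", "<motion_id_3> <motion_id_77>")], epochs=1)

# Stage 2: instruction fine-tuning with task prompts (dummy example).
run_stage([("Generate motion: a person jumps in place",
            "<motion_id_3> <motion_id_77>")], epochs=1)
```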
It seems decoder-only GPT-style models like LLaMA-2 are more popular, but the paper still uses T5. Compared to GPT, does T5 have any special advantages?