YuanGongND / ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Question about LTU-AS Downstream Tasks #17

Open dingdongwang opened 9 months ago

dingdongwang commented 9 months ago

Hi, I have a question about LTU-AS fine-tuning. I saw that the model used in finetune.py is trained only with LlamaForCausalLM. However, since there are many classification downstream tasks (e.g., emotion recognition), what is the consideration behind not using LlamaForSequenceClassification? Besides, what is your opinion on replacing it with LlamaForSequenceClassification if I only use the model for a downstream classification task?

Thank you and looking forward to your reply!

YuanGongND commented 9 months ago

Hi, I have a question about LTU-AS fine-tuning. I saw that the model used in finetune.py is trained only with LlamaForCausalLM. However, since there are many classification downstream tasks (e.g., emotion recognition), what is the consideration behind not using LlamaForSequenceClassification?

The main novelty of the LTU line of work is that it generalizes to tasks without finetuning. Modeling it as sequence classification takes away this advantage. We advertise it as an Audio Large Language Model, and in the current context LLM usually refers to causal language modeling.
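For concreteness, below is a minimal, text-only sketch of how a causal LM handles a classification task such as emotion recognition by generating the label as text, which is what makes prompting it with new tasks possible. The checkpoint path and prompt are placeholders, and the audio/Whisper features that LTU-AS actually prepends are omitted.

```python
# Minimal, text-only sketch (not the repo's finetune.py): a causal LM answers an
# emotion-recognition query by generating the label as text.
# "<path-to-llama-checkpoint>" is a placeholder; LTU-AS also prepends audio
# features, which are omitted here.
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

ckpt = "<path-to-llama-checkpoint>"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = LlamaForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16)

prompt = (
    "Transcript: \"I can't believe we actually won the game!\"\n"
    "Question: What is the emotion of the speaker?\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=10)
# Print only the newly generated tokens (the predicted label text).
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```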

Besides, what is your opinion on replacing it with LlamaForSequenceClassification if I only use the model for a downstream classification task?

My opinion is that it would be overkill to use a 7B-parameter model for a classification task. Though my opinion could be completely wrong.
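If you do want to try the sequence-classification route for a single downstream task, a minimal sketch with Hugging Face's LlamaForSequenceClassification might look like the following. This is not what finetune.py does; the checkpoint path and emotion label set are hypothetical.

```python
# Minimal sketch of the alternative discussed above: a LLaMA backbone with a
# classification head (LlamaForSequenceClassification). This is NOT what
# finetune.py does; checkpoint path and label set are hypothetical.
import torch
from transformers import AutoTokenizer, LlamaForSequenceClassification

emotions = ["neutral", "happy", "sad", "angry"]      # hypothetical label set
ckpt = "<path-to-llama-checkpoint>"                  # placeholder

tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = LlamaForSequenceClassification.from_pretrained(
    ckpt, num_labels=len(emotions), torch_dtype=torch.float16
)
model.config.pad_token_id = tokenizer.eos_token_id   # LLaMA defines no pad token by default

inputs = tokenizer("I can't believe we actually won the game!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                  # shape: (1, num_labels)
print(emotions[logits.argmax(-1).item()])            # head is untrained here, so output is arbitrary
```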

dingdongwang commented 9 months ago

Thank you for your reply! May I kindly ask what the fine-tuning training loss and your training parameter settings (mainly the number of epochs) are for LTU-AS on the toy dataset? I have tried running finetune_toy_low_resource.sh for 25 epochs, and the loss is roughly 0.04; I'm not sure whether that is close to optimal.

Besides, since I saw the 3-stage training table in the paper, where the original model was trained for only 1~2 epochs per stage, may I kindly ask whether 1~2 epochs are enough for convergence, and what the rough loss value of that experiment is?

Thank you again!

YuanGongND commented 9 months ago

Thank you for your reply! May I kindly ask what the fine-tuning training loss and your training parameter settings (mainly the number of epochs) are for LTU-AS on the toy dataset? I have tried running finetune_toy_low_resource.sh for 25 epochs, and the loss is roughly 0.04; I'm not sure whether that is close to optimal.

That seems too small. We haven't tried training for 25 epochs. The expected loss for the toy script is listed at the bottom of the .sh file; you should get a similar value (that is for 1 epoch).

Aren't all training hyper-parameters included in the .sh file?

https://github.com/YuanGongND/ltu/blob/main/src/ltu_as/train_scripts/finetune_toy_low_resource.sh

Besides, since I saw the 3-stage training table in the paper, where the original model was trained for only 1~2 epochs per stage, may I kindly ask whether 1~2 epochs are enough for convergence, and what the rough loss value of that experiment is?

Please go to the main README and search for "Where are the loss logs?".
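As a side note, if you want to inspect your own run rather than the reference logs, and assuming finetune.py uses the Hugging Face Trainer, a sketch like the one below reads the loss curve from the trainer_state.json written to the output directory (the path is a placeholder).

```python
# Sketch for inspecting your own run's loss curve, assuming the Hugging Face
# Trainer was used and has written trainer_state.json to the output directory.
# "<output_dir>" is a placeholder; the reference loss logs for LTU/LTU-AS are in
# the repo README as noted above.
import json

with open("<output_dir>/trainer_state.json") as f:
    state = json.load(f)

# log_history holds periodic training-loss entries (and eval entries, if any).
for entry in state["log_history"]:
    if "loss" in entry:
        print(f"epoch {entry['epoch']:.2f}  loss {entry['loss']:.4f}")
```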