Open yangjiabupt opened 10 months ago
Thanks for your great work!
I'm interested in adding a new task to the pretraining stage. Can you offer some advice or references?
Also, I'd like to know whether you fine-tuned all LLM parameters during SFT or only LoRA.
If you only fine-tuned LoRA, why did you use model parallelism = 2?
We fine-tune all parameters of the LLM during SFT, as described in the paper.
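For reference, a minimal sketch of the difference between the two setups, assuming a Hugging Face-style causal LM (the checkpoint name and LoRA target modules are placeholders, not taken from this repo):

```python
# Contrast full-parameter SFT with LoRA-only fine-tuning (illustrative only).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Full-parameter SFT (the setup described above): every weight is trainable.
model = AutoModelForCausalLM.from_pretrained("your-pretrained-checkpoint")
for param in model.parameters():
    param.requires_grad = True

# LoRA alternative: freeze the base model and train only low-rank adapters.
lora_model = get_peft_model(
    AutoModelForCausalLM.from_pretrained("your-pretrained-checkpoint"),
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]),
)
lora_model.print_trainable_parameters()  # only a small fraction is trainable
```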
The vocab size in the config is "vocab_size": 155947, but the tokenizer vocabulary only has 155514 tokens.
What are the redundant tokens used for?
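For anyone comparing the two numbers themselves, a minimal sketch (the checkpoint path is a placeholder):

```python
# Compare the embedding-table size in the config with the tokenizer's vocabulary.
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("path/to/checkpoint")
tokenizer = AutoTokenizer.from_pretrained("path/to/checkpoint")

print("config vocab_size :", config.vocab_size)               # e.g. 155947
print("tokenizer size    :", len(tokenizer))                   # e.g. 155514
print("unused rows       :", config.vocab_size - len(tokenizer))
```

Embedding tables are sometimes padded beyond the tokenizer size (e.g. for efficiency or parallelism), but whether that is the reason here would need confirmation from the authors.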