Open yangjiabupt opened 10 months ago
Thanks for your great work!
I'm interested in adding a new task to the pretraining stage. Can you offer some advice or references?
Also, I'd like to know whether you fine-tuned all LLM parameters during SFT or only LoRA.
If you only fine-tuned LoRA, why did you use model parallelism = 2?
We fine-tune all parameters of the LLM during SFT, as described in the paper.
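For reference, a minimal sketch of the difference between the two setups, assuming a Hugging Face-style causal LM (the checkpoint name and LoRA target modules are placeholders, not taken from this repo):

```python
# Contrast full-parameter SFT with LoRA-only fine-tuning (illustrative only).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Full-parameter SFT (the setup described above): every weight is trainable.
model = AutoModelForCausalLM.from_pretrained("your-pretrained-checkpoint")
for param in model.parameters():
    param.requires_grad = True

# LoRA alternative: freeze the base model and train only low-rank adapters.
lora_model = get_peft_model(
    AutoModelForCausalLM.from_pretrained("your-pretrained-checkpoint"),
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]),
)
lora_model.print_trainable_parameters()  # only a small fraction is trainable
```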
The vocab size in the config is "vocab_size": 155947, but the tokenizer vocabulary only has 155514 tokens.
What are the redundant tokens used for?
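For anyone comparing the two numbers themselves, a minimal sketch (the checkpoint path is a placeholder):

```python
# Compare the embedding-table size in the config with the tokenizer's vocabulary.
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("path/to/checkpoint")
tokenizer = AutoTokenizer.from_pretrained("path/to/checkpoint")

print("config vocab_size :", config.vocab_size)               # e.g. 155947
print("tokenizer size    :", len(tokenizer))                   # e.g. 155514
print("unused rows       :", config.vocab_size - len(tokenizer))
```

Embedding tables are sometimes padded beyond the tokenizer size (e.g. for efficiency or parallelism), but whether that is the reason here would need confirmation from the authors.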