CPO question

fe1ixxu / ALMA

State-of-the-art LLM-based translation models.

MIT License


Status: Open. gongye19 opened this issue 1 month ago.

gongye19 commented 1 month ago

Does CPO require peft_model_id and use_peft? Could I set only the following instead? The paper doesn't seem to mention needing a reference model.

--model_name_or_path haoranxu/ALMA-13B-Pretrain \
--tokenizer_name haoranxu/ALMA-13B-Pretrain \

fe1ixxu commented 1 month ago

Hi! Not necessarily; you can also use full-weight fine-tuning.
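On the reference-model part of the question: the CPO objective is usually written as reference-free, a preference term computed only from the trained policy's own log-likelihoods plus an NLL term on the preferred translation. The minimal pure-Python sketch below illustrates that shape; it is not ALMA's code. It assumes summed per-sequence log-probabilities as inputs, and the default beta = 0.1 is an illustrative assumption rather than ALMA's setting.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def cpo_loss(chosen_logp: float, rejected_logp: float, beta: float = 0.1) -> float:
    """Per-example CPO-style loss from the policy's log-probs only.

    chosen_logp / rejected_logp: log pi_theta(y|x) of the preferred and
    dispreferred translations (assumed here to be summed over tokens).
    beta = 0.1 is an illustrative default, not a value taken from ALMA.
    Note that no reference-model log-probs appear anywhere.
    """
    # Preference term: -log sigmoid(beta * (log pi(y_w|x) - log pi(y_l|x)))
    prefer = -log_sigmoid(beta * (chosen_logp - rejected_logp))
    # NLL term on the preferred translation (real trainers may length-normalize it)
    nll = -chosen_logp
    return prefer + nll
```

With equal log-probabilities the preference term equals log 2; lowering the rejected translation's log-probability relative to the chosen one reduces it. Whether the model weights are updated through a PEFT adapter or fully does not change this formula, which is consistent with the answer that full-weight fine-tuning also works.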

gongye19 commented 1 month ago

OK, thanks, I got it running. However, when I compare the rewards/chosen and rewards/rejected curves, the gap between them doesn't widen.
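To interpret these curves, it may help to recompute them by hand. The sketch below assumes the trainer logs metrics the way TRL's CPOTrainer does: each reward is beta times the policy's own log-probability (there is no reference model), and the "gap" corresponds to a per-pair margin. The extra names rewards/margins and rewards/accuracies and the default beta = 0.1 are assumptions for illustration, not read from ALMA's code.

```python
from statistics import mean


def reward_metrics(chosen_logps, rejected_logps, beta=0.1):
    """Batch-level reward metrics for a reference-free preference trainer.

    Assumes TRL-CPOTrainer-style definitions: reward = beta * log pi_theta(y|x).
    The margins/accuracies metric names are illustrative assumptions.
    """
    if len(chosen_logps) != len(rejected_logps):
        raise ValueError("chosen and rejected batches must have the same length")
    chosen = [beta * lp for lp in chosen_logps]
    rejected = [beta * lp for lp in rejected_logps]
    margins = [c - r for c, r in zip(chosen, rejected)]
    return {
        "rewards/chosen": mean(chosen),
        "rewards/rejected": mean(rejected),
        # The "distance" between the two curves, averaged per pair
        "rewards/margins": mean(margins),
        # Share of pairs where the preferred translation scores higher
        "rewards/accuracies": mean(1.0 if m > 0 else 0.0 for m in margins),
    }
```

Because both rewards are absolute log-probabilities under this assumption, the NLL term can push both curves in the same direction at once, so the margin and accuracy values are more direct indicators of whether preferred and rejected translations are being separated than either curve viewed alone.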