OpenLLMAI / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0
1.73k stars 164 forks

The tokenizer of reward model and policy model. #242

Open eyuansu62 opened 3 months ago

eyuansu62 commented 3 months ago

From the code, it looks like the reward model uses the same tokenizer as the policy model. Is that right?

hijkzzz commented 3 months ago

Yes

eyuansu62 commented 3 months ago

Is there a situation where the tokenizers are different?
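
To make the question concrete: if the two models used different tokenizers, the policy's token IDs could not be fed to the reward model directly, because the same ID would map to different text in each vocabulary. One possible workaround is to decode with the policy tokenizer and re-encode with the reward tokenizer. The snippet below is only a toy sketch of that idea, not OpenRLHF code: `ToyTokenizer` and `retokenize_for_reward` are hypothetical names, and the whitespace-based tokenizer stands in for a real one.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer: maps whole words to integer IDs."""

    def __init__(self, vocab):
        # The position of each word in `vocab` is its token ID.
        self.token_to_id = {tok: i for i, tok in enumerate(vocab)}
        self.id_to_token = {i: tok for tok, i in self.token_to_id.items()}

    def encode(self, text):
        return [self.token_to_id[word] for word in text.split()]

    def decode(self, ids):
        return " ".join(self.id_to_token[i] for i in ids)


def retokenize_for_reward(policy_ids, policy_tokenizer, reward_tokenizer):
    """Bridge two vocabularies by going through plain text (assumed approach)."""
    text = policy_tokenizer.decode(policy_ids)
    return reward_tokenizer.encode(text)


# Same words, different ID assignments, so raw IDs are not interchangeable.
policy_tok = ToyTokenizer(["hello", "world", "good"])
reward_tok = ToyTokenizer(["good", "world", "hello"])

policy_ids = policy_tok.encode("hello world")
reward_ids = retokenize_for_reward(policy_ids, policy_tok, reward_tok)

print(policy_ids)  # IDs in the policy vocabulary
print(reward_ids)  # IDs for the same text in the reward vocabulary
```

With a shared tokenizer this round trip is unnecessary, since the generated IDs already mean the same thing to both models.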