Open eyuansu62 opened 3 months ago
From the code, the reward model's tokenizer seems to be the same as the policy model's. Is that right?
Yes
Is there a situation where the tokenizers are different?