RLHFlow / RLHF-Reward-Modeling

Recipes to train reward model for RLHF.
https://rlhflow.github.io/
Apache License 2.0
359 stars 23 forks

question of chat templates #16

Open trueRosun opened 2 weeks ago

trueRosun commented 2 weeks ago

Nice work! Starred already. Sorry for asking, but why replace the bos_token with an empty string?

sample['positive'] = tokenizer.apply_chat_template(
    sample['chosen'], tokenize=False, add_generation_prompt=False
).replace(tokenizer.bos_token, "")
sample['negative'] = tokenizer.apply_chat_template(
    sample['rejected'], tokenize=False, add_generation_prompt=False
).replace(tokenizer.bos_token, "")
WeiXiongUST commented 2 weeks ago

Because when we serve the Bradley-Terry RM with a pipeline, the pipeline automatically adds a bos_token when tokenizing.

For the pairwise preference model, it is because we trained the model without a bos_token (this was indeed an issue with the llama3 tokenizer at the time). In general, though, the influence of the bos token is mild.
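The double-BOS effect described above can be sketched at the string level. This is only an illustration: `render_chat_template` and `pipeline_tokenize` are stand-ins for `tokenizer.apply_chat_template(..., tokenize=False)` and the pipeline's internal tokenization, and `<|begin_of_text|>` is Llama-3's BOS token.

```python
# Why the BOS token is stripped after apply_chat_template:
# Llama-3's chat template already prepends BOS to the rendered string,
# and the inference pipeline's tokenizer adds BOS again by default.
# Keeping the template's BOS would therefore produce two BOS tokens.

BOS = "<|begin_of_text|>"  # Llama-3's BOS token


def render_chat_template(messages):
    """Stand-in for apply_chat_template(..., tokenize=False):
    the real template prepends BOS to the rendered conversation."""
    body = "".join(f"<|{m['role']}|>{m['content']}" for m in messages)
    return BOS + body


def pipeline_tokenize(text):
    """Stand-in for the pipeline's tokenizer, which prepends BOS automatically."""
    return BOS + text


messages = [{"role": "user", "content": "hi"}]

kept = pipeline_tokenize(render_chat_template(messages))
stripped = pipeline_tokenize(render_chat_template(messages).replace(BOS, ""))

print(kept.count(BOS))      # 2 -> duplicated BOS if the template's copy is kept
print(stripped.count(BOS))  # 1 -> exactly one BOS after stripping
```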

trueRosun commented 2 weeks ago

thank you for answering!

I will check the outputs after tokenization further.

hunterlang commented 4 days ago

> Because when we serve the Bradley-Terry RM with a pipeline, the pipeline automatically adds a bos_token when tokenizing.

I don't fully understand... if the inference-time pipeline adds the bos_token automatically, doesn't that mean we should train with the bos token?

WeiXiongUST commented 4 days ago

> Because when we serve the Bradley-Terry RM with a pipeline, the pipeline automatically adds a bos_token when tokenizing.
>
> I don't fully understand... if the inference-time pipeline adds the bos_token automatically, doesn't that mean we should train with the bos token?

Yes, you are correct. Unfortunately, when we trained the model, there was a bug in the llama3 tokenizer, so the model was trained WITHOUT a bos token.

We have tested with and without the bos token; it leads to a ~1% difference in RewardBench accuracy. To fix the mismatch, you could modify the tokenizer so that it does not add a bos token automatically.

hunterlang commented 4 days ago

Thanks for the reply! Just to clarify:

If I remove those .replace(tokenizer.bos_token, "") calls, then training should match inference, because the inference pipeline adds BOS automatically.

If I modify the tokenizer, then the inference pipeline will match the off-the-shelf models you already released, which were trained without BOS?

WeiXiongUST commented 4 days ago

We get a bos token when we tokenize with apply_chat_template. Then, inside the pipeline, we get another bos token.

If you keep the .replace(tokenizer.bos_token, "") call, you still end up with one bos token, added inside the pipeline. If you drop the .replace(tokenizer.bos_token, "") call, you end up with two bos tokens.

If instead we modify the tokenizer so that it does not add a bos token, then we never get a bos token at all, which matches the training setup (no bos token).
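The three configurations discussed in this thread can be summarized with a small string-level sketch (illustrative only; `BOS` stands in for `<|begin_of_text|>`, and the no-auto-BOS row corresponds to a hypothetical tokenizer modification that matches how the released models were trained):

```python
# Counting BOS tokens under the three configurations from the thread:
# 1) strip template BOS + pipeline adds BOS        -> 1 BOS
# 2) keep template BOS  + pipeline adds BOS        -> 2 BOS (the bug scenario)
# 3) strip template BOS + pipeline adds no BOS     -> 0 BOS (matches training)

BOS = "<|begin_of_text|>"


def count_bos(templated_text, strip_template_bos, pipeline_adds_bos):
    """Return how many BOS tokens the pipeline ultimately sees."""
    text = templated_text.replace(BOS, "") if strip_template_bos else templated_text
    if pipeline_adds_bos:
        text = BOS + text  # the pipeline's tokenizer prepends BOS automatically
    return text.count(BOS)


# What apply_chat_template(tokenize=False) returns: BOS plus the conversation.
templated = BOS + "<|user|>hi"

print(count_bos(templated, strip_template_bos=True,  pipeline_adds_bos=True))   # 1
print(count_bos(templated, strip_template_bos=False, pipeline_adds_bos=True))   # 2
print(count_bos(templated, strip_template_bos=True,  pipeline_adds_bos=False))  # 0
```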