allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0
277 stars 27 forks source link

`pad_token_id` issue #115

Closed hank0316 closed 2 months ago

hank0316 commented 2 months ago

Hi,

I encountered some problem about pad_token_id.

I trained a TinyLlama reward model by modifying TRL sample code. I want to use this benchmark for evaluation. I add the following code into REWARD_MODEL_CONFIG:

"TinyLlama/TinyLlama-1.1B-Chat-v0.5": {
        "model_builder": AutoModelForSequenceClassification.from_pretrained,
        "pipeline_builder": pipeline,
        "quantized": False,
        "custom_dialogue": False,
        "model_type": "Seq. Classifier",
    },
}

And I run the evaluation by python scripts/run_rm.py --model=TinyLlama/TinyLlama-1.1B-Chat-v0.5 --chat_template=TinyLlama --do_not_save.

The error message looks like this:

Traceback (most recent call last):
  File "/work/hank0316/reward-bench/scripts/run_rm.py", line 337, in <module>
    main()
  File "/work/hank0316/reward-bench/scripts/run_rm.py", line 200, in main
    results_rej = reward_pipe(dataset["text_rejected"], **reward_pipeline_kwargs)
  File "/home/hank0316/.local/lib/python3.10/site-packages/transformers/pipelines/text_classification.py", line 156, in __call__
    result = super().__call__(*inputs, **kwargs)
  File "/home/hank0316/.local/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1198, in __call__
    outputs = list(final_iterator)
  File "/home/hank0316/.local/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/home/hank0316/.local/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/home/hank0316/.local/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1123, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/hank0316/.local/lib/python3.10/site-packages/transformers/pipelines/text_classification.py", line 187, in _forward
    return self.model(**model_inputs)
  File "/home/hank0316/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/hank0316/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hank0316/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1401, in forward
    raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
ValueError: Cannot handle batch sizes > 1 if no padding token is defined.

I think this error message means that there's no pad_token_id in the model config. The pad_token_id exists in the tokenizer. As a result, I add two line of code under line 185 of scripts/run_rm.py:

    elif reward_pipe.model.config.pad_token_id is None:
        reward_pipe.model.config.pad_token_id = reward_pipe.tokenizer.pad_token_id

I would appreciate it if someone could review this modification to confirm that it correctly addresses the issue. Additionally, if there are any concerns or alternative approaches to consider, please let me know.

Thank you for your attention to this matter.

Best regards, Hank

natolambert commented 2 months ago

Thanks for raising this. Let's take a look. Maybe @ValentinaPy (mentioned some interested in continuing on this project)

natolambert commented 2 months ago

Also @hank0316 -- if you want to open a PR with that solution we can test it further!

hank0316 commented 2 months ago

@natolambert , I have opened the PR. Would you kindly review it? Additionally, I apologize for inadvertently pressing the close button; I am not familiar with adding comments to an issue.

hank0316 commented 2 months ago

Hey @natolambert, I have a question about training a reward model. Do you think it's necessary to incorporate chat templates during data preprocessing for RM training? And if yes, should the template align with those used in SFT?

natolambert commented 2 months ago

Yes @hank0316 chat templates are important. I think there can be slightly differences (e.g. for RM you aren't generating after iirc), but it should match at a high level.

hank0316 commented 1 month ago

Yes @hank0316 chat templates are important. I think there can be slightly differences (e.g. for RM you aren't generating after iirc), but it should match at a high level.

Sure, @natolambert! I appreciate your response and this fantastic benchmark. I'm also curious if any models on the leaderboard employ the tulu chat template for evaluation but utilize their own chat template in SFT. In other words, the template used by that model in SFT differs from the one utilized in this benchmark.

natolambert commented 1 month ago

@hank0316 nope, not to my knowledge. Most use tokenizers implementation.