allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0

Add GRM classes #151

YangRui2015 closed this pull request 3 months ago.

YangRui2015 commented 3 months ago

Introduction

The Generalizable Reward Model (GRM) aims to improve the generalization ability of reward models for LLMs by regularizing their hidden states.

Paper: Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs.

The proposed text-generation regularization markedly improves the accuracy of learned reward models across a variety of out-of-distribution tasks and effectively alleviates the over-optimization problem in RLHF (even with corrupted preference data), offering a more reliable and robust preference-learning paradigm.
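One way to picture this regularization is as a weighted sum of a pairwise reward loss and a language-modeling loss that keeps the shared hidden states useful for generating text. The sketch below is illustrative only. It assumes a Bradley–Terry reward loss, an SFT-style negative log-likelihood on the chosen response, and a hypothetical mixing weight `alpha`. The helpers `bradley_terry_loss`, `text_generation_loss` and `regularized_reward_loss` are hypothetical names that operate on plain floats. This is not the GRM training code.

```python
import math
from typing import Sequence

# Hypothetical mixing weight; the value actually used to train GRM is not stated here.
ALPHA = 0.01


def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected), written so exp() never overflows."""
    d = r_chosen - r_rejected
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


def text_generation_loss(chosen_token_logprobs: Sequence[float]) -> float:
    """Mean negative log-likelihood of the chosen response's tokens (SFT-style)."""
    return -sum(chosen_token_logprobs) / len(chosen_token_logprobs)


def regularized_reward_loss(
    r_chosen: float,
    r_rejected: float,
    chosen_token_logprobs: Sequence[float],
    alpha: float = ALPHA,
) -> float:
    """Convex mix of the reward loss and the text-generation regularizer (assumed form)."""
    reward_term = bradley_terry_loss(r_chosen, r_rejected)
    reg_term = text_generation_loss(chosen_token_logprobs)
    return (1 - alpha) * reward_term + alpha * reg_term


# Example: rewards for one preference pair plus per-token log-probs of the chosen reply.
print(regularized_reward_loss(1.5, 0.2, [-0.7, -1.1, -0.4]))
```

Setting `alpha=0` recovers a plain Bradley–Terry reward model. Larger values trade some pairwise fit for hidden states that still support text generation.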

This reward model is finetuned from llama3_8b_instruct using the hendrydong/preference_700K dataset.

Evaluation

We evaluate GRM on RewardBench. It raises the best average score among 8B Bradley–Terry reward models from 84.7 (sfairXC/FsfairX-LLaMA3-RM-v0.1) to 87.0.

Model                                     Average  Chat  Chat Hard  Safety  Reasoning
Ray2333/GRM-llama3-8B-sftreg (Ours, 8B)   87.0     98.6  67.8       89.4    92.3
Ray2333/GRM-llama3-8B-distill (Ours, 8B)  86.1     98.3  68.4       86.1    91.3
openai/gpt-4-0125-preview                 85.9     95.3  74.3       87.2    86.9
sfairXC/FsfairX-LLaMA3-RM-v0.1 (8B)       84.7     99.4  65.1       87.8    86.4
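The averages can be recomputed from the table. The short check below assumes that Average is the unweighted mean of the four section scores. The recomputed values match the reported ones to within about 0.1 points, which is plausible given that the section scores are themselves rounded. `TABLE` and `section_mean` are hypothetical names, and the numbers are copied from the table.

```python
# Section scores copied from the table: (Chat, Chat Hard, Safety, Reasoning), reported Average.
TABLE = {
    "Ray2333/GRM-llama3-8B-sftreg": ((98.6, 67.8, 89.4, 92.3), 87.0),
    "Ray2333/GRM-llama3-8B-distill": ((98.3, 68.4, 86.1, 91.3), 86.1),
    "openai/gpt-4-0125-preview": ((95.3, 74.3, 87.2, 86.9), 85.9),
    "sfairXC/FsfairX-LLaMA3-RM-v0.1": ((99.4, 65.1, 87.8, 86.4), 84.7),
}


def section_mean(scores):
    """Unweighted mean of the section scores (assumed definition of Average)."""
    return sum(scores) / len(scores)


for name, (scores, reported) in TABLE.items():
    print(f"{name}: recomputed {section_mean(scores):.3f}, reported {reported}")
```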

Usage

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# load model and tokenizer (trust_remote_code loads the custom GRM model classes)
tokenizer = AutoTokenizer.from_pretrained('Ray2333/GRM-llama3-8B-sftreg')
reward_model = AutoModelForSequenceClassification.from_pretrained(
    'Ray2333/GRM-llama3-8B-sftreg',
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map=0,  # place the whole model on GPU 0
)
message = [
  {'role': 'user', 'content': "I'm going to go out to a movie, but I need someone to chat with my daughter and pretend to be me while she's home alone.  But I can't do that while I'm at the movie.  Can you help by impersonating me by chat with her?"},
  {'role': 'assistant', 'content': "Sorry, I'm not comfortable impersonating you in that way.  I'm not willing to behave so dishonestly.  Maybe you can just find a way to bring her to the movie, or you can find a babysitter?"}
]
message_template = tokenizer.apply_chat_template(message, tokenize=False)
# it will look like this: "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nI'm going to go out to a movie, but I need someone to chat with my daughter and pretend to be me while she's home alone.  But I can't do that while I'm at the movie.  Can you help by impersonating me by chat with her?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nSorry, I'm not comfortable impersonating you in that way.  I'm not willing to behave so dishonestly.  Maybe you can just find a way to bring her to the movie, or you can find a babysitter?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

# a single sequence needs no padding; tensors come back with shape (1, seq_len)
tokens = tokenizer(message_template, truncation=True, return_tensors="pt")

with torch.no_grad():
    # the GRM remote code returns a 3-tuple whose last element is the reward
    _, _, reward_tensor = reward_model(
        tokens["input_ids"].to(reward_model.device),
        attention_mask=tokens["attention_mask"].to(reward_model.device),
    )
    reward = reward_tensor.cpu().item()
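The comment in the snippet shows the layout that the chat template produces. The string starts with a BOS token. Each turn then adds a role header, its content and an `<|eot_id|>` terminator, and the string ends with an empty assistant header. The pure-Python sketch below rebuilds that layout, which can help when checking exactly what text the reward model scores. `format_llama3_chat` is a hypothetical helper that only mirrors the comment. `tokenizer.apply_chat_template` remains the source of truth, because the real template may also, for example, trim whitespace from message contents.

```python
# Hypothetical helper that mirrors the prompt layout shown in the comment in the Usage snippet.
BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def header(role: str) -> str:
    """Role header followed by the blank line that precedes the content."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def format_llama3_chat(messages, add_assistant_header=True):
    """Concatenate turns as BOS + (header + content + EOT)* [+ empty assistant header]."""
    text = BOS
    for m in messages:
        text += header(m["role"]) + m["content"] + EOT
    if add_assistant_header:
        text += header("assistant")
    return text


print(format_llama3_chat([
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
]))
```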

Running with reward-bench

CUDA_VISIBLE_DEVICES=0 python scripts/run_rm.py --model=Ray2333/GRM-Gemma-2B-sftreg --chat_template=gemma --batch_size=1

CUDA_VISIBLE_DEVICES=0 python scripts/run_rm.py --model=Ray2333/GRM-llama3-8B-sftreg --chat_template='llama-3' --batch_size=1
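For each prompt in the benchmark, the model scores both the chosen and the rejected completion. A pair counts as correct when the chosen completion receives the higher reward, and the per-subset accuracies are then combined into the section scores in the table above. The sketch below shows only that pairwise comparison on plain floats, using a hypothetical helper `pairwise_accuracy`. It treats ties as incorrect, which is an assumption and not necessarily the rule `run_rm.py` uses.

```python
def pairwise_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of pairs where the chosen completion strictly outscores the rejected one."""
    if len(chosen_rewards) != len(rejected_rewards) or not chosen_rewards:
        raise ValueError("need two non-empty reward lists of equal length")
    wins = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return wins / len(chosen_rewards)


# Example with made-up rewards: the second pair is ranked the wrong way round.
print(pairwise_accuracy([2.1, 0.3, 1.0], [1.5, 0.9, 0.2]))
```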