allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0

Add new reward models #202

Closed YangRui2015 closed 1 month ago

YangRui2015 commented 1 month ago

Currently, no reward model smaller than 7B has ranked in the top 30 on RewardBench. We provide new 2B and 3B reward models that are competitive with much larger reward models.

Please add the following entries to `reward-bench/rewardbench/models/__init__.py` (the other two fine-tuned models do not need this modification).

"Ray2333/GRM-Gemma2-2B-sftreg": {
    "model_builder": GRewardModel.from_pretrained,
    "pipeline_builder": GRMPipeline,
    "quantized": False,
    "custom_dialogue": False,
    "model_type": "Seq. Classifier",
},
"Ray2333/GRM-llama3.2-3B-sftreg": {
    "model_builder": GRewardModel.from_pretrained,
    "pipeline_builder": GRMPipeline,
    "quantized": False,
    "custom_dialogue": False,
    "model_type": "Seq. Classifier",
},
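For context, these entries extend `REWARD_MODEL_CONFIG`, the registry that `run_rm.py` consults to choose a model builder and pipeline, falling back to a `"default"` entry for unregistered names. A simplified, self-contained sketch of that lookup (toy dict, not the real builders):

```python
# Toy illustration of rewardbench's registry lookup: a known model name
# returns its custom entry; anything else falls back to "default".
REWARD_MODEL_CONFIG = {
    "default": {
        "quantized": True,
        "custom_dialogue": False,
        "model_type": "Seq. Classifier",
    },
    "Ray2333/GRM-Gemma2-2B-sftreg": {
        "quantized": False,
        "custom_dialogue": False,
        "model_type": "Seq. Classifier",
    },
}

def get_config(model_name: str) -> dict:
    """Return the registered config for model_name, or the default."""
    return REWARD_MODEL_CONFIG.get(model_name, REWARD_MODEL_CONFIG["default"])

print(get_config("Ray2333/GRM-Gemma2-2B-sftreg")["quantized"])  # prints False
print(get_config("some/unknown-model")["quantized"])            # prints True
```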

Training Details: Ray2333/GRM-llama3.2-3B-sftreg is trained on hendrydong/preference_700K. Ray2333/GRM-Gemma2-2B-sftreg is trained on weqweasdas/preference_dataset_mixture2_and_safe_pku. Ray2333/GRM-Llama3.2-3B-rewardmodel-ft and Ray2333/GRM-gemma2-2B-rewardmodel-ft are fine-tuned on the decontaminated dataset Skywork/Skywork-Reward-Preference-80K-v0.2.

The evaluation commands are:

CUDA_VISIBLE_DEVICES=0 python scripts/run_rm.py --model=Ray2333/GRM-Gemma2-2B-sftreg --batch_size=8 --not_quantized

CUDA_VISIBLE_DEVICES=0 python scripts/run_rm.py --model=Ray2333/GRM-llama3.2-3B-sftreg --batch_size=8 --not_quantized

CUDA_VISIBLE_DEVICES=0 python scripts/run_rm.py --model=Ray2333/GRM-gemma2-2B-rewardmodel-ft --batch_size=8 --not_quantized

CUDA_VISIBLE_DEVICES=0 python scripts/run_rm.py --model=Ray2333/GRM-llama3.2-3B-rewardmodel-ft --batch_size=8 --not_quantized
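The four commands differ only in the model name, so they can be generated with a loop. A dry-run sketch that prints each command (pipe the output to `bash` to actually run them; assumes you are in the repo root with a CUDA device available):

```shell
# Print the four evaluation commands, one per model (dry run).
models="Ray2333/GRM-Gemma2-2B-sftreg
Ray2333/GRM-llama3.2-3B-sftreg
Ray2333/GRM-gemma2-2B-rewardmodel-ft
Ray2333/GRM-llama3.2-3B-rewardmodel-ft"

for m in $models; do
  echo "CUDA_VISIBLE_DEVICES=0 python scripts/run_rm.py --model=$m --batch_size=8 --not_quantized"
done
```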

The local scores are:

| Model | Average | Chat | Chat Hard | Safety | Reasoning |
| --- | --- | --- | --- | --- | --- |
| Ray2333/GRM-Llama3.2-3B-rewardmodel-ft (3B) | 90.9 | 91.6 | 84.9 | 92.7 | 94.6 |
| Ray2333/GRM-gemma2-2B-rewardmodel-ft (2B) | 88.4 | 93.0 | 77.2 | 92.2 | 91.2 |
| Ray2333/GRM-llama3.2-3B-sftreg (3B) | 85.8 | 96.4 | 67.1 | 88.2 | 91.6 |
| Ray2333/GRM-Gemma2-2B-sftreg (2B) | 81.0 | 97.2 | 59.6 | 86.9 | 80.3 |
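Assuming the Average column is the unweighted mean of the four section scores (as on the RewardBench leaderboard), the reported numbers can be sanity-checked directly; a 0.1 tolerance absorbs rounding:

```python
# Verify each reported Average against the mean of the four
# section scores (Chat, Chat Hard, Safety, Reasoning).
scores = {
    "Ray2333/GRM-Llama3.2-3B-rewardmodel-ft": (90.9, [91.6, 84.9, 92.7, 94.6]),
    "Ray2333/GRM-gemma2-2B-rewardmodel-ft":   (88.4, [93.0, 77.2, 92.2, 91.2]),
    "Ray2333/GRM-llama3.2-3B-sftreg":         (85.8, [96.4, 67.1, 88.2, 91.6]),
    "Ray2333/GRM-Gemma2-2B-sftreg":           (81.0, [97.2, 59.6, 86.9, 80.3]),
}

for name, (reported_avg, sections) in scores.items():
    mean = sum(sections) / len(sections)
    assert abs(mean - reported_avg) <= 0.1, (name, mean, reported_avg)
    print(f"{name}: reported {reported_avg}, computed {mean:.2f}")
```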