allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0

Add A New Generative Model #186

Closed ZhichaoWang970201 closed 2 months ago

ZhichaoWang970201 commented 2 months ago

Hi RewardBench Team 👋,

We have uploaded a 70B generative model:

SF-Foundation/TextEval-70B. Our local evaluation metrics for the model are listed below:

{'Chat': 0.946927374301676, 'Chat Hard': 0.9035087719298246, 'Safety': 0.9318428922428922, 'Reasoning': 0.9646321001800622}
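For reference, the unweighted mean of these four category scores comes out to roughly 0.937. This is only a rough aggregate: the official leaderboard may weight or subsample the categories differently.

```python
# Hypothetical aggregation of the reported per-category scores: a plain
# unweighted mean. The official RewardBench leaderboard score may be
# computed differently (e.g. with per-subset weighting).
scores = {
    "Chat": 0.946927374301676,
    "Chat Hard": 0.9035087719298246,
    "Safety": 0.9318428922428922,
    "Reasoning": 0.9646321001800622,
}
average = sum(scores.values()) / len(scores)
print(round(average, 4))  # 0.9367
```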

How to run the evaluation script: this generative model can be evaluated with the default scripts/run_generative.py script. Please note that at least 8 GPUs are needed to run run_generative.py, and export VLLM_WORKER_MULTIPROC_METHOD=spawn is required for vLLM multi-GPU inference.

export VLLM_WORKER_MULTIPROC_METHOD=spawn
cd reward-bench
python scripts/run_generative.py --model=SF-Foundation/TextEval-70B --num_gpus 8

We would like to add this new generative model to the RewardBench leaderboard.

Thank you!

ZhichaoWang970201 commented 2 months ago

Forgot to format the commands as a code block! Sorry about that.


export VLLM_WORKER_MULTIPROC_METHOD=spawn
cd reward-bench
python scripts/run_generative.py --model=SF-Foundation/TextEval-70B --num_gpus 8
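For context on what the per-category numbers above measure: each RewardBench category score is, roughly, a pairwise accuracy, i.e. the fraction of prompts for which the judge model prefers the "chosen" completion over the "rejected" one. A minimal sketch with toy verdicts (this is illustrative, not the actual run_generative.py logic):

```python
# Simplified sketch of RewardBench-style scoring: each benchmark item is a
# (prompt, chosen, rejected) triple, and the judge's verdict records whether
# it preferred the chosen completion. The category score is the mean verdict.
def category_score(verdicts: list[bool]) -> float:
    """Fraction of prompts where the chosen completion won."""
    return sum(verdicts) / len(verdicts)

# Toy verdicts below are hypothetical judge decisions, not real output.
toy_verdicts = [True, True, True, False]
print(category_score(toy_verdicts))  # 0.75
```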