This generative model can be evaluated with the default scripts/run_generative.py script. Please note that at least 4 GPUs are needed to run run_generative.py for this model, and that export VLLM_WORKER_MULTIPROC_METHOD=spawn is required for vLLM multi-GPU inference.
export VLLM_WORKER_MULTIPROC_METHOD=spawn
cd reward-bench
model_name_or_path="Skywork/Skywork-Critic-Llama-3.1-70B"
python scripts/run_generative.py --model $model_name_or_path --trust_remote_code --do_not_save --force_local --num_gpus 4 2>&1 | tee ./evaluation_logs.txt
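Since the run needs at least 4 GPUs, a small pre-flight check can fail fast before the 70B model starts loading. The following is only a sketch: the helper names `count_gpus` and `have_enough_gpus` are hypothetical and not part of reward-bench, and it assumes `nvidia-smi` is on the PATH of the evaluation machine.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (assumption: not part of reward-bench).

# Print the number of GPUs reported by nvidia-smi, or 0 if the tool is missing.
count_gpus() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # Each device is listed on its own line starting with "GPU <index>:".
    nvidia-smi --list-gpus 2>/dev/null | grep -c '^GPU '
  else
    echo 0
  fi
}

# Succeed only if at least $1 GPUs are available.
# $2 optionally overrides the detected count (useful for dry runs).
have_enough_gpus() {
  required="$1"
  available="${2:-$(count_gpus)}"
  [ "$available" -ge "$required" ]
}
```

One way to use it is to place `have_enough_gpus 4 || { echo "need at least 4 GPUs" >&2; exit 1; }` just before the python command above, so the script stops with a clear message instead of failing inside vLLM.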
We would like to add this new generative model to the RewardBench LeaderBoard.
Hi RewardBench Team 👋,
We have updated a 70B version generative model:
Our local evaluation metrics for the model are listed below:
Our hardware and environments:
How to Run Evaluation Script
This generative model can be evaluated with the default scripts/run_generative.py script. Please note that at least 4 GPUs are needed to run run_generative.py for this model, and that export VLLM_WORKER_MULTIPROC_METHOD=spawn is required for vLLM multi-GPU inference.
We would like to add this new generative model to the RewardBench LeaderBoard.
Thank you!