allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0
433 stars 51 forks

Add a new generative model #189

Closed: YeZiyi1998 closed this issue 1 month ago

YeZiyi1998 commented 1 month ago

We would like to add Con-J-Qwen2-7B, which can be evaluated by running:

python scripts/run_generative.py --model="ZiyiYe/Con-J-Qwen2-7B"

These are our local test results: {'Chat': 0.9189944134078212, 'Chat Hard': 0.7982456140350878, 'Safety': 0.8797297297297297, 'Reasoning': 0.881871692039068}

Thanks in advance for your help.
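For easier comparison, the section scores reported above can be turned into percentages with a simple average. This is only a minimal sketch: the `summarize` helper is hypothetical, and the unweighted mean over the four sections is an assumption that may not match how the leaderboard aggregates its overall score.

```python
# Section scores copied verbatim from the local test results above.
scores = {
    "Chat": 0.9189944134078212,
    "Chat Hard": 0.7982456140350878,
    "Safety": 0.8797297297297297,
    "Reasoning": 0.881871692039068,
}


def summarize(section_scores):
    """Return per-section percentages and their unweighted mean.

    Assumption: a plain mean over the sections. The leaderboard may
    weight sections differently, so treat the average as a rough guide.
    """
    percentages = {
        name: round(100 * value, 1) for name, value in section_scores.items()
    }
    mean = 100 * sum(section_scores.values()) / len(section_scores)
    return percentages, round(mean, 1)


percentages, average = summarize(scores)
for name, pct in percentages.items():
    print(f"{name}: {pct}")
print(f"Unweighted average: {average}")
```

Rounding to one decimal place is a presentation choice; the raw values above remain the reference numbers.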

natolambert commented 1 month ago

Done! Thanks for committing your code directly to the library.