allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0

Experiment with human vs gpt4 data #24

Open natolambert opened 5 months ago

natolambert commented 5 months ago

Using the human-written data AI2 has, or a dataset like no_robots, we could test whether a reward model (RM) prefers the human-written or the model-generated completion to the same prompt.
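A minimal sketch of how such a comparison could be scored. It assumes each prompt has exactly one human-written and one model-generated completion, and that the reward model is wrapped in a hypothetical `score(prompt, completion)` function returning a scalar. The `Pair` type and the `human_preference_rate` helper are illustrative names, not part of RewardBench.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Pair:
    """One prompt with a human-written and a model-generated completion."""
    prompt: str
    human: str
    model: str


def human_preference_rate(
    pairs: Iterable[Pair],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of pairs where the RM scores the human completion higher.

    `score(prompt, completion)` is a hypothetical stand-in for a reward
    model's scalar output. Ties count as half a win for each side.
    """
    wins = 0.0
    total = 0
    for p in pairs:
        h = score(p.prompt, p.human)  # reward for the human-written answer
        m = score(p.prompt, p.model)  # reward for the model-generated answer
        if h > m:
            wins += 1.0
        elif h == m:
            wins += 0.5
        total += 1
    if total == 0:
        raise ValueError("no pairs to evaluate")
    return wins / total
```

A rate near 0.5 would suggest no systematic preference; a rate well above or below 0.5 would suggest the RM favors human or model answers, respectively.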

natolambert commented 2 weeks ago

Update: we should use an open-weights model to generate completions for the private prompts; otherwise, the company behind the API would gain access to the closed test-set prompts.