allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0
440 stars, 52 forks

Support upload metadata to hf #188

Closed: vwxyzjn closed 1 month ago

vwxyzjn commented 2 months ago

Quality-of-life improvement: this pushes the eval results directly to the Hugging Face Hub, so they are visible on the model page.

E.g., https://huggingface.co/vwxyzjn/rm_zephyr_new
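For readers unfamiliar with how eval results show up on a model page, here is a minimal sketch of the kind of metadata involved. It is not the code in this PR: it only builds a dictionary in the Hub's `model-index` model card format using the standard library. The helper name `build_model_index`, the section names, the scores, the task type, and the dataset id are all assumptions made for illustration.

```python
from typing import Any


def build_model_index(model_name: str, scores: dict[str, float]) -> dict[str, Any]:
    """Build Hub-style `model-index` metadata from per-section scores.

    Hypothetical helper: the task type and dataset id below are assumptions,
    not values taken from this PR.
    """
    # One metric entry per section score.
    metrics = [
        {"type": "accuracy", "name": section, "value": round(value, 4)}
        for section, value in scores.items()
    ]
    return {
        "model-index": [
            {
                "name": model_name,
                "results": [
                    {
                        "task": {"type": "text-classification", "name": "Reward modeling"},
                        "dataset": {"type": "allenai/reward-bench", "name": "RewardBench"},
                        "metrics": metrics,
                    }
                ],
            }
        ]
    }


# Made-up scores for illustration only; these are not real results.
metadata = build_model_index("rm_zephyr_new", {"chat": 0.9, "safety": 0.8})

# Uploading is left out because it needs network access and a write token.
# One option, assuming `huggingface_hub` is installed, would be:
#   from huggingface_hub import metadata_update
#   metadata_update("vwxyzjn/rm_zephyr_new", metadata)
```

Keeping the metadata construction separate from the upload step makes it possible to inspect the dictionary before anything is written to the Hub.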

vwxyzjn commented 1 month ago

Done. Let's just record as much as possible.

Feel free to merge after this.