allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0
440 stars, 52 forks

Support loading model from wandb #184

Closed: vwxyzjn closed this 2 months ago

vwxyzjn commented 2 months ago

I'd like to get your high-level thoughts on this. This PR lets us load a model from wandb and also save the evaluation results back to wandb.

E.g., https://wandb.ai/ai2-llm/open_instruct_internal/runs/u9f16bws?nw=nwusercostah
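A minimal sketch of the load/evaluate/save round trip described above, using the public `wandb` Python client. It is not the code in this PR. The helper names (`parse_run_url`, `results_to_summary`, `download_model`, `log_results_to_run`) and the `reward_bench/` key prefix are hypothetical, and the assumption that the checkpoint is stored as a W&B artifact is not confirmed by the PR. The W&B calls used (`wandb.Api().run`, `run.summary[...]` plus `run.summary.update()`, `wandb.Api().artifact(...).download(root=...)`) are part of the public API.

```python
from urllib.parse import urlparse


def parse_run_url(url: str) -> str:
    """Convert a W&B run URL into the "entity/project/run_id" path that wandb.Api().run() expects."""
    # Query strings such as "?nw=..." are not part of urlparse(...).path, so they are ignored.
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expected layout: /<entity>/<project>/runs/<run_id>
    if len(parts) != 4 or parts[2] != "runs":
        raise ValueError(f"not a W&B run URL: {url}")
    entity, project, _, run_id = parts
    return f"{entity}/{project}/{run_id}"


def results_to_summary(results: dict, prefix: str = "reward_bench/") -> dict:
    """Namespace eval scores under a (hypothetical) prefix so they don't collide with training metrics."""
    return {f"{prefix}{name}": float(score) for name, score in results.items()}


def download_model(artifact_name: str, root: str) -> str:
    """ASSUMPTION: the checkpoint was logged as a W&B artifact, e.g. "entity/project/name:v0"."""
    import wandb  # lazy import: the pure helpers above work without wandb installed

    # Returns the local directory the artifact files were downloaded into.
    return wandb.Api().artifact(artifact_name).download(root=root)


def log_results_to_run(run_url: str, results: dict) -> None:
    """Write scores into the summary of an existing (possibly already finished) run."""
    import wandb

    run = wandb.Api().run(parse_run_url(run_url))
    for key, value in results_to_summary(results).items():
        run.summary[key] = value
    run.summary.update()  # push the edited summary back to the W&B server
```

Writing into the existing training run's summary, rather than starting a new run, keeps the RewardBench scores attached to the run that produced the model. The shared `reward_bench/` prefix should also make the W&B UI group those charts into one section, which helps when comparing runs side by side.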

[image]

You can then build visualizations like this one:

[image]