allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0

Add a new Mistral RM model #79

Closed: hendrydong closed this issue 6 months ago.

hendrydong commented 6 months ago

Thank you for your work! Could you please test my RM, hendrydong/Mistral-RM-for-RAFT-GSHF-v0, on the leaderboard?

My local results are below:

{
  "model": "hendrydong/Mistral-RM-for-RAFT-GSHF-v0",
  "model_type": "Seq. Classifier",
  "chat_template": "tokenizer",
  "alpacaeval-easy": 0.99,
  "alpacaeval-hard": 1.0,
  "alpacaeval-length": 0.9473684210526315,
  "donotanswer": 0.6470588235294118,
  "hep-cpp": 0.9390243902439024,
  "hep-go": 0.9573170731707317,
  "hep-java": 0.9695121951219512,
  "hep-js": 0.9390243902439024,
  "hep-python": 0.9451219512195121,
  "hep-rust": 0.9329268292682927,
  "llmbar-adver-GPTInst": 0.3804347826086957,
  "llmbar-adver-GPTOut": 0.5957446808510638,
  "llmbar-adver-manual": 0.34782608695652173,
  "llmbar-adver-neighbor": 0.4701492537313433,
  "llmbar-natural": 0.9,
  "math-prm": 0.5503355704697986,
  "mt-bench-easy": 1.0,
  "mt-bench-hard": 0.7837837837837838,
  "mt-bench-med": 0.975,
  "refusals-dangerous": 0.75,
  "refusals-offensive": 0.96,
  "xstest-should-refuse": 0.9805194805194806,
  "xstest-should-respond": 0.888
}
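To read these per-subset accuracies at a glance, they can be grouped into the four RewardBench categories (Chat, Chat Hard, Safety, Reasoning). The following is a minimal Python sketch, assuming the subset-to-category mapping shown, which is our reading of the benchmark and not code taken from the reward-bench repository. The names `CATEGORIES`, `METADATA_KEYS`, and `category_means` are hypothetical. It uses a plain unweighted mean per category. The leaderboard's official scoring may weight subsets differently, so treat these numbers only as a rough local sanity check, not as the scores the leaderboard will report.

```python
import json
from statistics import mean

# ASSUMPTION: subset-to-category grouping based on our reading of RewardBench;
# it is not imported from the reward-bench package.
CATEGORIES = {
    "Chat": ["alpacaeval-easy", "alpacaeval-length", "alpacaeval-hard",
             "mt-bench-easy", "mt-bench-med"],
    "Chat Hard": ["mt-bench-hard", "llmbar-natural", "llmbar-adver-neighbor",
                  "llmbar-adver-GPTInst", "llmbar-adver-GPTOut",
                  "llmbar-adver-manual"],
    "Safety": ["refusals-dangerous", "refusals-offensive",
               "xstest-should-refuse", "xstest-should-respond", "donotanswer"],
    "Reasoning": ["math-prm", "hep-cpp", "hep-go", "hep-java", "hep-js",
                  "hep-python", "hep-rust"],
}

# Fields in the results line that describe the run rather than a subset score.
METADATA_KEYS = {"model", "model_type", "chat_template"}


def category_means(results: dict) -> dict:
    """Return the unweighted mean accuracy of each category (hypothetical helper)."""
    subsets = {k: v for k, v in results.items() if k not in METADATA_KEYS}
    # Fail loudly if the results contain a subset the mapping does not cover.
    assigned = {name for names in CATEGORIES.values() for name in names}
    unassigned = set(subsets) - assigned
    if unassigned:
        raise KeyError(f"subsets not assigned to a category: {sorted(unassigned)}")
    return {cat: mean([subsets[name] for name in names])
            for cat, names in CATEGORIES.items()}


# The results line posted in this issue, split across string literals.
RESULTS_LINE = (
    '{"model": "hendrydong/Mistral-RM-for-RAFT-GSHF-v0", '
    '"model_type": "Seq. Classifier", "chat_template": "tokenizer", '
    '"alpacaeval-easy": 0.99, "alpacaeval-hard": 1.0, '
    '"alpacaeval-length": 0.9473684210526315, "donotanswer": 0.6470588235294118, '
    '"hep-cpp": 0.9390243902439024, "hep-go": 0.9573170731707317, '
    '"hep-java": 0.9695121951219512, "hep-js": 0.9390243902439024, '
    '"hep-python": 0.9451219512195121, "hep-rust": 0.9329268292682927, '
    '"llmbar-adver-GPTInst": 0.3804347826086957, '
    '"llmbar-adver-GPTOut": 0.5957446808510638, '
    '"llmbar-adver-manual": 0.34782608695652173, '
    '"llmbar-adver-neighbor": 0.4701492537313433, "llmbar-natural": 0.9, '
    '"math-prm": 0.5503355704697986, "mt-bench-easy": 1.0, '
    '"mt-bench-hard": 0.7837837837837838, "mt-bench-med": 0.975, '
    '"refusals-dangerous": 0.75, "refusals-offensive": 0.96, '
    '"xstest-should-refuse": 0.9805194805194806, "xstest-should-respond": 0.888}'
)

scores = json.loads(RESULTS_LINE)
means = category_means(scores)
for category, value in means.items():
    print(f"{category}: {value:.4f}")
```

Raising on unassigned subsets is a deliberate choice: if a future results file adds a subset, the sketch stops instead of silently ignoring it.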
natolambert commented 6 months ago

Trying it, @hendrydong. Next time, could you add a little more documentation to the model card or the issue so I can be sure I'm using it correctly? Thx.