allenai / reward-bench

RewardBench: the first evaluation tool for reward models.
https://huggingface.co/spaces/allenai/reward-bench
Apache License 2.0
440 stars, 52 forks

added offsetbias execute prompt and judgement process code #159

Closed. sanghyuk-choi closed 4 months ago

sanghyuk-choi commented 4 months ago

I've just added prompt and judgement code for NCSOFT/Llama-3-OffsetBias-8B.

natolambert commented 4 months ago

@sanghyuk-choi we need some changes to support this model. E.g., which version of vLLM do we need? Seems like we at least need that specified (I could not run this on my previous image).

EDIT: This may be a different issue in run_generative, checking.

natolambert commented 4 months ago

@sanghyuk-choi I made changes directly here: https://github.com/sanghyuk-choi/reward-bench/pull/1

natolambert commented 4 months ago

Scores are live, so we should be able to merge this soon.

sanghyuk-choi commented 4 months ago

Also, I have fixed my mistake by passing model_modifier when calling process_judgement in the run_judge_pair function.
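For readers following along, here is a minimal sketch of what that fix amounts to. It is not the repository's actual code: the function bodies, the verdict markers ("Output (a)", "[[A]]"), the "offsetbias" modifier value, and the fake_judge helper are all assumptions made for illustration. Only the names process_judgement, run_judge_pair, and model_modifier come from this thread.

```python
from typing import Callable, Optional, Tuple


def process_judgement(judgment: str, model_modifier: Optional[str] = None) -> str:
    """Map raw judge text to "A", "B", or "error" (illustrative rules only)."""
    if model_modifier == "offsetbias":
        # Assumed verdict format for an OffsetBias-style prompt.
        if "Output (a)" in judgment:
            return "A"
        if "Output (b)" in judgment:
            return "B"
        return "error"
    # Assumed default verdict format used when no modifier is given.
    if "[[A]]" in judgment:
        return "A"
    if "[[B]]" in judgment:
        return "B"
    return "error"


def run_judge_pair(
    prompt: str,
    judge: Callable[[str], str],
    model_modifier: Optional[str] = None,
) -> Tuple[str, str]:
    """Query the judge once and parse its verdict."""
    judgment = judge(prompt)
    # The fix: forward model_modifier so the matching parser branch is used.
    winner = process_judgement(judgment, model_modifier)
    return winner, judgment


def fake_judge(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return "Output (b)"


winner, _ = run_judge_pair(
    "Which output is better?", fake_judge, model_modifier="offsetbias"
)
print(winner)
```

In this sketch, leaving out model_modifier sends the same judge text through the default branch, which finds no matching marker and returns "error". That is the kind of silent mis-parse the fix avoids.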

natolambert commented 4 months ago

Nice, yeah @sanghyuk-choi this looks good. Sorry for lumping a bunch of small fixes into your code, I got a little carried away. Merging as long as checks pass.