allenai reward-bench issues

allenai / reward-bench

RewardBench: the first evaluation tool for reward models.

https://huggingface.co/spaces/allenai/reward-bench

Apache License 2.0

440 stars 52 forks

issues

Is eval set on huggingface the eval set or train set?

#106 andrewsiah closed 7 months ago
1
Update table loading

#105 ljvmiranda921 closed 7 months ago
0
[Add Model] Better-PairRM + Relative path

#104 StableFluffy closed 7 months ago
4
Bump black from 23.1.0 to 24.3.0

#103 dependabot[bot] closed 7 months ago
0
[Model Request] mightbe/Better-PairRM

#102 StableFluffy closed 7 months ago
2
Newest week's models

#101 natolambert closed 7 months ago
0
adding kto as a separate category

#100 kawine closed 7 months ago
4
More models

#99 natolambert closed 7 months ago
0
Saving fix

#98 natolambert closed 7 months ago
0
Minor run_rm.py fixes

#97 PavelCz closed 7 months ago
3
DPO ref free sweep prep

#96 natolambert closed 7 months ago
1
multi gpu inference with run_rm.py

#95 SeungoneKim closed 5 months ago
3
Fix EOS token bug on FastChat models (non DPO)

#94 natolambert closed 7 months ago
0
Experiment request: DPO with different betas

#93 natolambert closed 7 months ago
1
Visualization requests

#92 natolambert closed 4 months ago
1
Output leaderboard scores when running `run_rm.py`

#91 natolambert closed 7 months ago
0
Check EOS token on FastChat models

#90 natolambert closed 7 months ago
1
Saving bug (non breaking)

#89 natolambert closed 7 months ago
0
Dataset v2 discussion & feedback

#88 natolambert opened 8 months ago
4
[Core team] Migrate Prior Sets to 50% weight

#87 natolambert closed 7 months ago
1
Initial generative RM implementation (via API)

#86 natolambert closed 7 months ago
1
New week new models

#85 natolambert closed 7 months ago
0
adding Archangel models (dpo, kto, sft+dpo, sft+kto)

#84 kawine closed 7 months ago
0
Rename Starling 34B

#83 natolambert closed 7 months ago
0
Clean up / enhance DPO code

#82 natolambert closed 4 months ago
1
stanfordnlp/SteamSHP-flan-t5 performance on SHP and HH-RLHF Helpful

#81 timbmg closed 8 months ago
1
Add new model weqweasdas/RM-Mistral-7B

#80 WeiXiongUST closed 8 months ago
0
Add a new mistral RM model

#79 hendrydong closed 8 months ago
1
Add models, refactor eval configs, fix beaver cost

#78 natolambert closed 8 months ago
1
Add new model Mistral-7B-instruct-Unified-Feedback

#77 YangRui2015 closed 8 months ago
0
Add Nvidia RMs (and Nemo compatibility)

#76 natolambert closed 4 months ago
2
Check beaver cost model

#75 natolambert closed 8 months ago
1
Update train_rm.py

#74 eltociear closed 8 months ago
0
update paper

#73 natolambert closed 8 months ago
0
Include MT Bench score figure

#72 ljvmiranda921 closed 8 months ago
0
Improve model distribution

#71 ljvmiranda921 closed 8 months ago
0
Small PR to add OLMo Instruct

#70 natolambert closed 8 months ago
0
Add contributing models text

#69 natolambert closed 8 months ago
1
Auto-rotate the column names so that it's easier to copy

#68 ljvmiranda921 closed 8 months ago
0
Plot distribution of RM scores for each RM

#67 natolambert closed 8 months ago
0
Small nits

#66 natolambert closed 8 months ago
0
Refactor visualization

#65 ljvmiranda921 closed 8 months ago
0
Add name substitution to benchmark results

#64 ljvmiranda921 closed 8 months ago
0
Minor README fix

#63 ljvmiranda921 closed 8 months ago
0
Nit to length table

#62 natolambert closed 8 months ago
0
Paper ready plot... for appendix at least

#61 natolambert closed 8 months ago
0
Configs & release docker

#60 natolambert closed 8 months ago
0
Cleanup of auxiliary scripts

#59 ljvmiranda921 closed 8 months ago
1
Remove PKU and other cleaning

#58 natolambert closed 8 months ago
0
WIP Code to replace one subset (if we want it)

#57 natolambert closed 8 months ago
0
