CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
MIT License

Well-known RMs and eval utilities #415

Open cat-state opened 1 year ago

cat-state commented 1 year ago

🚀 The feature, motivation, and pitch

Collect settings, commonly used reward models (RMs), and evaluations in one place. These RMs can be used for training-time eval, but we would probably want multiple-choice evals as well.
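As a rough illustration of what such a collection might look like, here is a minimal sketch of a named registry of reward functions plus a multiple-choice accuracy helper. Everything in it is hypothetical and for illustration only: `REWARD_MODELS`, `register_rm`, `length_penalty`, `multiple_choice_accuracy`, and the `score_fn(prompt, choice)` signature are assumptions, not part of trlx or any existing library, and the toy `length_penalty` stands in for a real learned RM.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical registry: name -> function that scores a batch of text samples.
RewardFn = Callable[[List[str]], List[float]]
REWARD_MODELS: Dict[str, RewardFn] = {}


def register_rm(name: str) -> Callable[[RewardFn], RewardFn]:
    """Decorator that stores a reward function under a well-known name."""
    def decorator(fn: RewardFn) -> RewardFn:
        REWARD_MODELS[name] = fn
        return fn
    return decorator


@register_rm("length_penalty")
def length_penalty(samples: List[str]) -> List[float]:
    # Toy stand-in for a learned RM: fewer whitespace-separated words scores higher.
    return [-float(len(s.split())) for s in samples]


# One multiple-choice example: (prompt, candidate answers, index of the correct answer).
Example = Tuple[str, Sequence[str], int]


def multiple_choice_accuracy(
    score_fn: Callable[[str, str], float],
    examples: Sequence[Example],
) -> float:
    """Pick the highest-scoring choice per prompt and return the fraction correct.

    score_fn(prompt, choice) is assumed to return a float where higher is better,
    for example a model log-likelihood of the choice given the prompt.
    """
    correct = 0
    for prompt, choices, answer in examples:
        scores = [score_fn(prompt, c) for c in choices]
        pred = max(range(len(choices)), key=scores.__getitem__)
        correct += int(pred == answer)
    return correct / len(examples)
```

A registered RM could then be looked up by name (`REWARD_MODELS["length_penalty"]`) and called on generated samples during training-time eval, while `multiple_choice_accuracy` would be run separately with whatever scoring function the model under test provides.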

Ideas for RMs:

Multiple Choice evals:

herbiebradley commented 1 year ago

Working on a basic library for Anthropic's evals myself, will hopefully put something out this week.

lm-eval2 (the refactor of lm-evaluation-harness) will support OpenAI-style model-graded evals too.