🚀 The feature, motivation, and pitch

Collect together settings and commonly used reward models (RMs) and evaluations. These RMs can be used for training-time eval, but we would probably also want to use multiple-choice evals too.
Ideas for RMs:

- Our HH RMs
- Sentiment classifiers
- SteamSHP? (a sketch of querying it follows this list)
- OpenAI API GPT-3.5/4 (probably only usable at test time?)
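As a concrete example of using one of these RMs for eval, here is a minimal sketch of pairwise preference scoring with SteamSHP via `transformers`. The checkpoint name and the A/B prompt template follow the model card's suggested usage, but treat the exact formatting as an assumption to verify:

```python
# Minimal sketch: pairwise preference scoring with SteamSHP.
# Checkpoint name and prompt template follow the model card; verify before use.
from transformers import T5ForConditionalGeneration, T5Tokenizer

name = "stanfordnlp/SteamSHP-flan-t5-large"
tokenizer = T5Tokenizer.from_pretrained(name)
model = T5ForConditionalGeneration.from_pretrained(name)

def preferred_response(context: str, response_a: str, response_b: str) -> str:
    """Ask SteamSHP which of two responses it prefers; returns 'A' or 'B'."""
    prompt = (
        f"POST: {context}\n\n"
        f"RESPONSE A: {response_a}\n\n"
        f"RESPONSE B: {response_b}\n\n"
        "Which response is better? RESPONSE"
    )
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    # The model emits a single token, the letter of the preferred response.
    output = model.generate(input_ids, max_new_tokens=1)
    return tokenizer.batch_decode(output, skip_special_tokens=True)[0]
```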
Multiple-choice evals:

- Anthropic's model-generated evals
- OpenAI's new evals
- lm-evaluation-harness-supported evals (would need to add the ability to run the model ourselves for that? A sketch of the harness API follows this list)
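For the harness specifically, the programmatic entry point could look like the sketch below. The task list and `model_args` are illustrative, and the exact API varies between harness versions (this matches the v0.3-style `simple_evaluate` interface):

```python
# Minimal sketch: running multiple-choice evals through lm-evaluation-harness.
# Task names and model_args are illustrative placeholders.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",                # load a HuggingFace causal LM
    model_args="pretrained=gpt2",     # swap in the model under evaluation
    tasks=["hellaswag", "arc_easy"],  # any harness-supported MC tasks
    num_fewshot=0,
)
print(results["results"])  # per-task accuracy metrics
```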
Ranking evals: provide a pre-ranked set of responses and have the model order them from most to least aligned; agreement with the reference ranking could then be scored, as in the sketch below.
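One way to score such a ranking eval is a rank-correlation statistic like Kendall's tau. A minimal sketch, assuming the model's ordering has already been elicited separately:

```python
# Minimal sketch: scoring a ranking eval with Kendall's tau.
# `reference_ranking` is the pre-ranked order (most to least aligned);
# `model_ranking` is the order the model under evaluation produced.
from scipy.stats import kendalltau

def ranking_score(reference_ranking: list[str], model_ranking: list[str]) -> float:
    """Kendall's tau between reference and model orderings.

    1.0 = identical order, -1.0 = fully reversed.
    """
    # Map each response to its position in the model's ordering.
    position = {resp: i for i, resp in enumerate(model_ranking)}
    reference_positions = list(range(len(reference_ranking)))
    model_positions = [position[resp] for resp in reference_ranking]
    tau, _p_value = kendalltau(reference_positions, model_positions)
    return tau

# Example: the model swaps the middle two responses.
ref = ["r1", "r2", "r3", "r4"]
hyp = ["r1", "r3", "r2", "r4"]
print(ranking_score(ref, hyp))  # ~0.667
```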