allenai / RL4LMs

A modular RL library to fine-tune language models to human preferences
https://rl4lms.apps.allenai.org/
Apache License 2.0
2.13k stars, 191 forks

A question that has bothered me for a long time: what is the difference between RL-for-text-generation and delete-0-reward-model-predictions? #46

Open guotong1988 opened 1 year ago

guotong1988 commented 1 year ago

For text generation.

Thank you very much!