A modular RL library to fine-tune language models to human preferences
A question that has bothered me for a long time: What is the difference between RL-for-text-generation and delete-0-reward-model-predictions? #46
Open
guotong1988 opened 1 year ago
For text generation.
Thank you very much!