allenai / RL4LMs

A modular RL library to fine-tune language models to human preferences
https://rl4lms.apps.allenai.org/
Apache License 2.0
2.17k stars 191 forks

Implementing self-play #18

Open eublefar opened 1 year ago

eublefar commented 1 year ago

Hello

I would like to implement self-play dialogue training. For that, I guess I need to modify the episode rollout process to add formatting, such as a speaker ID at the start of each line. I'd also like to try keeping a buffer of previous model checkpoints and using them as one of the conversants, to avoid the model overfitting to itself.

The obvious place for this seems to be a new policy that returns formatted generation results and holds the previous checkpoints in a buffer.

Is there a better place to implement this? Is there anything I should consider library-wise while implementing it?
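A minimal sketch of that idea, in plain Python and independent of RL4LMs, might look like the following. Everything here is an assumption made for illustration: the names `Responder`, `format_turn`, `CheckpointBuffer`, and `self_play_episode` are hypothetical, the `<speakerN>` tag format is arbitrary, and a "policy" is reduced to a callable that maps the dialogue so far to the next utterance. It does not use or mirror the library's actual policy classes.

```python
import copy
import random
from collections import deque
from typing import Callable, Deque, Optional

# Hypothetical stand-in for a policy: maps the dialogue so far to the next utterance.
Responder = Callable[[str], str]


def format_turn(speaker_id: int, utterance: str) -> str:
    """Prefix an utterance with a speaker tag (tag format is an assumption)."""
    return f"<speaker{speaker_id}> {utterance.strip()}"


class CheckpointBuffer:
    """Fixed-size buffer of past policy snapshots to use as opponents."""

    def __init__(self, max_size: int = 5, seed: Optional[int] = None) -> None:
        # deque(maxlen=...) silently drops the oldest snapshot when full.
        self._snapshots: Deque[Responder] = deque(maxlen=max_size)
        self._rng = random.Random(seed)

    def add(self, policy: Responder) -> None:
        # Deep-copy so later updates to the live policy do not alter the snapshot.
        self._snapshots.append(copy.deepcopy(policy))

    def sample(self, current: Responder, p_current: float = 0.5) -> Responder:
        # Use the live policy with probability p_current, or whenever
        # no snapshot has been stored yet; otherwise pick a past snapshot.
        if not self._snapshots or self._rng.random() < p_current:
            return current
        return self._rng.choice(list(self._snapshots))

    def __len__(self) -> int:
        return len(self._snapshots)


def self_play_episode(
    policy: Responder, opponent: Responder, prompt: str, num_turns: int = 4
) -> str:
    """Alternate turns between the live policy (speaker 0) and the opponent (speaker 1)."""
    lines = [prompt]
    for turn in range(num_turns):
        speaker_id = turn % 2
        responder = policy if speaker_id == 0 else opponent
        utterance = responder("\n".join(lines))
        lines.append(format_turn(speaker_id, utterance))
    return "\n".join(lines)
```

In a training loop one would presumably call `add` every few updates and `sample` at the start of each rollout. Setting `p_current` below 1.0 is what mixes older checkpoints into the conversation, which is the part meant to counter overfitting to the current model.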

Any advice would be appreciated, thanks in advance!

rajcscw commented 1 year ago

For self-dialogue training, I think you need to update the following:

eublefar commented 1 year ago

Thanks a lot for the directions! Should I create a pull request when it's ready?

rajcscw commented 1 year ago

Yes, feel free to contribute :)