Implementing self-play - Githubissues

eublefar commented 1 year ago

Hello

I would like to implement self-play dialogue training. For that, I guess I need to modify the episode rollout process to add formatting, such as a speaker id at the start of each line. I'd also like to try keeping a buffer of previous model checkpoints and using them as one of the conversants, so that the model does not overfit to itself.

The obvious place for this seems to be a new policy implementation that returns the formatted generation results and holds the previous checkpoints in a buffer.
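To make the checkpoint-buffer idea concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: `CheckpointBuffer`, `sample_opponent`, and `p_latest` are hypothetical names and not part of RL4LMs, and `policy_state` stands in for whatever snapshot the policy exposes (for a PyTorch model this would typically be a `state_dict`).

```python
import copy
import random
from collections import deque


class CheckpointBuffer:
    """Fixed-size pool of past policy snapshots (hypothetical helper, not an RL4LMs class)."""

    def __init__(self, max_size=5, seed=None):
        # deque(maxlen=...) evicts the oldest snapshot once the buffer is full.
        self._snapshots = deque(maxlen=max_size)
        self._rng = random.Random(seed)

    def add(self, policy_state):
        # Deep-copy so later training updates do not mutate the stored snapshot.
        self._snapshots.append(copy.deepcopy(policy_state))

    def sample_opponent(self, current_state, p_latest=0.5):
        # With probability p_latest, or when no snapshot exists yet,
        # the opponent is the current policy itself.
        if not self._snapshots or self._rng.random() < p_latest:
            return current_state
        # Otherwise pick a past snapshot uniformly at random.
        return self._rng.choice(list(self._snapshots))

    def __len__(self):
        return len(self._snapshots)
```

A training loop could call `add` every few updates and `sample_opponent` at the start of each episode. Uniform sampling and the `p_latest` mix are assumptions here; other schedules (for example, favoring recent checkpoints) fit the same interface.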

Is there a better place to implement this? Is there anything I should consider library-wise while implementing it?

Any advice would be appreciated, thanks in advance!

rajcscw commented 1 year ago

For self-dialogue training, I think you need to update the following:

- [env](https://github.com/allenai/RL4LMs/blob/main/rl4lms/envs/text_generation/env.py) - This is where you need to update the dynamics. For example, once is reached, add the speaker id to the generated text so far.
- Policy implementation to hold previous checkpoints
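The speaker-id part of the dynamics could look roughly like the sketch below. It is plain Python and does not use the actual RL4LMs environment class: `format_turn`, `self_play_rollout`, the `"A"`/`"B"` speaker ids, and the `generators` callables are all hypothetical names chosen for illustration. Each generator maps the dialogue so far to the next utterance, and the speaker id is prepended once a turn is complete.

```python
SPEAKERS = ("A", "B")  # hypothetical speaker ids


def format_turn(history, utterance, turn_index, speakers=SPEAKERS):
    """Append one finished utterance to the dialogue, prefixed with its speaker id."""
    speaker = speakers[turn_index % len(speakers)]
    line = f"{speaker}: {utterance.strip()}"
    return f"{history}\n{line}" if history else line


def self_play_rollout(generators, num_turns, history=""):
    """Alternate between generators, one per speaker, for num_turns turns.

    generators: list of callables mapping the dialogue text so far to an utterance.
    """
    for turn in range(num_turns):
        utterance = generators[turn % len(generators)](history)
        history = format_turn(history, utterance, turn)
    return history
```

For example, with two stand-in generators in place of real policies:

```python
gens = [lambda h: "hi", lambda h: "hello"]
print(self_play_rollout(gens, 3))
# A: hi
# B: hello
# A: hi
```

In a real setup, one generator would wrap the policy being trained and the other a policy loaded from a previous checkpoint.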

eublefar commented 1 year ago

Thanks a lot for the directions! Should I create a pull request when it's ready?

rajcscw commented 1 year ago

Yes, feel free to contribute :)

allenai / RL4LMs

Implementing self-play #18