CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
MIT License
4.44k stars, 469 forks

Self Play #28

Open cat-state opened 1 year ago

cat-state commented 1 year ago

Self-play, and multi-LM-agent settings more generally, are something we are very interested in exploring. What would it take to support this? Does it already work without significant overhead?

cat-state commented 1 year ago

This would also tie in to MCTS in the future, although that would likely require more thought on how to do it efficiently.

honglu2875 commented 1 year ago

> This would also tie in to MCTS in the future, although that would likely require more thought on how to do it efficiently.

I guess it would be somewhat like MuZero? Any relevant papers on this? I'm interested as I'm working on MCTS-based RL as well.

promiseve commented 1 year ago

I am interested in collaborating with you all on the self-play build. Email: promisevekpo1@gmail.com