ajpkim / dqn_connect4


Self Play #1

Open nicehzj opened 1 year ago

nicehzj commented 1 year ago

Hi, I just read and reproduced your code. It's very good and easy to understand. I have a question, though: shouldn't we use the best model as the self-play agent to generate memory? If the training agent cannot beat the best model, the memory it generates only makes performance worse. I changed the self-play agent to the best model and got better training performance.
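
A minimal sketch of the idea, not the repository's actual code: self-play memory is generated only by the current best agent, and the training agent replaces it only after beating it in evaluation. All names here (`generate_self_play_memory`, `select_self_play_agent`, `toy_play_game`, `toy_evaluate`) and the 0.55 win-rate threshold are hypothetical, chosen for illustration; the toy agents are plain dicts standing in for real models.

```python
import copy
from collections import deque


def generate_self_play_memory(best_agent, play_game, memory, n_games):
    """Fill `memory` with transitions from games the best agent plays against itself."""
    for _ in range(n_games):
        memory.extend(play_game(best_agent, best_agent))
    return memory


def select_self_play_agent(candidate, best_agent, evaluate, threshold=0.55):
    """Promote the candidate only if its win rate against the best agent reaches `threshold`.

    Returns (agent_to_use_for_self_play, was_promoted).
    """
    win_rate = evaluate(candidate, best_agent)
    if win_rate >= threshold:
        # Snapshot the candidate so further training does not alter the new best agent.
        return copy.deepcopy(candidate), True
    return best_agent, False


# --- Toy stand-ins (assumptions, for illustration only) ---
def toy_play_game(agent_a, agent_b):
    """Pretend game: returns two (state, action, reward, next_state, done) transitions."""
    return [
        ("s0", "a0", 0.0, "s1", False),
        ("s1", "a1", 1.0, None, True),
    ]


def toy_evaluate(candidate, best_agent):
    """Pretend evaluation: win rate proportional to relative skill."""
    return candidate["skill"] / (candidate["skill"] + best_agent["skill"])


best = {"name": "best", "skill": 1.0}
weak = {"name": "weak", "skill": 0.5}
strong = {"name": "strong", "skill": 2.0}

memory = deque(maxlen=1000)
generate_self_play_memory(best, toy_play_game, memory, n_games=3)

# A weaker training agent does not replace the best agent...
best_after_weak, promoted_weak = select_self_play_agent(weak, best, toy_evaluate)
# ...while a stronger one is promoted and would generate the next batch of memory.
best_after_strong, promoted_strong = select_self_play_agent(strong, best, toy_evaluate)
```

In a real training loop, `play_game` and `evaluate` would wrap the actual Connect 4 environment and models, and the threshold would need tuning.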

Thanks!

ajpkim commented 1 year ago

That's great! Feel free to share the updates you made here and/or in a pull request!