Self Play - Githubissues

Hi, I just read and reproduced your code. It's really good and easy to understand. I have one question, though: should we use the best model as the self-play agent that generates the memory? If the training agent cannot beat the best model, the memory it generates will only make performance worse. I changed the self-play agent to the best model and got better training performance.
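The change described above can be sketched roughly as follows. This is only a toy illustration under stated assumptions: `Agent`, `play_game`, `update_best`, and `generate_memory` are hypothetical names, not identifiers from the dqn_connect4 repository, and a single `skill` number stands in for real network weights and a real Connect 4 game. The 55% promotion threshold is likewise an arbitrary choice for the example.

```python
from __future__ import annotations

import random
from collections import deque
from dataclasses import dataclass

MOVES_PER_GAME = 4  # placeholder game length for this toy example


@dataclass
class Agent:
    """Hypothetical stand-in for a DQN agent; `skill` replaces real weights."""
    name: str
    skill: float


def play_game(first: Agent, second: Agent, rng: random.Random):
    """Toy game: `first` wins with probability first.skill / (sum of skills).

    Returns (winner, transitions); the transitions are placeholder tuples.
    """
    p_first = first.skill / (first.skill + second.skill)
    winner = first if rng.random() < p_first else second
    transitions = [(first.name, second.name, move) for move in range(MOVES_PER_GAME)]
    return winner, transitions


def update_best(best: Agent, candidate: Agent, rng: random.Random,
                n_games: int = 200, threshold: float = 0.55) -> Agent:
    """Promote the candidate only if it beats the current best often enough."""
    wins = sum(play_game(candidate, best, rng)[0] is candidate for _ in range(n_games))
    return candidate if wins / n_games >= threshold else best


def generate_memory(generator: Agent, memory: deque, rng: random.Random,
                    n_games: int) -> None:
    """Fill the replay memory from self-play games of the given generator."""
    for _ in range(n_games):
        _, transitions = play_game(generator, generator, rng)
        memory.extend(transitions)


if __name__ == "__main__":
    rng = random.Random(0)
    best = Agent("best", skill=3.0)
    training = Agent("training", skill=1.0)

    # The training agent replaces the best model only after winning the match.
    best = update_best(best, training, rng)

    # Memory always comes from the best model, not from the training agent.
    memory = deque(maxlen=10_000)
    generate_memory(best, memory, rng, n_games=5)
    print(best.name, len(memory))
```

The point of the sketch is the last two steps: the replay memory is filled by whichever agent currently holds the "best" slot, so a training agent that is still weaker never contributes its own games.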

Thanks!

ajpkim / dqn_connect4

Self Play #1