Using a paper from Google DeepMind, I've developed a new version of the DQN that uses thread-based exploration instead of experience replay, as explained here: http://arxiv.org/pdf/1602.01783v1.pdf. I followed the one-step Q-learning pseudocode, and now Pong can be trained in less than 20 hours without any GPU or distributed setup.
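To make the idea concrete, here is a minimal, self-contained sketch of the per-thread loop from the one-step Q-learning pseudocode. It swaps the conv net for a tabular Q on a toy chain MDP and applies each update immediately instead of accumulating gradients, so everything here (the toy `step` environment, the thread count, the annealing schedule, the hyperparameter values) is just an illustrative placeholder, not my actual training code:

```python
# Sketch of asynchronous one-step Q-learning (Algorithm 1 in the linked paper),
# with a toy chain MDP and a tabular Q instead of a conv net. All constants
# below are placeholders for illustration only.
import threading
import random

N_STATES, N_ACTIONS = 10, 2
N_THREADS = 4
T_MAX_GLOBAL = 50_000
TARGET_SYNC = 1_000           # how often to copy the online Q into the target Q
GAMMA, ALPHA = 0.99, 0.1

# Shared parameters: online Q and target Q (stand-ins for theta and theta^-)
q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
q_target = [row[:] for row in q]
global_t = 0
lock = threading.Lock()

def step(state, action):
    """Toy chain MDP: action 1 moves right, action 0 resets; reward at the end."""
    if action == 1:
        nxt = state + 1
        if nxt == N_STATES - 1:
            return 0, 1.0, True        # reached the goal, episode ends
        return nxt, 0.0, False
    return 0, 0.0, False

def actor_learner(final_eps):
    global global_t
    state = 0
    while True:
        with lock:
            if global_t >= T_MAX_GLOBAL:
                return
            global_t += 1
            t = global_t
        # epsilon-greedy behaviour policy, annealed toward this thread's final epsilon
        eps = max(final_eps, 1.0 - t / 10_000)
        if random.random() < eps:
            action = random.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: q[state][a])
        nxt, reward, done = step(state, action)
        # one-step Q-learning target, computed against the (stale) target network
        target = reward if done else reward + GAMMA * max(q_target[nxt])
        with lock:
            q[state][action] += ALPHA * (target - q[state][action])
            if t % TARGET_SYNC == 0:
                for s in range(N_STATES):
                    q_target[s][:] = q[s]
        state = 0 if done else nxt

# Each thread samples its own final epsilon, as described in the paper.
threads = [threading.Thread(target=actor_learner,
                            args=(random.choice([0.1, 0.01, 0.5]),))
           for _ in range(N_THREADS)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print("Learned greedy actions:",
      [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)])
```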
The paper states that the final epsilons should be [0.1, 0.01, 0.5], but I noticed that in your code they are [0.01, 0.01, 0.05] (strangely, there are two 0.01s). Is this a mistake or an intentional improvement?
I'm tuning the model myself, but I'm not sure which hyperparameters are the important ones.
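For reference, this is how I read the paper's scheme: each actor thread samples its own final epsilon once at start-up from [0.1, 0.01, 0.5]. The 0.4/0.3/0.3 weights below are my reading of the paper and may need double-checking; your repository's values would simply replace FINAL_EPSILONS here:

```python
# Per-thread final-epsilon sampling as I understand it from the paper.
# The weights are my reading of the paper, not taken from this repository.
import random

FINAL_EPSILONS = [0.1, 0.01, 0.5]
WEIGHTS = [0.4, 0.3, 0.3]

def sample_final_epsilon():
    """Draw the final epsilon for one actor thread."""
    return random.choices(FINAL_EPSILONS, weights=WEIGHTS, k=1)[0]

print(sample_final_epsilon())
```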