Based on a paper from Google DeepMind, I've developed a new version of DQN that uses threaded exploration instead of experience replay, as explained here: http://arxiv.org/pdf/1602.01783v1.pdf. I followed the one-step Q-learning pseudocode, and Pong can now be trained in under 20 hours without any GPU or distributed setup.
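The core idea (several actor-learner threads sharing one set of parameters, no replay memory) can be sketched on a toy problem. This is a minimal illustration, not the actual implementation: the chain MDP and all names (`env_step`, `actor_learner`, etc.) are stand-ins for the Atari setup, and where the paper accumulates gradients and applies them lock-free, this sketch updates a shared tabular Q with a lock for simplicity.

```python
import random
import threading

# Toy sketch of parallel one-step Q-learning: several actor-learner threads
# share one value table instead of a replay memory. The chain MDP below is an
# illustrative stand-in for the Atari emulator.
N_STATES, N_ACTIONS, GOAL = 6, 2, 5          # actions: 0 = left, 1 = right
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # shared "network"
lock = threading.Lock()  # the paper updates lock-free; a lock keeps the toy safe

def env_step(s, a):
    """Deterministic chain: reward 1 only on reaching the goal state."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def greedy(s):
    """Greedy action with random tie-breaking."""
    best = max(Q[s])
    return random.choice([a for a in range(N_ACTIONS) if Q[s][a] == best])

def actor_learner(n_episodes):
    """One thread: act epsilon-greedily, apply one-step Q updates to shared Q."""
    for _ in range(n_episodes):
        s, done = 0, False
        while not done:
            a = random.randrange(N_ACTIONS) if random.random() < EPSILON else greedy(s)
            s2, r, done = env_step(s, a)
            with lock:  # one-step Q target: r + gamma * max_a' Q(s', a')
                target = r if done else r + GAMMA * max(Q[s2])
                Q[s][a] += ALPHA * (target - Q[s][a])
            s = s2

threads = [threading.Thread(target=actor_learner, args=(200,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After training, following the greedy policy from state 0 walks straight to the goal; the same structure carries over when Q is a network and each thread drives its own emulator instance.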
Exception in thread Thread-31:
Traceback (most recent call last):
File "/home/anderson/.conda/envs/tensorflow/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/anderson/.conda/envs/tensorflow/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "", line 141, in actorLearner
x_t1_col, r_t, terminal, info = env.step(KEYMAP[GAME][action_index])
File "/home/anderson/Videos/gym/gym/wrappers/time_limit.py", line 31, in step
observation, reward, done, info = self.env.step(action)
File "/home/anderson/Videos/gym/gym/envs/atari/atari_env.py", line 68, in step
action = self._action_set[a]
IndexError: index 5 is out of bounds for axis 0 with size 4
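The IndexError indicates that `KEYMAP[GAME][action_index]` produced emulator action 5, while this env's action set only has 4 entries (Gym Atari envs expose a minimal, per-game action set). A minimal sketch of validating the mapping before `env.step()` — the names here are hypothetical: `ACTION_MEANINGS` stands in for what `env.unwrapped.get_action_meanings()` would return, and this `KEYMAP` is a made-up mapping, not the one from the code in question:

```python
# Hypothetical reproduction of the mismatch: the env exposes only 4 actions
# (as reported in the traceback), while the keymap assumes a larger set.
ACTION_MEANINGS = ["NOOP", "FIRE", "RIGHT", "LEFT"]   # size 4, as in the error
KEYMAP = {"Pong-v0": [1, 2, 3, 4, 5]}                 # illustrative mapping only

def checked_action(game, action_index, n_actions):
    """Validate the mapped action against the env's action-set size
    instead of letting env.step() fail with an IndexError."""
    a = KEYMAP[game][action_index]
    if not 0 <= a < n_actions:
        raise ValueError(
            f"KEYMAP[{game!r}][{action_index}] = {a}, but the env only has "
            f"{n_actions} actions; rebuild KEYMAP from the env's action meanings"
        )
    return a
```

With a real env, `n_actions` would come from `env.action_space.n`, and a safer keymap could be built by looking up the desired moves in `env.unwrapped.get_action_meanings()` rather than hard-coding raw action indices.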
This exception is raised when I start the thread.