Zeta36 / Asynchronous-Methods-for-Deep-Reinforcement-Learning

Using a paper from Google DeepMind, I've developed a new version of the DQN that uses thread-based exploration instead of memory replay, as explained here: http://arxiv.org/pdf/1602.01783v1.pdf. I used the one-step Q-learning pseudocode, and now we can train the Pong game in less than 20 hours, without any GPU or network distribution.
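To illustrate the thread structure the paper describes (parallel actor-learners sharing one set of parameters, no replay memory), here is a minimal tabular sketch of asynchronous one-step Q-learning. The toy chain environment, hyperparameters, and the lock are illustrative simplifications, not the repo's actual TensorFlow code:

```python
import threading
import random

# Toy deterministic chain environment (illustrative stand-in for Atari).
N_STATES, N_ACTIONS, GOAL = 10, 2, 9

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

# Shared Q-table updated by all actor-learner threads (no replay memory).
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
lock = threading.Lock()  # simplification; the paper applies updates lock-free

def actor_learner(steps=5000, eps=0.1, alpha=0.1, gamma=0.99):
    s = 0
    for _ in range(steps):
        # Epsilon-greedy exploration; each thread explores independently.
        if random.random() < eps:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda i: Q[s][i])
        s2, r, done = step(s, a)
        # One-step Q-learning target: y = r + gamma * max_a' Q(s', a').
        y = r if done else r + gamma * max(Q[s2])
        with lock:
            Q[s][a] += alpha * (y - Q[s][a])
        s = 0 if done else s2

threads = [threading.Thread(target=actor_learner) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("greedy policy:", [max(range(N_ACTIONS), key=lambda i: Q[s][i]) for s in range(N_STATES)])
```

The diversity of the threads' independent exploration plays the decorrelating role that the replay memory plays in the original DQN.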

exception for thread #5

Open mhamzaaziz1 opened 6 years ago

mhamzaaziz1 commented 6 years ago

When I start the threads, it shows this exception:

Exception in thread Thread-31:
Traceback (most recent call last):
  File "/home/anderson/.conda/envs/tensorflow/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/anderson/.conda/envs/tensorflow/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "", line 141, in actorLearner
    x_t1_col, r_t, terminal, info = env.step(KEYMAP[GAME][action_index])
  File "/home/anderson/Videos/gym/gym/wrappers/time_limit.py", line 31, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/anderson/Videos/gym/gym/envs/atari/atari_env.py", line 68, in step
    action = self._action_set[a]
IndexError: index 5 is out of bounds for axis 0 with size 4
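Judging from the traceback, the likely cause is that KEYMAP[GAME] emits an emulator action (index 5) that does not exist in this environment's action set, which has only 4 entries (indices 0-3). A minimal, hypothetical check along those lines, assuming a standard gym Atari env; the GAME and KEYMAP values below are placeholders mirroring the names in the traceback, not the repo's actual tables:

```python
import gym

# Placeholders for illustration: Breakout-v0 is a 4-action Atari env,
# matching the "size 4" in the IndexError.
GAME = "Breakout-v0"
KEYMAP = {GAME: [0, 1, 2, 3, 4, 5]}  # hypothetical 6-entry keymap

env = gym.make(GAME)
n_actions = env.action_space.n  # size of this env's legal action set

bad = [a for a in KEYMAP[GAME] if a >= n_actions]
if bad:
    print("KEYMAP entries outside the action set:", bad)
    # Possible fix: size the network's output layer and KEYMAP from
    # n_actions, or pass raw indices 0..n_actions-1 straight to env.step().
```

If the keymap was written for a 6-action game (such as Pong) and you are running a 4-action one, any action index above 3 will raise exactly this IndexError.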