devsisters / DQN-tensorflow

Tensorflow implementation of Human-Level Control through Deep Reinforcement Learning
MIT License

about starting a new game and History #59

Open Richardxxxxxxx opened 6 years ago

Richardxxxxxxx commented 6 years ago

In dqn/agent.py, line 59:

  if terminal:
    screen, reward, action, terminal = self.env.new_random_game()

When starting a new game after a terminal state, why don't we need to reset self.history? Leaving it unchanged affects the next iteration:

  # 1. predict
  action = self.predict(self.history.get())
  # 2. act
  screen, reward, terminal = self.env.act(action, is_training=True)
  # 3. observe
  self.observe(screen, reward, action, terminal)

The action predicted from self.history.get() does not depend on the current game's screens; it is predicted from the final screens of the previous game, which has already ended.

Am I missing anything?

Thank you very much.

hipoglucido commented 6 years ago

Yeah, it does affect the next iteration, but in most cases it does no harm. In many RL environments the concept of an episode/game is abstracted away from the agent, and all it sees is a continuous stream of millions of frames.
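
If you did want to avoid feeding stale frames into predict(), one option is to flush the history with the first screen of the new game. The sketch below uses a minimal stand-in for the repo's History class (its real API may differ; reset_history_on_new_game is a hypothetical helper, not code from this repo):

```python
import numpy as np

class History:
    """Minimal stand-in for dqn/history.py: keeps the last
    `history_length` screens stacked as the network input."""
    def __init__(self, history_length=4, screen_shape=(84, 84)):
        self.buffer = np.zeros((history_length,) + screen_shape,
                               dtype=np.float32)

    def add(self, screen):
        # shift the oldest frame out and append the newest one
        self.buffer[:-1] = self.buffer[1:]
        self.buffer[-1] = screen

    def get(self):
        return self.buffer

def reset_history_on_new_game(history, first_screen):
    """Hypothetical fix for this issue: after a terminal state, fill
    every slot with the new game's first screen so predict() never
    sees frames from the finished episode."""
    for _ in range(len(history.buffer)):
        history.add(first_screen)
```

One could call reset_history_on_new_game(self.history, screen) right after env.new_random_game(); as hipoglucido notes, though, the stale frames are usually harmless in practice.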