dennybritz / reinforcement-learning

Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.
http://www.wildml.com/2016/10/learning-reinforcement-learning/
MIT License
20.23k stars, 6k forks

Clarification on DQN testing rewards on Atari games #235

Open willtop opened 3 years ago

willtop commented 3 years ago

I would like some help clearing up my confusion:

  1. It seems to be generally agreed (and confirmed by the DQN Nature paper) that during training, whenever a life is lost (even if the agent still has lives left), we send a terminal flag to the DQN. This cuts off the bootstrapped Q-value term, so the target for the transition that led to the life loss is just the reward at that last step. Is this correct?
  2. Also during training, after losing a life and sending the terminal flag, we still continue playing with the agent's remaining lives rather than resetting the game. This is beneficial because the agent gets deeper into the game and sees more advanced game states. Is this correct?
  3. When evaluating the agent, how are the reward results reported in the DQN papers computed? (3.1) Are the rewards summed over a single life, or over all lives? The papers mention "episodic rewards" at test time; does "one episode" mean "one life"? (3.2) If so (i.e. the reported test rewards are summed over a single life), then during testing, after a life is lost, do they reset the environment, or let the agent use its remaining lives to collect further "episodic rewards" as separate trials whose results are averaged or compared for the maximum?
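To make the bookkeeping in questions 1 to 3 concrete, here is a minimal Python sketch. It assumes one recorded game is available as parallel lists of per-step rewards, remaining lives after each step, and game-over flags. In practice the lives count comes from the emulator, and the exact Gym/ALE access path varies by version. The helper names `td_target`, `learning_terminal_flags` and `returns_per_life_and_game` are hypothetical. The sketch shows how a life loss can be flagged as terminal for the Q-learning target while the environment itself is only reset on game over. It also shows how the same trajectory yields both per-life and per-game returns. It does not settle which of the two the papers report.

```python
from typing import List, Tuple


def td_target(reward: float, next_max_q: float, terminal: bool, gamma: float = 0.99) -> float:
    """One-step Q-learning target.

    If the transition is flagged terminal, the target is just the reward;
    otherwise it bootstraps from max_a' Q(s', a').
    """
    return reward if terminal else reward + gamma * next_max_q


def learning_terminal_flags(lives: List[int], game_over: List[bool], start_lives: int) -> List[bool]:
    """Flag a transition as terminal *for learning* when a life is lost or the game ends.

    Hypothetical helper: these flags only change the TD target; the environment
    is not reset here (a reset would be triggered by game_over alone).
    """
    flags = []
    prev_lives = start_lives
    for lv, over in zip(lives, game_over):
        flags.append(over or lv < prev_lives)
        prev_lives = lv
    return flags


def returns_per_life_and_game(
    rewards: List[float], lives: List[int], game_over: List[bool], start_lives: int
) -> Tuple[List[float], float]:
    """Sum the rewards of one recorded game in two ways.

    Returns (one sum per life, one sum over the whole game).
    """
    per_life: List[float] = []
    life_sum = 0.0
    game_sum = 0.0
    prev_lives = start_lives
    for r, lv, over in zip(rewards, lives, game_over):
        life_sum += r
        game_sum += r
        if over or lv < prev_lives:
            # A life ended at this step: close the current per-life sum.
            per_life.append(life_sum)
            life_sum = 0.0
        prev_lives = lv
    return per_life, game_sum


if __name__ == "__main__":
    # Hypothetical 3-life game; only the last step is game over,
    # so during training the environment would be reset once, after the last step.
    rewards = [1.0, 0.0, 2.0, 0.0, 3.0, 1.0]
    lives = [3, 2, 2, 1, 1, 0]
    game_over = [False, False, False, False, False, True]

    print(learning_terminal_flags(lives, game_over, start_lives=3))
    # [False, True, False, True, False, True]
    per_life, total = returns_per_life_and_game(rewards, lives, game_over, start_lives=3)
    print(per_life, total)  # [1.0, 2.0, 4.0] 7.0
```

Keeping the learning flag separate from the reset decision is what lets the agent keep playing its remaining lives (question 2) while still cutting off bootstrapping at each life loss (question 1).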

Thanks a lot if anyone can confirm this!