Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning
MIT License

About Episodic Life at Test Phase #36

Closed zmonoid closed 5 years ago

zmonoid commented 5 years ago

In quite a lot of implementations and papers, the reported scores are measured when the game is over rather than at the loss of a life (episodic life is only used during training).

You may want to consider removing episodic life from the testing environment so that your scores match the reported ones.

Kaixhin commented 5 years ago

I already do this. In test.py, a new environment is created and env.eval() is called on it. env.eval() sets self.training = False; the code path that terminates an episode on loss of life is only active when that flag is True.
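To illustrate the mechanism described above, here is a minimal sketch of a training flag that gates termination on loss of life. It is not the repository's code: `ToyLifeEnv`, `steps_until_done`, and the boolean `lost_life` input are hypothetical names made up for this example, and the toy environment has no real emulator behind it. It also resets the life counter on every reset, which a real episodic-life wrapper would typically avoid.

```python
class ToyLifeEnv:
    """Hypothetical stand-in for an Atari environment wrapper (illustration only)."""

    def __init__(self, lives=3):
        self.start_lives = lives
        self.lives = lives
        self.training = True  # Assumed default: episodic life is active

    def train(self):
        # Training mode: losing a life ends the episode
        self.training = True

    def eval(self):
        # Evaluation mode: only game over ends the episode
        self.training = False

    def reset(self):
        self.lives = self.start_lives

    def step(self, lost_life):
        """Advance one toy step; `lost_life` says whether a life was lost."""
        if lost_life:
            self.lives -= 1
        game_over = self.lives == 0
        # The training flag gates the terminate-on-life-loss code path
        return game_over or (self.training and lost_life)


def steps_until_done(env, events):
    """Return the 1-based step at which the episode ends, or None if it never does."""
    env.reset()
    for i, lost_life in enumerate(events, start=1):
        if env.step(lost_life):
            return i
    return None
```

With the same sequence of events, the episode would end at the first lost life in training mode, but only at game over after `eval()` has been called, so the test score covers the whole game.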

zmonoid commented 5 years ago

@Kaixhin Thanks for the information.