Kautenja / playing-mario-with-deep-reinforcement-learning

An implementation of (Double/Dueling) Deep-Q Learning to play Super Mario Bros.
MIT License
69 stars, 12 forks

Reward Schemes #17

Closed Kautenja closed 6 years ago

Kautenja commented 6 years ago

Games with a weaker reward scheme seem to perform poorly. We could inspect the `info` value returned by `step` to see whether agents can be rewarded based on the number of lives left. Games such as Breakout and SpaceInvaders give no clear incentive to avoid death unless the agent learns to associate each death with the eventual loss of the game (and thus the loss of future rewards). One possible failure case: because death currently carries no penalty, the agent may predict a higher future reward from dying in certain states, since it will be able to collect more rewards in the next life.
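To make the idea concrete, here is a minimal sketch of a reward adjustment based on lives left. It is not the code in this repository: the function name `shaped_reward`, the `death_penalty` parameter, and the `"lives"` key in `info` are all assumptions for illustration, and the key your environment actually reports may differ.

```python
def shaped_reward(reward, lives_before, lives_after, death_penalty=1.0):
    """Return the reward with a penalty subtracted for each life lost.

    Hypothetical helper: an increase in lives (e.g. after a reset) is
    treated as zero lives lost, so it never adds a bonus.
    """
    lives_lost = max(lives_before - lives_after, 0)
    return reward - death_penalty * lives_lost


# Hypothetical transition: the agent scores 1 point but drops from 3 lives to 2.
info = {"lives": 2}  # assumed key; check what your environment reports
print(shaped_reward(1.0, lives_before=3, lives_after=info["lives"]))  # 0.0
```

With a penalty like this, a state where dying leads to a fresh life no longer looks free to the agent, because the transition into the death itself now carries a negative reward.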

Kautenja commented 6 years ago

Added a new reward scheme to penalize a loss of life.

Kautenja commented 6 years ago

Double Deep Q seems to perform a lot better in the non-deterministic environment with this new reward scheme. Closing the issue for now. The reward mutation may need to be separated into a different class if it doesn't generalize across all games, because of how we update the counter for agent deaths.
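One way the separation into a different class could look is sketched below. This is an assumption about a possible design, not what the repository does: `LifePenaltyWrapper`, `read_lives`, and `FakeEnv` are hypothetical names, and the wrapped object is only assumed to expose `reset()` and a `step(action)` that returns `(observation, reward, done, info)`. The game-specific part (how to read the death counter) is passed in as a function so the wrapper itself stays generic.

```python
class LifePenaltyWrapper:
    """Wrap an env-like object and penalize each lost life (hypothetical sketch)."""

    def __init__(self, env, read_lives, penalty=1.0):
        self.env = env
        self.read_lives = read_lives  # game-specific: info -> number of lives
        self.penalty = penalty
        self._lives = None  # unknown until the first step after a reset

    def reset(self):
        self._lives = None  # forget the previous episode's counter
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        lives = self.read_lives(info)
        # Limitation: a death on the very first step is not penalized,
        # because there is no earlier count to compare against.
        if self._lives is not None and lives < self._lives:
            reward -= self.penalty * (self._lives - lives)
        self._lives = lives
        return obs, reward, done, info


class FakeEnv:
    """Stand-in environment that loses one life on every second step."""

    def __init__(self):
        self.t, self.lives = 0, 3

    def reset(self):
        self.t, self.lives = 0, 3
        return 0

    def step(self, action):
        self.t += 1
        if self.t % 2 == 0:
            self.lives -= 1
        return self.t, 1.0, self.lives == 0, {"lives": self.lives}


env = LifePenaltyWrapper(FakeEnv(), read_lives=lambda info: info["lives"], penalty=5.0)
env.reset()
rewards = [env.step(0)[1] for _ in range(4)]
print(rewards)  # [1.0, -4.0, 1.0, -4.0]
```

Games that report deaths differently (or not at all) would only need a different `read_lives` function, or could skip the wrapper entirely.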