Kautenja closed this issue 6 years ago.
new reward scheme to penalize a loss of life
Double DQN seems to perform much better in the non-deterministic environment with this new reward scheme. Closing the issue for now. If the reward mutation doesn't generalize across all games (because of how we update the counter for agent deaths), it could be moved into a separate class.
Games with weaker reward schemes seem to perform somewhat poorly. Perhaps investigate the `info` value returned from `step` to see whether we can reward agents based on the number of lives left. Games such as Breakout and SpaceInvaders provide no clear incentive to avoid death unless the agent learns to correlate each death with the eventual loss of the game (and therefore of future rewards). One possibility is that, since there is currently no penalty for death, the agent predicts a higher future reward from dying in certain states, because it expects to collect more rewards in the next life.
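The idea above can be sketched as a small wrapper that watches the lives count reported in `info` and subtracts a fixed penalty whenever it drops. This is a minimal sketch, not the project's actual implementation. It assumes the classic Gym 4-tuple `(obs, reward, done, info)` returned by `step`, and that remaining lives live under `info["ale.lives"]` (the key older Gym Atari environments used; check your version). The class name `LifeLossPenalty` and the `penalty` value are hypothetical.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Classic Gym-style step result: (observation, reward, done, info).
StepResult = Tuple[Any, float, bool, Dict[str, Any]]


class LifeLossPenalty:
    """Hypothetical wrapper: subtract `penalty` from the reward when lives drop.

    Assumption: the environment reports remaining lives in info[lives_key].
    """

    def __init__(
        self,
        step_fn: Callable[[Any], StepResult],
        penalty: float = 1.0,
        lives_key: str = "ale.lives",  # assumed key name; may differ by version
    ) -> None:
        self.step_fn = step_fn
        self.penalty = penalty
        self.lives_key = lives_key
        self.lives: Optional[int] = None  # last seen lives count (None = unknown)

    def reset(self) -> None:
        # Forget the counter at the start of each episode so the first
        # step of a new game is never mistaken for a lost life.
        self.lives = None

    def step(self, action: Any) -> StepResult:
        obs, reward, done, info = self.step_fn(action)
        lives = info.get(self.lives_key)
        if lives is not None:
            # Penalize only a decrease relative to the previous step.
            if self.lives is not None and lives < self.lives:
                reward -= self.penalty
            self.lives = lives
        return obs, reward, done, info


if __name__ == "__main__":
    # Scripted fake environment: (lives, reward) per step.
    script = iter([(3, 1.0), (3, 0.0), (2, 0.0), (2, 5.0), (1, 0.0)])

    def fake_step(action: Any) -> StepResult:
        lives, r = next(script)
        return None, r, lives == 0, {"ale.lives": lives}

    env = LifeLossPenalty(fake_step, penalty=1.0)
    print([env.step(0)[1] for _ in range(5)])  # [1.0, 0.0, -1.0, 5.0, -1.0]
```

Keeping the penalty in a wrapper rather than in the agent keeps it separable per game, and `done` is left untouched. A commonly used alternative is to treat a lost life as a terminal transition for the TD target without resetting the game, which removes the "dying is free" bootstrap instead of adding a negative reward.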