Negative Reward for terminal flag

Kautenja / playing-mario-with-deep-reinforcement-learning

An implementation of (Double/Dueling) Deep-Q Learning to play Super Mario Bros.

MIT License


Negative Reward for terminal flag #18

Closed Kautenja closed 6 years ago

Kautenja commented 6 years ago

The current behavior is to penalize the end of an episode to encourage the agent to prolong episodes (games). In cases where a game is never "solved" (i.e., it can be played indefinitely), this surely makes sense. However, in the case of Pong, the game is solved when either adversary reaches 21 points. If the agent wins, it is currently penalized. Should situations like this be addressed? Or does this not matter too much in the long run?
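To make the concern concrete, here is a minimal sketch of what a terminal penalty does to the final step of a Pong game. It assumes the penalty is subtracted from the environment reward on the terminal step; the function name `penalize_terminal` and the penalty size of 1.0 are hypothetical, not taken from the repository.

```python
def penalize_terminal(reward: float, done: bool, penalty: float = 1.0) -> float:
    """Hypothetical shaping rule: subtract `penalty` on the terminal step only."""
    return reward - penalty if done else reward


# Pong gives +1 for scoring a point and -1 for conceding one.
win_final_step = penalize_terminal(1.0, done=True)    # winning point ends the game
loss_final_step = penalize_terminal(-1.0, done=True)  # losing point ends the game
mid_game_point = penalize_terminal(1.0, done=False)   # any earlier point scored
```

Under these assumptions the game-winning point is worth less than any other point the agent scores, which is the situation the question is about.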

Kautenja commented 6 years ago

Gym has Atari-specific wrappers for this sort of behavior, so it doesn't need to be built into the agents: https://github.com/openai/baselines/blob/master/baselines/common/atari_wrappers.py

Kautenja commented 6 years ago

Wrappers are implemented. The agent is no longer penalized for the termination of an episode; instead, it receives a penalty for the loss of a life. This should generalize across the full domain of games. Closing the issue for now.
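A rough sketch of the life-loss idea, not the repository's actual wrapper: it assumes the classic 4-tuple `step` return and that the environment reports remaining lives under a hypothetical `info["lives"]` key. `LifeLossPenalty` and `ThreeLivesStub` are illustrative names, and the stub only stands in for a real environment so the example runs on its own.

```python
class LifeLossPenalty:
    """Hypothetical wrapper: subtract a penalty whenever the life count drops."""

    def __init__(self, env, penalty=1.0):
        self.env = env
        self.penalty = penalty
        self._lives = None  # last life count seen; unknown until the first step

    def reset(self):
        self._lives = None
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        lives = info.get("lives")  # assumed key; real environments may differ
        if lives is not None:
            if self._lives is not None and lives < self._lives:
                reward -= self.penalty
            self._lives = lives
        return obs, reward, done, info


class ThreeLivesStub:
    """Toy stand-in environment: action 1 costs a life, anything else is safe."""

    def reset(self):
        self.lives = 3
        return 0

    def step(self, action):
        if action == 1:
            self.lives -= 1
        return 0, 0.0, self.lives == 0, {"lives": self.lives}


env = LifeLossPenalty(ThreeLivesStub(), penalty=1.0)
env.reset()
rewards = [env.step(a)[1] for a in (0, 1, 0)]  # safe step, life lost, safe step
```

Because the penalty is tied to the life counter rather than to the `done` flag, an episode that ends without a life being lost (for example, by winning) is not penalized in this sketch.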