coreylynch / async-rl

Tensorflow + Keras + OpenAI Gym implementation of 1-step Q Learning from "Asynchronous Methods for Deep Reinforcement Learning"
MIT License

clipping #10

Closed stevenhutt closed 8 years ago

stevenhutt commented 8 years ago

In the code, the rewards returned from the environment are clipped between -1 and 1. But I believe Breakout gives rewards greater than 1 for bricks in rows nearer the top. What is the rationale for clipping?
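For reference, the kind of clipping the issue is asking about can be sketched as follows. This is a minimal illustration of clipping a raw environment reward into [-1, 1] with `np.clip`, not necessarily the repository's exact code; the function name `clip_reward` and the example point values are assumptions for illustration.

```python
import numpy as np

def clip_reward(reward):
    # Clip the raw environment reward to the range [-1, 1].
    # In Breakout, bricks in higher rows are worth more than 1 point,
    # so any such reward collapses to 1 and the relative magnitude
    # of different rewards is lost -- the behavior the issue questions.
    return float(np.clip(reward, -1.0, 1.0))

print(clip_reward(7.0))   # a high-value brick -> 1.0
print(clip_reward(1.0))   # a low-value brick  -> 1.0
print(clip_reward(0.0))   # no reward          -> 0.0
```

The usual rationale (given in the paper the repo implements) is that clipping bounds the scale of the error derivatives, so one learning rate works across games with very different score scales, at the cost of discarding information about reward magnitudes.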