Closed stevenhutt closed 8 years ago
In the code, the rewards returned from the environment are clipped to the range [-1, 1]. But I believe Breakout gives rewards higher than 1 for bricks in rows nearer the top. What is the rationale for clipping?
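To make the question concrete, here is a minimal sketch of the kind of clipping I mean. The helper name `clip_reward` and the sample reward values are hypothetical, chosen only for illustration; they are not taken from the repository or from the actual Breakout scoring table.

```python
def clip_reward(reward, low=-1.0, high=1.0):
    """Clamp a scalar reward into the closed interval [low, high]."""
    return max(low, min(high, reward))


# Hypothetical raw rewards: larger values stand in for bricks in higher rows.
raw_rewards = [0.0, 1.0, 4.0, 7.0, -3.0]

# After clipping, every positive reward larger than 1 collapses to 1,
# so the agent can no longer tell a high-row brick from a low-row one.
clipped = [clip_reward(r) for r in raw_rewards]
print(clipped)
```

With clipping like this, the distinction between the different brick values is lost, which is why I am asking about the rationale.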