Closed maximecb closed 6 years ago
Won't this affect the training of actor-critic methods? During training, the critic takes in the next state (which would normally be the observation of the agent colliding with the object) and uses it to better approximate Q.
I don't think it would, because it's an unrecoverable state, and the agent already knows the episode is done. One option would be to add a flag, and only enable this in manual control and policy visualization scripts. We could also just close the issue, since this isn't exactly a necessary feature.
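A minimal sketch of the flag idea, assuming a hypothetical `RenderOptions` object and `should_draw_overlay` helper (neither name comes from the actual codebase). The flag defaults to off, so training code would keep seeing the unmodified final observation, and only scripts that opt in would draw the screen:

```python
from dataclasses import dataclass


@dataclass
class RenderOptions:
    # Hypothetical flag: off by default so training is unaffected.
    show_game_over: bool = False


def should_draw_overlay(options: RenderOptions, done: bool) -> bool:
    """Draw the overlay only when the flag is enabled and the episode has ended."""
    return options.show_game_over and done


# Manual control or visualization scripts would opt in explicitly.
manual_control_options = RenderOptions(show_game_over=True)
training_options = RenderOptions()
```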
I thought it might be funny (and somewhat useful) if, when the agent performs an action that ends the episode, a "GAME OVER" screen was displayed for the last observation. This won't affect training because there is nothing the agent can do to recover once it gets this last observation.
The screen would be useful because it could also display the cause of termination, i.e. collision or driving off road. This screen should not be displayed, however, if the episode ends because `max_steps` is exceeded (time limit hit).

I would like to display a transparent grey background (draw a transparent polygon in camera coordinates), with the "GAME OVER" text in large letters, and a subtitle that explains the cause of episode termination.
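As a rough sketch of the selection logic described above, the following picks the overlay text from the termination cause and skips the overlay on a time limit. The cause strings (`"collision"`, `"off_road"`, `"max_steps"`), the subtitle wording, the function name, and the RGBA value are all assumptions for illustration; the actual drawing of the polygon and text is left out.

```python
from typing import Optional, Tuple

# Hypothetical termination causes mapped to subtitle text.
SUBTITLES = {
    "collision": "You collided with an object",
    "off_road": "You drove off the road",
}

# Assumed RGBA color for the semi-transparent grey background polygon.
OVERLAY_RGBA = (128, 128, 128, 160)


def game_over_text(done: bool, cause: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (title, subtitle) for the overlay, or None if it should not be shown."""
    if not done:
        return None
    if cause == "max_steps":
        # Time limit hit: the issue asks for no overlay in this case.
        return None
    return ("GAME OVER", SUBTITLES.get(cause, "Episode terminated"))
```

A renderer would call `game_over_text` after each step and, when it returns a pair, draw the grey polygon first and then the two lines of text on top of it.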