Closed manuel-delverme closed 2 years ago
Good catch. It produces a zero reward. There used to be negative rewards but I found that this often made it hard to converge, because the agent tends to learn that doing nothing is better than acting, gets trapped in passive a local maxima.