Closed by blackredscarf 6 years ago
Why was this closed? I got the same results. Is this expected behavior?
I got similar results, but that shouldn't happen: if the robot reaches the goal, it should get a reward of 100, I think. What I'm curious about is that the policy doesn't seem stable: sometimes the validation results stay high, sometimes they stay around -99, and sometimes they stay around 0 ...
I used the code to train MountainCarContinuous-v0 directly, but the reward converges to 0.
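For what it's worth, the three plateaus mentioned in this thread (high, around -99, around 0) line up with the reward structure of MountainCarContinuous-v0 as I understand it from the Gym documentation: each step costs 0.1 * action^2, reaching the goal adds +100, actions are clipped to [-1, 1], and an episode is cut off after 999 steps. The sketch below only does that arithmetic and does not run the environment or this repo's code. The constants are my assumptions taken from the docs, and `episode_return` and the 150-step figure are hypothetical, for illustration only.

```python
# Assumed reward model of MountainCarContinuous-v0 (taken from the Gym docs,
# not from this repository): per-step cost 0.1 * action**2, +100 on reaching
# the goal, episode truncated after 999 steps.
ACTION_COST = 0.1
GOAL_BONUS = 100.0
MAX_STEPS = 999


def episode_return(action_magnitude, steps, reached_goal):
    """Return of an episode that applies a constant |action| for `steps` steps.

    Hypothetical helper: real policies vary the action from step to step.
    """
    total = GOAL_BONUS if reached_goal else 0.0
    total -= ACTION_COST * action_magnitude ** 2 * steps
    return total


# Policy that outputs ~0 every step: never reaches the goal, pays nothing.
idle = episode_return(0.0, MAX_STEPS, reached_goal=False)

# Policy stuck at full throttle in one direction: never reaches the goal,
# pays the maximum action cost on every step.
saturated = episode_return(1.0, MAX_STEPS, reached_goal=False)

# Policy that reaches the goal; 150 steps is an illustrative guess.
solved = episode_return(1.0, 150, reached_goal=True)

print(f"idle:      {idle:.1f}")
print(f"saturated: {saturated:.1f}")
print(f"solved:    {solved:.1f}")
```

Under these assumptions, a return near 0 would mean the actor has collapsed to doing nothing, which is a local optimum because any movement that fails to reach the goal is penalized. A return near -99 would mean the actor output is saturated at the action bound without ever reaching the goal. That would suggest the "converges to 0" result is an exploration problem rather than a bug in the reward, but I haven't verified this against the code here.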