ghliu / pytorch-ddpg

Implementation of the Deep Deterministic Policy Gradient (DDPG) using PyTorch
Apache License 2.0
569 stars, 157 forks

Why does MountainCarContinuous-v0 converge to 0? #5

Closed: blackredscarf closed this 6 years ago

blackredscarf commented 6 years ago

I used the code to train on MountainCarContinuous-v0 directly, but the reward converges to 0.

[Image: reward plot]

QiXuanWang commented 5 years ago

Why was this closed? I got the same results. Is this expected behavior?

pengzhi1998 commented 3 years ago

I got similar results, but that shouldn't happen: I think the car gets a reward of 100 when it reaches the goal. What I'm curious about is that the policy seems unstable: sometimes the validation results stay high, sometimes they stay around -99, and sometimes around 0...
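The plateaus described in this thread can be explained by the shape of the reward, assuming the environment follows Gym's MountainCarContinuous-v0 source: +100 when the car reaches position 0.45 with non-negative velocity, minus 0.1 × action² on every step, and a 999-step time limit. The sketch below re-implements those dynamics in plain Python (no Gym dependency) and scores three hand-written policies. The constants are assumed from the Gym source and should be checked against your installed version. The fixed start position of -0.5 is a simplification, since Gym samples it uniformly from [-0.6, -0.4]. The helper names `step` and `episode_return` are hypothetical and are not code from ghliu/pytorch-ddpg.

```python
import math

# Constants assumed from Gym's MountainCarContinuous-v0 source (verify for your version).
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POSITION, GOAL_VELOCITY = 0.45, 0.0
POWER = 0.0015
MAX_STEPS = 999  # episode time limit


def step(position, velocity, action):
    """Hypothetical re-implementation of one transition: (position, velocity, reward, done)."""
    force = min(max(action, -1.0), 1.0)
    velocity += force * POWER - 0.0025 * math.cos(3 * position)
    velocity = min(max(velocity, -MAX_SPEED), MAX_SPEED)
    position += velocity
    position = min(max(position, MIN_POSITION), MAX_POSITION)
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0  # the car stops dead against the left wall
    done = position >= GOAL_POSITION and velocity >= GOAL_VELOCITY
    reward = 100.0 if done else 0.0
    reward -= action ** 2 * 0.1  # energy penalty, charged on every step
    return position, velocity, reward, done


def episode_return(policy, start_position=-0.5):
    """Undiscounted return and episode length for a deterministic policy(position, velocity)."""
    position, velocity, total = start_position, 0.0, 0.0
    for t in range(1, MAX_STEPS + 1):
        action = policy(position, velocity)
        position, velocity, reward, done = step(position, velocity, action)
        total += reward
        if done:
            return total, t
    return total, MAX_STEPS


policies = {
    "idle": lambda p, v: 0.0,                       # never pays the penalty, never reaches the goal
    "full right": lambda p, v: 1.0,                 # engine alone is too weak to climb the hill
    "pump": lambda p, v: 1.0 if v >= 0 else -1.0,   # push along the velocity to build momentum
}

for name, policy in policies.items():
    ret, steps = episode_return(policy)
    print(f"{name:>10}: return {ret:7.1f} after {steps} steps")
```

Under these assumptions the idle policy scores exactly 0, and constant full throttle pays 0.1 per step for all 999 steps (-99.9) without ever reaching the goal; only the momentum-pumping policy collects the +100 bonus, for a return of 100 minus 0.1 per step taken. This is consistent with the plateaus at 0, around -99, and "high" reported above: one plausible reading is that an actor whose exploration rarely reaches the goal learns that doing nothing is the cheapest option.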