ghliu / pytorch-ddpg

Implementation of the Deep Deterministic Policy Gradient (DDPG) using PyTorch
Apache License 2.0
569 stars 157 forks

Anyone reproduced the MountainCarContinuous-v0 results? #10

Closed QiXuanWang closed 5 years ago

QiXuanWang commented 5 years ago

I tried the same settings, and the final stable average reward is close to 0 instead of 100. Has anyone tried this implementation and gotten the expected values?

QiXuanWang commented 5 years ago

I do get somewhat better results now after some reruns and tuning, but I still can't match the author's results, which are quite stable. My runs begin to diverge after being stable for a while.

QiXuanWang commented 5 years ago

Got consistently good results with ou_sigma=0.52, validate_episodes=200.
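
For context on why this one value matters so much: ou_sigma is the scale of the random term in the Ornstein-Uhlenbeck exploration noise that DDPG adds to the actor's actions. Below is a minimal sketch of such a process using only the standard library. The function name `ou_noise` and the defaults for `theta`, `mu`, and `dt` are illustrative assumptions, not necessarily what this repository uses.

```python
import math
import random


def ou_noise(n_steps, sigma, theta=0.15, mu=0.0, dt=1e-2, x0=0.0, seed=None):
    """Sample a 1-D Ornstein-Uhlenbeck path (Euler-Maruyama discretization).

    All names and defaults here are assumptions for illustration only.
    """
    rng = random.Random(seed)
    path = []
    x = x0
    for _ in range(n_steps):
        # Drift pulls x back toward mu; sigma scales the Gaussian kick.
        x = x + theta * (mu - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# With x0 = mu = 0, the whole path scales linearly with sigma,
# so a larger sigma means proportionally larger exploration noise.
small = ou_noise(1000, sigma=0.52, seed=0)
large = ou_noise(1000, sigma=1.04, seed=0)
```

With the same seed, `large` is simply `small` scaled by two, which shows directly how ou_sigma controls how far the agent's actions are pushed away from the policy output.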

AlexZhaoZt commented 4 years ago

I found that if you set ou_sigma too large, the reward converges to -100. So I think a good approach is to start from ou_sigma = 0.50 and increase it; if you see the rewards converge to -100, decrease it, and repeat (like doing a binary search). Using this method, I was able to find an ou_sigma that gives stable results for any given seed.
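
The search described above can be sketched roughly as follows. This is only one way to read it, under stated assumptions: `evaluate` is a hypothetical callback that trains with a given ou_sigma (at a fixed seed) and returns the final average reward, and `step`, `collapse_threshold`, `tol`, and `max_trials` are made-up knobs, not options of this repository.

```python
from typing import Callable


def tune_ou_sigma(evaluate: Callable[[float], float],
                  start: float = 0.50,
                  step: float = 0.25,
                  collapse_threshold: float = -50.0,
                  tol: float = 0.01,
                  max_trials: int = 20) -> float:
    """Return the largest tested sigma whose reward did not collapse.

    `evaluate` is a hypothetical user-supplied function; each call is
    assumed to be one full training run, so trials are capped.
    """
    def collapsed(sigma: float) -> bool:
        return evaluate(sigma) < collapse_threshold

    trials = 0
    lo = 0.0    # largest sigma known not to collapse (0.0 is assumed, never tested)
    hi = None   # smallest sigma seen to collapse
    sigma = start

    # Phase 1: increase sigma until the reward collapses.
    while hi is None and trials < max_trials:
        trials += 1
        if collapsed(sigma):
            hi = sigma
        else:
            lo = sigma
            sigma += step
    if hi is None:
        return lo  # never collapsed within the trial budget

    # Phase 2: bisect between the last good and the first bad sigma.
    while hi - lo > tol and trials < max_trials:
        trials += 1
        mid = (lo + hi) / 2
        if collapsed(mid):
            hi = mid
        else:
            lo = mid
    return lo


# Toy stand-in for a real training run: pretend the reward collapses above 0.8.
def fake_evaluate(sigma: float) -> float:
    return -100.0 if sigma > 0.8 else 90.0


best_sigma = tune_ou_sigma(fake_evaluate)
```

In practice `fake_evaluate` would be replaced by something that launches training and reads back the validation reward. Since the comment notes the usable value depends on the seed, the search would need to be repeated per seed.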