adik993 / ppo-pytorch

Proximal Policy Optimization (PPO) with Intrinsic Curiosity Module (ICM)

How many episodes are needed to solve MountainCar-v0 with PPO + curiosity? #3

Closed speedcell4 closed 5 years ago

speedcell4 commented 5 years ago

I tried your run_mountain_car.py, but the accumulated rewards do not change at all. Are there any hyper-parameters that I need to change? And how many episodes are needed in general?

adik993 commented 5 years ago

Hey @speedcell4 :slightly_smiling_face: Usually it does pretty well by the time the script finishes, so 50 epochs. In terms of actual episodes, it's around 1k (see the graph below).

Here is the smoothed graph of rewards from 3 runs I just did: [graph: smoothed episode rewards for three runs]

The parameters are probably not perfectly tuned :wink: As you can see on the graph, the green run didn't do as well as the other two. You can try playing around with the parameters and see how that affects the results. If you find a set of parameters that works more stably, please share it; I'll be happy to see/update them :slightly_smiling_face:

As for tuning recommendations, I'd start with lowering the learning rate: I set it aggressively high to get quick feedback while testing, but with a lower one training should be more stable. I'd also try tuning the entropy parameter, which is our regularization/exploration term, and the ICM parameters are worth playing around with as well. It would also be a good idea to apply a hyperparameter tuning algorithm and see what it comes up with :thinking: