adik993 / ppo-pytorch

Proximal Policy Optimization (PPO) with Intrinsic Curiosity Module (ICM)

How many episodes are needed to solve MountainCar-v0 with PPO + curiosity? #3

Closed speedcell4 closed 5 years ago

speedcell4 commented 5 years ago

I tried your run_mountain_car.py, but the accumulated rewards do not change at all. Are there any hyper-parameters that I need to change? And how many episodes are needed in general?

adik993 commented 5 years ago

Hey @speedcell4 :slightly_smiling_face: Usually it does pretty well by the time the script finishes, so 50 epochs. In terms of actual episodes, it's around 1k (see the graph below).

Here is the smoothed graph of rewards from 3 runs I just did: [graph: smoothed episode rewards for three runs]

The parameters are probably not perfectly tuned :wink: As you can see on the graph, the green run didn't do as well as the other two. You can try playing around with the parameters and see how that affects the results. If you find a set of parameters that works more stably, please share it; I'll be happy to see/update them :slightly_smiling_face:

As for tuning recommendations, I'd start with lowering the learning rate: I set it aggressively high to get quick feedback while testing, but with a lower one training should be more stable. I'd also try tuning the entropy parameter, which is our regularization/exploration term, and the ICM parameters are worth playing around with as well. It would also be a good idea to apply a hyperparameter tuning algorithm and see what it comes up with :thinking: