speedcell4 closed this issue 5 years ago
Hey @speedcell4 :slightly_smiling_face:
Usually it does pretty well by the time the script finishes, so after 50 epochs. In terms of actual episodes that's roughly 1k (see the graph below).
Here is the smoothed graph of rewards of 3 runs I just did:
The parameters are probably not perfectly tuned :wink: As you can see on the graph, the green run didn't do as well as the other two. You can try playing around with the parameters and see how that affects the results. If you find a set of parameters that is more stable, please share it, I'll be happy to see it and update the defaults :slightly_smiling_face:
As for tuning recommendations, I'd start by lowering the learning rate: I set it aggressively high to get quick feedback while testing, and a lower value should be more stable. I'd also try tuning the entropy parameters, which are our regularization/exploration term, and the ICM parameters are worth playing around with too. It would also be a good idea to apply a hyperparameter tuning algorithm and see what it comes up with :thinking:
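To make the tuning idea a bit more concrete, here is a minimal random-search sketch. Everything in it is an assumption for illustration: the parameter names (`lr`, `entropy_coef`, `icm_beta`), their ranges, and `dummy_evaluate` are hypothetical and not taken from `run_mountain_car.py`. In practice you would replace `dummy_evaluate` with a function that runs training with the given config and returns something like the mean episode reward.

```python
import math
import random


def sample_config(rng):
    """Draw one candidate configuration.

    All names and ranges here are hypothetical placeholders.
    """
    return {
        # Log-uniform sampling, since useful learning rates span orders of magnitude.
        "lr": 10 ** rng.uniform(-5, -3),
        # Weight of the entropy (exploration) term, also sampled log-uniformly.
        "entropy_coef": 10 ** rng.uniform(-4, -1),
        # A generic ICM weighting parameter, sampled uniformly.
        "icm_beta": rng.uniform(0.1, 0.9),
    }


def random_search(evaluate, n_trials=20, seed=0):
    """Sample n_trials configs and return the (config, score) with the highest score."""
    rng = random.Random(seed)
    best_config, best_score = None, -math.inf
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def dummy_evaluate(config):
    """Stand-in objective so the sketch runs on its own.

    Replace with a real training run that returns e.g. the mean episode reward.
    """
    return (
        -abs(math.log10(config["lr"]) + 4)
        - abs(math.log10(config["entropy_coef"]) + 2)
    )


if __name__ == "__main__":
    best, score = random_search(dummy_evaluate, n_trials=50)
    print("best config:", best)
    print("best score:", score)
```

Since single runs are noisy (like the green run above), it is probably worth averaging the score over a few seeds per config before comparing them.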
I tried your `run_mountain_car.py`, but the accumulated rewards do not change at all. Are there any hyper-parameters that I need to change? And how many episodes are needed in general?