NotAnyMike / gym

An improvement of CarRacing-v0 from OpenAI Gym in order to make the environment complex enough for Hierarchical Reinforcement Learning
https://notanymike.github.io/

PPO hyperparameters #37

Open · kncrane opened this issue 4 years ago

kncrane commented 4 years ago

Hello,

Was wondering whether the model weights made available at https://notanymike.github.io/rl/2017/12/18/Solving-CarRacing.html were produced using the PPO hyperparameters from the original Schulman paper, or whether you tuned them for the CarRacing environment using Stable Baselines' hyperparameter optimization or similar? I was trying to reproduce your results.

Also, out of interest, did you find that the RTX 2080 GPU gave you much of a speed-up, or is it hard to say? I'm wondering whether to run locally (free) or use an Azure VM with a GPU (using up credit), since I couldn't seem to reap the benefit when working with HalfCheetah (possibly because the environment computation taking place on the CPU was a bottleneck, but I'm unsure).

Great modified version of CarRacing, by the way; I've enjoyed using it. Instead of cloning this full repo, I followed the instructions for creating a custom gym environment and copied over the car_racing.py file. Works well!
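
In case it's useful to anyone else doing the same, this is roughly the registration step; a minimal sketch where the package path and env id are placeholders for however you lay out your own package, not anything from this repo:

```python
# Minimal sketch: registering a copied car_racing.py as a custom gym env.
# The package path ("my_envs") and env id are placeholders, not this repo's.
import gym
from gym.envs.registration import register

register(
    id="CarRacingCustom-v0",                     # hypothetical id
    entry_point="my_envs.car_racing:CarRacing",  # class inside the copied file
    max_episode_steps=1000,
)

env = gym.make("CarRacingCustom-v0")
```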

Thanks

NotAnyMike commented 4 years ago
  1. The weights were generated using this gym and Stable Baselines.
  2. The RTX 2080 is overkill for this problem, so it is not very relevant; the bottleneck is still the CPU and RAM needed to run several worlds at the same time (see the sketch below this list).
  3. Thanks, much appreciated 😄
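
To make item 2 concrete, this is roughly what running several worlds in parallel looks like with Stable Baselines; a minimal sketch where the env id, worker count, and timestep budget are assumptions, not the settings actually used:

```python
# Minimal sketch: parallel environments with Stable Baselines' PPO2.
# Each worker holds a full copy of the world, so CPU and RAM usage scale
# with the number of parallel environments, not with GPU power.
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv

def make_env():
    return gym.make("CarRacing-v0")  # env id is an assumption

if __name__ == "__main__":
    env = SubprocVecEnv([make_env for _ in range(8)])  # 8 workers, assumed
    model = PPO2("CnnPolicy", env, verbose=1)
    model.learn(total_timesteps=1_000_000)             # budget is an assumption
```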

Cheers

kncrane commented 4 years ago

Is that using the default hyperparameters for their implementation of PPO2?

Will give it a go and see what performance is like. If it's not great, I was thinking of running the hyperparameter optimization script offered by rl-baselines-zoo.

When I loaded in your weights and evaluated over 100 episodes, the mean reward was 895.30, which is pretty great, right? Just under the "solved" threshold, but compared to some of the scores in the literature it seemed good going.
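
For reference, the evaluation loop was roughly the following; a minimal sketch assuming Stable Baselines 2.x, with the weights path and env id as placeholders:

```python
# Minimal sketch: evaluate a saved PPO2 policy over 100 episodes.
# The weights path and env id are placeholders.
import gym
import numpy as np
from stable_baselines import PPO2

model = PPO2.load("ppo2_carracing_weights")  # hypothetical path
env = gym.make("CarRacing-v0")

episode_rewards = []
for _ in range(100):
    obs, done, total = env.reset(), False, 0.0
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action)
        total += reward
    episode_rewards.append(total)

print("Mean reward over 100 episodes: %.2f" % np.mean(episode_rewards))
```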

Thanks

NotAnyMike commented 4 years ago

I don't remember the hyperparameters I used; probably the default ones (see the sketch below). Last time I checked, the SOTA was at 930, if I am not mistaken. If you manage to optimise the hyperparameters, it would be nice if you updated us here.
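
For what it's worth, "the default ones" would mean the PPO2 defaults that Stable Baselines 2.x ships with, which to the best of my knowledge are the following; the env setup here is just illustrative:

```python
# The Stable Baselines 2.x PPO2 defaults, written out explicitly
# (to the best of my knowledge); the env id is an assumption.
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make("CarRacing-v0")])
model = PPO2(
    "CnnPolicy", env,
    gamma=0.99,             # discount factor
    n_steps=128,            # rollout length per environment
    ent_coef=0.01,          # entropy bonus coefficient
    learning_rate=2.5e-4,
    vf_coef=0.5,            # value-function loss weight
    max_grad_norm=0.5,      # gradient clipping
    lam=0.95,               # GAE lambda
    nminibatches=4,
    noptepochs=4,
    cliprange=0.2,          # PPO clipping parameter
)
```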

kncrane commented 4 years ago

Sure, will do, although my guess is that even with optimisation I won't touch SOTA. It seems anything over 900 uses a different paradigm than end-to-end RL, such as the World Models paper.