DLR-RM / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
https://rl-baselines3-zoo.readthedocs.io
MIT License
1.97k stars, 505 forks

Understanding BipedalWalkerHardcore-v3 results #109

Closed: julian24bas closed this issue 3 years ago

julian24bas commented 3 years ago
  1. Are the PPO hyperparameters for BipedalWalkerHardcore-v3 and BipedalWalker-v3 tuned? You provide a benchmark for both, but the YAML file does not have a #tuned tag. My understanding is that tuned hyperparameters should give similar results across seeds, yet for the hardcore environment the performance varies from run to run.

  2. What does

NOTE: this is not a quantitative benchmark as it corresponds to only one run (cf issue #38). This benchmark is meant to check algorithm (maximal) performance, find potential bugs and also allow users to have access to pretrained agents.

mean for the BipedalWalker with PPO? Did you use the best model you got from all runs, or is it reproducible using the hyperparameters?

Kind regards (:

araffin commented 3 years ago

Are the hyperparameters for BipedalWalkerHardcore-v3 and BipedalWalker-v3 tuned for PPO?

Looking at the results, I would say both are "almost" tuned. They give good results, but they could potentially be improved (judging by the results from other algorithms).

From my understanding I would get similar results for different seeds when using tuned parameters but the performance varies for different runs for the hardcore environment.

The "tuned" tag is an informal tag I use when a hyperparameter set works well most of the time for an algorithm (unfortunately, it is hard to find one that works 100% of the time).
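One way to make "works well most of the time" concrete is to train with several seeds and check how many runs clear a chosen score. The following is only a minimal sketch, assuming you have already collected one final mean episode return per seed. The function name `summarize_runs`, the `threshold` parameter, and the numbers in the usage line are placeholders for illustration, not part of the zoo and not measured results.

```python
from statistics import mean, pstdev


def summarize_runs(returns_per_seed, threshold):
    """Summarize final returns from independent training runs (one value per seed)."""
    if not returns_per_seed:
        raise ValueError("need at least one run")
    values = list(returns_per_seed.values())
    # Seeds whose final return reached the chosen threshold
    good_seeds = sorted(
        seed for seed, ret in returns_per_seed.items() if ret >= threshold
    )
    return {
        "mean": mean(values),
        "std": pstdev(values),  # population standard deviation over the runs
        "best": max(values),
        "worst": min(values),
        "success_rate": len(good_seeds) / len(values),
        "good_seeds": good_seeds,
    }


# Placeholder numbers only, NOT measured results
example = summarize_runs({0: 100.0, 1: 300.0, 2: 200.0}, threshold=250.0)
```

A large gap between `best` and `mean`, or a low `success_rate`, suggests that the hyperparameters are sensitive to the seed, which matches the informal sense of "almost tuned" above.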

mean for the BipedalWalker with PPO? Did you use the best model you got from all runs, or is it reproducible using the hyperparameters?

This is mostly a warning to address some questions I had in the past. Here, I just used one run of PPO, but in the future, if there is a better run, it will replace the current result. In theory, you can reproduce the results using the given hyperparameters, but you may need several runs.
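"Several runs" in practice means launching the same training configuration with different seeds. A rough sketch of how that could be scripted, assuming the zoo's `train.py` script is run from the repository root and accepts the `--algo`, `--env`, and `--seed` options (check this against your installed version). The helper `build_train_commands` is a hypothetical name used only for illustration, and the sketch only builds the commands without launching them.

```python
import sys


def build_train_commands(algo, env_id, seeds, script="train.py"):
    """Return one training command (as an argument list) per seed."""
    return [
        [sys.executable, script, "--algo", algo, "--env", env_id, "--seed", str(seed)]
        for seed in seeds
    ]


# Three seeds for the hardcore environment; nothing is executed here
commands = build_train_commands("ppo", "BipedalWalkerHardcore-v3", seeds=range(3))
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list can then be passed to `subprocess.run(cmd, check=True)` to start the runs one after another, after which the best run can be kept.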

julian24bas commented 3 years ago

Thanks for the fast reply!