DLR-RM / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
https://rl-baselines3-zoo.readthedocs.io
MIT License

[Feature] Upstream hyperparameter eval script #151

Open · jkterry1 opened this issue 2 years ago

jkterry1 commented 2 years ago

Building off the JSONs of the best hyperparameters saved in https://github.com/DLR-RM/rl-baselines3-zoo/pull/140, I have a script that takes each one and trains it 10 times (by default) to determine which hyperparameters are actually the best. This seems like a generally useful feature for rl-baselines3-zoo users.

Right now my code is in this file: https://github.com/jkterry1/rl-baselines3-zoo/blob/master/eval_hyperparameters.py, called across multiple GPUs by this script: https://github.com/jkterry1/rl-baselines3-zoo/blob/master/run_eval1.sh.
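For illustration, a minimal sketch of the idea in plain Stable Baselines3 (the JSON naming, environment, algorithm, and timestep budget below are all assumptions, not the actual script):

```python
# Hypothetical sketch: re-train each saved hyperparameter candidate with
# several seeds and rank candidates by mean evaluation reward.
import glob
import json

import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

N_SEEDS = 10  # runs per candidate (the script's default)
ENV_ID = "CartPole-v1"  # assumed environment

results = {}
for path in glob.glob("logs/best_hyperparameters_*.json"):  # assumed naming
    with open(path) as f:
        hyperparams = json.load(f)
    scores = []
    for seed in range(N_SEEDS):
        env = make_vec_env(ENV_ID, n_envs=1, seed=seed)
        model = PPO("MlpPolicy", env, seed=seed, **hyperparams)
        model.learn(total_timesteps=100_000)
        mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=10)
        scores.append(mean_reward)
    results[path] = (np.mean(scores), np.std(scores))

# Rank by mean reward across seeds, not by a single lucky run.
for path, (mean, std) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{path}: {mean:.1f} +/- {std:.1f}")
```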

Would you like this upstreamed, and if so, how could it be factored in a general way? The right way to do that is not obvious to me.

araffin commented 2 years ago

Hello, I've got mixed feelings about this. On the one hand, running the same set of hyperparameters several times is the correct way to go; on the other hand, since we cannot use a pruner anymore and the time per trial is multiplied by the number of seeds, this may blow up the time required to evaluate enough candidates (with 10 seeds, each trial costs 10 full training runs).
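To make the pruning point concrete, here is a toy sketch in Optuna (`train_and_eval` is a hypothetical stand-in for a full training run):

```python
# Toy sketch: averaging over seeds inside the objective means each trial
# costs n_seeds full training runs, and there is no natural intermediate
# value to report, so a pruner (e.g. MedianPruner) can no longer cut bad
# trials short.
import numpy as np
import optuna

def train_and_eval(lr: float, seed: int) -> float:
    # Hypothetical stand-in for a full training run returning mean reward.
    rng = np.random.default_rng(seed)
    return -((lr - 3e-4) ** 2) + rng.normal(scale=1e-5)

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    # One trial = ten full training runs, averaged.
    return float(np.mean([train_and_eval(lr, seed) for seed in range(10)]))

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
```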

Anyway, I would at least add a link to your implementation in the README to document it ;)

> so how could it be factored in a general way?

Not really; we would need to create a separate loop when using this hyperparameter optimization strategy.

jkterry1 commented 2 years ago

To clarify: this takes the 10 best sets of hyperparameters and runs each of them many times as a post-processing step; during tuning itself, each trial still runs only once. That's much more computationally reasonable.

araffin commented 2 years ago

> To clarify: this takes the 10 best sets of hyperparameters and runs each of them many times as a post-processing step; during tuning itself, each trial still runs only once. That's much more computationally reasonable.

Oh, I see, it's only a post-processing step? (what you used in https://github.com/DLR-RM/rl-baselines3-zoo/pull/155 I guess?)

In that case, that would be an interesting addition (you should avoid duplicating code, though, as most of the pre-processing can probably be reused from exp_manager.py).

jkterry1 commented 2 years ago

"(what you used in #155 I guess?)" Yep

"you should avoid duplicating code though, as most of the pre-processing can probably be re-used from the exp_manager.py" When you get a chance could you elaborate on how this could be natively integrated into baselines3 in a reasonable way? I have absolutely no clue how this could be done, like I mentioned in the original posting.