Closed patterntrade closed 3 years ago
Hello, I'm not sure I understand your question: do you want to optimize the hyperparameters of an environment rather than of an algorithm?
Hi! As always I'm impressed by how quickly you respond! :-)
My question was primarily concerned with optimising the hyperparameters of the algorithm when I run it not from command line like this:
python train.py --algo ppo2 --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler random --pruner median
but when I have defined a custom environment like this, and want to apply Optuna to it: where (inside which parentheses) and how do I pass the parameters?
env = DummyVecEnv([lambda: RunEnv(...)])
model = A2C(CnnPolicy, env).learn(total_timesteps)
But now that you mention it, whichever method I use, it would be great if Optuna could also optimise the parameters I use inside the environment. Say you wanted to find the optimal size of the Atari matrix to deliver at step() and reset(), or how many frames you could skip before learning suffers.
Thanks again for your efforts with Stable B and Zoo, I'm really enjoying it.
Kind regards
I wouldn't integrate optuna for optimizing parameters of a custom env in the rl zoo. The main reason is that, to make things reproducible, you usually want the env to be fixed, so you have a fair comparison between algorithms.
However, you can create your own custom version of the rl zoo, in which you replace the gym.make call with direct creation of your custom env object, so that you can pass keyword arguments.
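As a rough sketch of that suggestion, the gym.make(env_id) call can be swapped for a factory that constructs the environment class directly, so keyword arguments reach its constructor. ToyEnv, its frame_skip and obs_size parameters, make_env_factory and sample_env_kwargs are hypothetical names used only for illustration; a real environment would subclass gym.Env.

```python
from functools import partial


class ToyEnv:
    """Hypothetical stand-in for a custom env (a real one would subclass gym.Env)."""

    def __init__(self, frame_skip=1, obs_size=84):
        self.frame_skip = frame_skip
        self.obs_size = obs_size


def make_env_factory(env_class, **env_kwargs):
    """Return a zero-argument callable that builds a fresh env on each call.

    Vectorised wrappers such as DummyVecEnv expect a list of such callables.
    """
    return partial(env_class, **env_kwargs)


def sample_env_kwargs(trial):
    """Assumed search space for env parameters; `trial` is an Optuna trial."""
    return {
        "frame_skip": trial.suggest_int("frame_skip", 1, 8),
        "obs_size": trial.suggest_categorical("obs_size", [42, 64, 84]),
    }


# Fixed keyword arguments: each call creates a new, independent instance.
env_fn = make_env_factory(ToyEnv, frame_skip=4, obs_size=64)
env_a, env_b = env_fn(), env_fn()
```

With Stable Baselines this would be used as DummyVecEnv([env_fn]). Inside an Optuna objective, make_env_factory(ToyEnv, **sample_env_kwargs(trial)) would build the env from sampled values instead. Trials then no longer share a fixed environment, which is the reproducibility concern raised above.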
I have an idea for meeting this kind of requirement. You need to modify train.py and the make_env function in utils.py.
Hope this helps.
in train.py
def run_optimize(args):
    # Go through custom gym packages to let them register in the global registry
    for env_module in args.gym_packages:
        importlib.import_module(env_module)
    ....
    # the env_instance param also needs to be passed along somewhere in this code


def run_by_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--env', type=str, nargs='+', default=["CartPole-v1"], help='environment ID(s)')
    ...
    run_optimize(args)


def ToCreateMyNewEnv():
    # placeholder: build your custom environment here and return it
    pass
    return env


def run_custom_env():
    args = dict_to_object({
        "algo": "ppo2",
        "env": ['MyEnv-v0'],
        "env_class": ToCreateMyNewEnv,
        "n_timesteps": 660 * 17,
        "optimize_hyperparameters": True,
        "n_trials": 10,
        "n_jobs": 4,
        "sampler": "tpe",
        "pruner": "median",
        "seed": 0,
        "gym_packages": [],
        "trained_agent": '',
        "tensorboard_log": '',
        "verbose": 1,
        "log_interval": 1,
        "log_folder": 'temp',
    })
    run_optimize(args)


if __name__ == "__main__":
    run_by_args()
    # run_custom_env()
and in utils.py
def _init():
    set_global_seeds(seed + rank)
    if env_instance:
        # must create a new instance
        env = env_instance()
    else:
        env = gym.make(env_id)
You also need to add a config entry in hyperparams/some_agent.yml.
What's more, why not move that code (train and optimize) into stable_baselines? I think the zoo is a little big: sometimes we only need the optimize function, but we must download the whole project. For the zoo, I think one function is enough: playing the game with trained agents. Thanks
need to modify train.py and make_env function at utils.py.
As I already replied before, I would not integrate that in the zoo, however, you can create your custom version of the zoo for that particular need.
why not move those codes (train and optimize) into stable_baselines?
I would not because this is not the idea of the library, however, having a separate repo for training/optimizing without any trained agent, then maybe.
, but we must download the whole project
That's true; I have not made up my mind on this yet. Also, nothing prevents you from quickly copy-pasting the file (even though that's a little bit repetitive).
OK, what you said also makes sense. I have already created my own projects using just some of your code, and they run well. But when I see a lot of people asking these kinds of questions, I want to give some tips. Thank you for your project. ;)
@ruifeng96150 Have you had this working recently? I also want to optimize hparams on my custom gym env. I'm having some trouble though. I don't understand your code in context of the current version of zoo. Maybe it is outdated?
I spent a while trying to get the zoo to work with my custom env. It kept freezing during training. Finally, I found this simple (non-zoo) approach. It worked for me with tf 1.15.0 and stable-baselines 2.10.0.
# hide all deprecation warnings from tensorflow
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
import optuna
import gym
import numpy as np
from stable_baselines import PPO2
from stable_baselines.common.evaluation import evaluate_policy
from stable_baselines.common.cmd_util import make_vec_env
# https://colab.research.google.com/github/araffin/rl-tutorial-jnrr19/blob/master/5_custom_gym_env.ipynb
from custom_env import GoLeftEnv
def optimize_ppo2(trial):
    """Learning hyperparameters we want to optimise."""
    return {
        'n_steps': int(trial.suggest_loguniform('n_steps', 16, 2048)),
        'gamma': trial.suggest_loguniform('gamma', 0.9, 0.9999),
        'learning_rate': trial.suggest_loguniform('learning_rate', 1e-5, 1.),
        'ent_coef': trial.suggest_loguniform('ent_coef', 1e-8, 1e-1),
        'cliprange': trial.suggest_uniform('cliprange', 0.1, 0.4),
        'noptepochs': int(trial.suggest_loguniform('noptepochs', 1, 48)),
        'lam': trial.suggest_uniform('lam', 0.8, 1.)
    }


def optimize_agent(trial):
    """Train the model and evaluate it.

    Optuna minimises the objective by default, so we
    negate the mean reward here.
    """
    model_params = optimize_ppo2(trial)
    env = make_vec_env(lambda: GoLeftEnv(), n_envs=16, seed=0)
    model = PPO2('MlpPolicy', env, verbose=0, nminibatches=1, **model_params)
    model.learn(10000)
    mean_reward, _ = evaluate_policy(model, GoLeftEnv(), n_eval_episodes=10)
    return -1 * mean_reward


if __name__ == '__main__':
    study = optuna.create_study()
    try:
        study.optimize(optimize_agent, n_trials=100, n_jobs=4)
    except KeyboardInterrupt:
        print('Interrupted by keyboard.')
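Since the objective above returns the negated mean reward, reading the results back needs the sign flipped again. A small sketch, assuming the study was created with Optuna's default minimisation; summarize_study is a hypothetical helper, while best_value and best_params are attributes of an Optuna study.

```python
def summarize_study(study):
    """Return (best_mean_reward, best_params) for a study that minimised -mean_reward."""
    best_mean_reward = -study.best_value  # undo the negation applied in the objective
    return best_mean_reward, dict(study.best_params)
```

After study.optimize(...) returns, summarize_study(study) would give the highest mean reward found and the hyperparameters that produced it. An alternative is optuna.create_study(direction='maximize') with the objective returning mean_reward directly.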
Have you considered registering your env instead?
@josiahcoad thanks for the code. I now use it as a template. If I wanted to try out different architectures for the pi and vf networks of the policy, how would I put that in the params function? At the moment I do it manually using:
class CustomPolicy(FeedForwardPolicy):
    def __init__(self, *args, **kwargs):
        super(CustomPolicy, self).__init__(*args, **kwargs,
                                           net_arch=[dict(pi=[512, 256, 128, 64, 32],
                                                          vf=[512, 256, 128, 64, 32])],
                                           feature_extraction="mlp")
But I would like to use optuna to try out different architectures
But I would like to use optuna to try out different architectures
https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/hyperparams_opt.py#L53
and
https://stable-baselines3.readthedocs.io/en/master/guide/custom_policy.html
PS: Please use SB3 now (https://github.com/DLR-RM/stable-baselines3) as SB2 is no longer actively developed.
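As a hedged sketch of the approach those links point to, one option is to sample an architecture by name and hand it to the model through policy_kwargs rather than a CustomPolicy subclass. NET_ARCHS, its layer sizes and sample_policy_kwargs are illustrative assumptions, and the net_arch format shown is the SB2 one used in the question above.

```python
# Hypothetical named architectures; the layer sizes are illustrative assumptions.
NET_ARCHS = {
    "small": [dict(pi=[64, 64], vf=[64, 64])],
    "medium": [dict(pi=[256, 256], vf=[256, 256])],
    "deep": [dict(pi=[512, 256, 128], vf=[512, 256, 128])],
}


def sample_policy_kwargs(trial):
    """Pick one architecture by name and return policy_kwargs for the model.

    Only the string key is sampled, so the nested lists never have to be
    passed to Optuna as categorical choices.
    """
    arch_name = trial.suggest_categorical("net_arch", list(NET_ARCHS))
    return dict(net_arch=NET_ARCHS[arch_name])
```

In the objective function this would be passed along the lines of PPO2('MlpPolicy', env, policy_kwargs=sample_policy_kwargs(trial), **model_params), so each trial trains with a different pi/vf layout.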
I will close this issue as it is now documented in the RL Zoo of Stable Baselines3: https://github.com/DLR-RM/rl-baselines3-zoo#custom-environment
This isn't a bug or anything like that, but I wonder if anyone could point me in the right direction.
One can do this:
python train.py --algo ppo2 --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler random --pruner median
But when you've created a custom environment...
... how can I enter the Optuna parameters - or is it even possible?
Of course I can create a custom Gym environment, but that's a bit clunky.
Thankful for feedback
Kind regards