araffin / rl-baselines-zoo

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.
https://stable-baselines.readthedocs.io/
MIT License

How to use Optuna for custom environments #29

Closed: patterntrade closed this issue 3 years ago

patterntrade commented 5 years ago

This isn't a bug or anything like that, but I wonder if anyone could point me in the right direction.

One can do this:

python train.py --algo ppo2 --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler random --pruner median

But when you've created a custom environment...

env = DummyVecEnv([lambda: RunEnv(...)])
model = A2C(CnnPolicy, env).learn(total_timesteps)

... how can I pass the Optuna parameters, or is it even possible?

Of course I can create a custom Gym environment, but that's a bit clunky.

Thankful for feedback

Kind regards

araffin commented 5 years ago

Hello, I'm not sure I understand your question: do you want to optimize the hyperparameters of an environment rather than of an algorithm?

patterntrade commented 5 years ago

Hi! As always I'm impressed by how quickly you respond! :-)

My question was primarily about optimising the hyperparameters of the algorithm when I don't run it from the command line like this:

python train.py --algo ppo2 --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler random --pruner median

but when I have defined a custom environment like this and want to apply Optuna to it, where (inside which parentheses) and how do I pass the parameters?

env = DummyVecEnv([lambda: RunEnv(...)])
model = A2C(CnnPolicy, env).learn(total_timesteps)

But now that you mention it, whichever method I use, it would be great if it were possible to involve Optuna in optimising the parameters I use inside the environment as well. Say you wanted to find the optimum size of the Atari observation matrix delivered at step() and reset(), or how many frames you could skip before learning suffers.

Thanks again for your efforts with Stable Baselines and the Zoo, I'm really enjoying them.

Kind regards

araffin commented 5 years ago

I wouldn't integrate Optuna for optimizing parameters of a custom env into the RL Zoo. The main reason is that, to keep things reproducible, you usually want the env to be fixed, so you have a fair comparison between algorithms.

However, you can create your own version of the RL Zoo, where you replace the gym.make call with direct instantiation of your custom env, so you can pass keyword arguments.
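
For illustration, the swap could look roughly like this (a minimal sketch, assuming the environment class is called RunEnv and accepts keyword arguments; the names are placeholders, not the zoo's actual code):

# Sketch only: in a fork of the zoo, replace the gym.make call inside make_env
# with direct instantiation of the custom env so keyword arguments get through.
# RunEnv and env_kwargs are placeholders for your own class and parameters.
def make_env(env_id, rank=0, seed=0, env_kwargs=None):
    env_kwargs = env_kwargs or {}

    def _init():
        # instead of: env = gym.make(env_id)
        env = RunEnv(**env_kwargs)
        env.seed(seed + rank)
        return env

    return _init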

ruifeng96150 commented 4 years ago

I have an idea for this kind of requirement: modify train.py and the make_env function in utils.py.

Hope this helps.

In train.py:

def run_optimize(args):
    # Go through custom gym packages to let them register in the global registry
    for env_module in args.gym_packages:
        importlib.import_module(env_module)
    ...
    # ... and pass the env_instance parameter through to make_env where needed

def run_by_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--env', type=str, nargs='+', default=["CartPole-v1"], help='environment ID(s)')
    ...
    run_optimize(args)

def ToCreateMyNewEnv():
    # Build and return an instance of your custom env here
    env = ...
    return env

def run_custom_env():
    args = dict_to_object({
        "algo": "ppo2",
        "env": ['MyEnv-v0'],
        "env_class": ToCreateMyNewEnv,
        "n_timesteps": 660 * 17,
        "optimize_hyperparameters": True,
        "n_trials": 10,
        "n_jobs": 4,
        "sampler": "tpe",
        "pruner": "median",
        "seed": 0,
        "gym_packages": [],
        "trained_agent": '',
        "tensorboard_log": '',
        "verbose": 1,
        "log_interval": 1,
        "log_folder": 'temp',
    })
    run_optimize(args)

if __name__ == "__main__":
    run_by_args()
    # run_custom_env()
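
Note that dict_to_object is not part of the zoo; a minimal stand-in could simply expose the dict keys as attributes, mimicking the argparse.Namespace that train.py normally receives:

from types import SimpleNamespace

def dict_to_object(d):
    # Hypothetical helper: make dict keys attribute-accessible (args.algo, args.env, ...)
    return SimpleNamespace(**d)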

and in utils.py, inside the make_env function:

    def _init():
        set_global_seeds(seed + rank)
        if env_instance:
            # env_instance is a callable that builds a fresh copy of the custom env
            env = env_instance()
        else:
            env = gym.make(env_id)
        # ... rest of make_env unchanged (wrappers, monitor), then return env

You also need to add a config entry for the env in hyperparams/some_agent.yml.
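
For example, a minimal entry might look like this (MyEnv-v0 and the values are placeholders; follow the format of the existing entries in the hyperparams/ folder):

MyEnv-v0:
  n_envs: 4
  n_timesteps: !!float 1e4
  policy: 'MlpPolicy'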

ruifeng96150 commented 4 years ago

What's more, why not move that code (train and optimize) into stable_baselines? I think the zoo is a little big, and sometimes we only need the optimization function, but we must download the whole project. For the zoo, I think one function is enough: playing the game with trained agents. Thanks

araffin commented 4 years ago

modify train.py and the make_env function in utils.py.

As I replied before, I would not integrate that into the zoo; however, you can create your own version of the zoo for that particular need.

why not move that code (train and optimize) into stable_baselines?

I would not, because that is not the idea of the library. However, a separate repo for training/optimizing without any trained agents, maybe.

but we must download the whole project

That's true, I have not made up my mind on this yet. Also, nothing prevents you from quickly copy-pasting the file (even though that's a little bit repetitive).

ruifeng96150 commented 4 years ago

OK, what you said also makes sense. I have already created my own project using some of your code and it runs well. But since I see a lot of people asking these kinds of questions, I wanted to give some tips. Thank you for your project. ;)

josiahcoad commented 4 years ago

@ruifeng96150 Have you had this working recently? I also want to optimize hyperparameters on my custom gym env, but I'm having some trouble. I don't understand your code in the context of the current version of the zoo. Maybe it is outdated?

josiahcoad commented 4 years ago

I spent a while trying to get the zoo to work with my custom env, but it kept freezing during training. Finally, I found this simple (non-zoo) approach. It worked for me with tf 1.15.0 and stable-baselines 2.10.0:

# hide all deprecation warnings from tensorflow
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

import optuna
import gym
import numpy as np

from stable_baselines import PPO2
from stable_baselines.common.evaluation import evaluate_policy
from stable_baselines.common.cmd_util import make_vec_env

# https://colab.research.google.com/github/araffin/rl-tutorial-jnrr19/blob/master/5_custom_gym_env.ipynb
from custom_env import GoLeftEnv

def optimize_ppo2(trial):
    """ Learning hyperparamters we want to optimise"""
    return {
        'n_steps': int(trial.suggest_loguniform('n_steps', 16, 2048)),
        'gamma': trial.suggest_loguniform('gamma', 0.9, 0.9999),
        'learning_rate': trial.suggest_loguniform('learning_rate', 1e-5, 1.),
        'ent_coef': trial.suggest_loguniform('ent_coef', 1e-8, 1e-1),
        'cliprange': trial.suggest_uniform('cliprange', 0.1, 0.4),
        'noptepochs': int(trial.suggest_loguniform('noptepochs', 1, 48)),
        'lam': trial.suggest_uniform('lam', 0.8, 1.)
    }

def optimize_agent(trial):
    """ Train the model and evaluate it.
        Optuna minimises the objective by default, so we
        negate the mean reward here.
    """
    model_params = optimize_ppo2(trial)
    env = make_vec_env(lambda: GoLeftEnv(), n_envs=16, seed=0)
    model = PPO2('MlpPolicy', env, verbose=0, nminibatches=1, **model_params)
    model.learn(10000)
    mean_reward, _ = evaluate_policy(model, GoLeftEnv(), n_eval_episodes=10)

    return -1 * mean_reward

if __name__ == '__main__':
    study = optuna.create_study()
    try:
        study.optimize(optimize_agent, n_trials=100, n_jobs=4)
    except KeyboardInterrupt:
        print('Interrupted by keyboard.')
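    # Small addition (not in the original snippet): report the best trial found so far,
    # using Optuna's standard study attributes. The reward was negated in the objective,
    # so negate best_value to recover the mean reward.
    print('Best hyperparameters:', study.best_params)
    print('Best mean reward:', -study.best_value)
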
araffin commented 4 years ago

Have you considered registering your env instead?

Cf doc: https://github.com/openai/gym/wiki/Environments
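
For reference, registration is a one-time snippet in your env package (the id and module path below are placeholders):

# e.g. in custom_env/__init__.py (placeholder module name)
from gym.envs.registration import register

register(
    id='GoLeft-v0',                      # id must follow the Name-vN pattern
    entry_point='custom_env:GoLeftEnv',  # module:ClassName of the environment
)

Once registered, the env id can be used with train.py like any built-in environment (the gym_packages argument seen in train.py above imports the package so the registration code runs).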

gkourogiorgas commented 3 years ago

@josiahcoad thanks for the code. I now use it as a template. If I wanted to try out different architectures for the pi and vf networks of the policy, how would I put that in the params function? I do it manually using:

class CustomPolicy(FeedForwardPolicy):
    def __init__(self, *args, **kwargs):
        super(CustomPolicy, self).__init__(*args, **kwargs,
                                           net_arch=[dict(pi=[512, 256, 128, 64, 32],
                                                          vf=[512, 256, 128, 64, 32])],
                                           feature_extraction="mlp")

But I would like to use Optuna to try out different architectures.

araffin commented 3 years ago

But I would like to use Optuna to try out different architectures.

https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/hyperparams_opt.py#L53

and

https://stable-baselines3.readthedocs.io/en/master/guide/custom_policy.html
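
In rough terms, the linked code samples a named architecture per trial and passes it through policy_kwargs; a simplified sketch (not the zoo's exact implementation), written for the SB2 snippet above:

def sample_net_arch(trial):
    # Pick one of a few predefined pi/vf architectures for this trial.
    arch_name = trial.suggest_categorical('net_arch', ['small', 'medium', 'large'])
    return {
        'small': [dict(pi=[64, 64], vf=[64, 64])],
        'medium': [dict(pi=[256, 256], vf=[256, 256])],
        'large': [dict(pi=[512, 256, 128], vf=[512, 256, 128])],
    }[arch_name]

# Inside the objective:
# model = PPO2('MlpPolicy', env,
#              policy_kwargs=dict(net_arch=sample_net_arch(trial)),
#              **model_params)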

PS: Please use SB3 now (https://github.com/DLR-RM/stable-baselines3) as SB2 is no longer actively developed.

araffin commented 3 years ago

I will close this issue as it is now documented in the RL Zoo of Stable Baselines3: https://github.com/DLR-RM/rl-baselines3-zoo#custom-environment