araffin / rl-baselines-zoo

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.
https://stable-baselines.readthedocs.io/
MIT License
1.12k stars 206 forks

What are the best initial values for the parameters in the .yml? [question] #103

Closed toksis closed 3 years ago

toksis commented 3 years ago

Hello,

I am using ACKTR and running Optuna to obtain the optimal hyperparameter values. What are the best initial values to put in acktr.yml? Right now I just copy-paste some values from other entries there.

# not tuned
forex-v0:
  # env_wrapper: utils.wrappers.MyForexEnv
  n_envs: 8
  n_steps: 128
  n_timesteps: !!float 1e6
  policy: 'MlpPolicy'
  gamma: 0.99
  lr_schedule: 'constant'
araffin commented 3 years ago

Hello,

I'm not sure I understand your question, but it is probably answered in our guide.

toksis commented 3 years ago

I mean: what are the initial values of the hyperparameters? When you run train.py with -optimize, it looks in the .yml for the initial hyperparameter values. What initial values did you put in the yml?

        data_frame = hyperparam_optimization(args.algo, create_model, create_env, n_trials=args.n_trials,
                                             n_timesteps=n_timesteps, hyperparams=hyperparams,
                                             n_jobs=args.n_jobs, seed=args.seed,
                                             sampler_method=args.sampler, pruner_method=args.pruner,
                                             verbose=args.verbose)

Like these values in acktr.yml: why did you put the value 0.0 for ent_coef? I also noticed that there are already # tuned values in acktr.yml for specific environments.

Pendulum-v0:
  n_envs: 4
  n_timesteps: !!float 2e6
  policy: 'MlpPolicy'
  ent_coef: 0.0
  gamma: 0.99
  n_steps: 16
  learning_rate: 0.06
  lr_schedule: 'constant'
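For context on how the .yml values interact with -optimize: each trial samples a fresh set of hyperparameters, and the sampled values take precedence over the ones read from the .yml, so the .yml entries mainly matter for the parameters that are not part of the search space (policy, n_timesteps, n_envs, ...). A minimal sketch of that precedence, with an invented search space (the function name and ranges below are illustrative, not the zoo's actual utils/hyperparams_opt.py code):

```python
import random

def sample_acktr_params(rng=random):
    """Mimic the kind of search space Optuna would explore per trial.
    Ranges here are made up for illustration."""
    return {
        "gamma": rng.choice([0.9, 0.95, 0.99, 0.999]),
        "n_steps": rng.choice([16, 32, 64, 128, 256]),
        "learning_rate": 10 ** rng.uniform(-4.0, -1.0),  # log-uniform in [1e-4, 1e-1]
        "ent_coef": rng.choice([0.0, 0.001, 0.01]),
        "lr_schedule": rng.choice(["constant", "linear"]),
    }

# Values as they would be loaded from acktr.yml (the ones the question is about):
yaml_defaults = {
    "policy": "MlpPolicy",
    "n_timesteps": 1e6,
    "n_envs": 8,
    "gamma": 0.99,
    "n_steps": 128,
    "lr_schedule": "constant",
}

# One trial: sampled values override the yaml defaults,
# untuned entries (policy, n_timesteps, n_envs) pass through unchanged.
trial_params = {**yaml_defaults, **sample_acktr_params()}
```

Under this view, the "initial" values in the .yml only need to be sensible for the non-tuned entries; the tuned ones get replaced every trial anyway.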
araffin commented 3 years ago

What are the initial values you put in the yml?

Several answers to that. First, most of the hyperparameters come from the paper. Then, on environments where they don't work, I either quickly tried to tune them manually or ran some automatic hyperparameter tuning using Optuna (cf. README). For more details on the methodology, you can check the "Hyperparameter Optimization" section of the gSDE paper.

When I say "tune manually", I mean that, for instance, you usually don't need an entropy bonus for continuous control, hence ent_coef being set to zero.
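To make that rule of thumb concrete, here is an illustrative pair of config entries (values are typical examples in the style of the zoo's .yml files, not copied verbatim from acktr.yml): a small entropy bonus is common for discrete-action environments to encourage exploration, while continuous-control entries leave it at zero.

```yaml
# Discrete actions (e.g. Atari): small entropy bonus for exploration
BreakoutNoFrameskip-v4:
  policy: 'CnnPolicy'
  ent_coef: 0.01
# Continuous control: entropy bonus usually unnecessary
Pendulum-v0:
  policy: 'MlpPolicy'
  ent_coef: 0.0
```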

toksis commented 3 years ago

It is clear now. This gives me direction. Thank you.