araffin / rl-baselines-zoo

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.
https://stable-baselines.readthedocs.io/
MIT License

Can't tune hyperparameters with CustomSACPolicy - multiple values for keyword argument 'layers' #78

Closed. PierreExeter closed this issue 4 years ago

PierreExeter commented 4 years ago

Describe the bug

Running hyperparameter tuning with SAC and CustomSACPolicy returns

TypeError: __init__() got multiple values for keyword argument 'layers'.

Note: hyperparameter tuning works fine with MlpPolicy, and normal training works fine with CustomSACPolicy. The issue seems to come from TensorFlow.

Code example

After a recent git clone and using the default hyperparameters in hyperparameters/sac.yml:

python train.py --algo sac --env HopperBulletEnv-v0 -n 50000 -optimize --n-trials 100 --n-jobs 1

Full traceback:

Traceback (most recent call last):
  File "***/bin/anaconda3/lib/python3.7/site-packages/optuna/study.py", line 648, in _run_trial
    result = func(trial)
  File "***/rl-baselines-zoo/utils/hyperparams_opt.py", line 88, in objective
    model = model_fn(**kwargs)
  File "train.py", line 373, in create_model
    verbose=0, **kwargs)
  File "***/bin/anaconda3/lib/python3.7/site-packages/stable_baselines/sac/sac.py", line 125, in __init__
    self.setup_model()
  File "***/bin/anaconda3/lib/python3.7/site-packages/stable_baselines/sac/sac.py", line 145, in setup_model
    **self.policy_kwargs)
  File "***/rl-baselines-zoo/utils/utils.py", line 71, in __init__
    feature_extraction="mlp")
TypeError: __init__() got multiple values for keyword argument 'layers'
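
For context, the failing frame at utils/utils.py line 71 sits inside CustomSACPolicy, which already fixes the layer sizes itself. A rough sketch (reconstructed from the traceback, not verbatim from the repo):

# Rough reconstruction of CustomSACPolicy (not verbatim from utils/utils.py):
from stable_baselines.sac.policies import FeedForwardPolicy

class CustomSACPolicy(FeedForwardPolicy):
    def __init__(self, *args, **kwargs):
        # `layers` is hard-coded here, so if the hyperparameter sampler also
        # passes `layers` through policy_kwargs, __init__ receives it twice:
        # TypeError: __init__() got multiple values for keyword argument 'layers'
        super(CustomSACPolicy, self).__init__(*args, **kwargs,
                                              layers=[256, 256],
                                              feature_extraction="mlp")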

System Info

araffin commented 4 years ago

Hello,
This is normal. If you do hyperparameter tuning, you should set policy='MlpPolicy', otherwise you will get the mentioned error, as CustomSACPolicy is already custom in terms of the number of layers. It would be nice to change CustomSACPolicy to MlpPolicy but with policy_kwargs="dict(layers=[256,256])".
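
To illustrate that suggestion (a minimal sketch, assuming pybullet_envs is installed; not code from the repo), MlpPolicy plus policy_kwargs builds the same 256x256 network as CustomSACPolicy while leaving `layers` free for the tuner to override:

import gym
import pybullet_envs  # noqa: F401 -- registers HopperBulletEnv-v0 (assumed installed)
from stable_baselines import SAC

env = gym.make('HopperBulletEnv-v0')

# Same architecture as CustomSACPolicy, but `layers` is passed explicitly,
# so a hyperparameter sampler can supply a different value instead:
model = SAC('MlpPolicy', env, policy_kwargs=dict(layers=[256, 256]), verbose=0)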

PierreExeter commented 4 years ago

Ok thanks for your very quick reply.

PierreExeter commented 4 years ago

Just one doubt: is it OK then to tune the hyperparameters with policy='MlpPolicy' and then train the model with CustomSACPolicy? Does that not defeat the purpose of tuning in the first place? i.e. would hyperparameters optimised with one policy also be optimal for another policy?

araffin commented 4 years ago

Does that not defeat the purpose of tuning in the first place? i.e. would hyperparameters optimised with one policy also be optimal for another policy?

If in your hyperparameter optimization you allow architecture search: https://github.com/araffin/rl-baselines-zoo/blob/645ea177b9e9253223a9733d285fe666de418e6f/utils/hyperparams_opt.py#L245-L250

then it does make sense to use policy='MlpPolicy'. However, if you fix the architecture (by commenting out the lines above), then you can use CustomSACPolicy (or, equivalently, MlpPolicy + policy_kwargs="dict(layers=[256,256])").
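
Roughly, those linked lines sample the network architecture and pass it back through policy_kwargs, something like the sketch below (a paraphrase with a hypothetical helper name; the exact choices and values may differ from the repo):

import optuna  # the zoo's objective() receives an optuna Trial

def sample_sac_arch(trial):
    # Paraphrased architecture search (hypothetical helper, not the repo's exact code):
    net_arch = trial.suggest_categorical('net_arch', ['small', 'medium', 'big'])
    net_arch = {
        'small': [64, 64],
        'medium': [256, 256],
        'big': [400, 300],
    }[net_arch]
    # The sampled sizes go into policy_kwargs, which is why the tuner needs
    # MlpPolicy: CustomSACPolicy already fixes `layers` internally.
    return {'policy_kwargs': dict(layers=net_arch)}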

PierreExeter commented 4 years ago

Ok, thanks a lot for your help. I'm closing this issue now.