Closed: PierreExeter closed this issue 4 years ago.
Hello,
This is normal. If you do hyperparameter tuning, you should set policy='MlpPolicy', otherwise you will get the mentioned error, as CustomSACPolicy is already custom in terms of the number of layers. It would be nice to change CustomSACPolicy to MlpPolicy but with policy_kwargs="dict(layers=[256, 256])".
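To make the equivalence concrete, here is a minimal, dependency-free sketch. The helper `resolve_layers` is hypothetical (it is not part of stable-baselines or the zoo); it only illustrates why MlpPolicy plus an explicit policy_kwargs produces the same hidden-layer sizes as CustomSACPolicy, which hard-codes two 256-unit layers in the zoo:

```python
# Illustrative sketch only: resolve_layers is a hypothetical helper showing
# why MlpPolicy + policy_kwargs="dict(layers=[256, 256])" matches
# CustomSACPolicy (which hard-codes two 256-unit layers in the zoo).
DEFAULT_LAYERS = [64, 64]  # stable-baselines' default MlpPolicy architecture

def resolve_layers(policy, policy_kwargs=None):
    if policy == "CustomSACPolicy":
        return [256, 256]  # fixed by the custom policy class
    return (policy_kwargs or {}).get("layers", DEFAULT_LAYERS)

# The two configurations discussed above build the same network:
assert resolve_layers("CustomSACPolicy") == resolve_layers(
    "MlpPolicy", dict(layers=[256, 256])
)
```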
Ok, thanks for your very quick reply.
Just one doubt: is it OK to tune the hyperparameters with policy='MlpPolicy' and then train the model with CustomSACPolicy? Doesn't that defeat the purpose of tuning in the first place? i.e. would hyperparameters optimised with one policy also be optimal for another policy?
If your hyperparameter optimization allows architecture search (https://github.com/araffin/rl-baselines-zoo/blob/645ea177b9e9253223a9733d285fe666de418e6f/utils/hyperparams_opt.py#L245-L250), then it does make sense to use policy='MlpPolicy'.
However, if you fix the architecture (by commenting out the lines above), then you can use CustomSACPolicy (or, equivalently, MlpPolicy + policy_kwargs="dict(layers=[256, 256])").
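A rough, dependency-free sketch of what that architecture search does: the real code in utils/hyperparams_opt.py samples candidates via Optuna, while the function name, candidate sizes, and values below are illustrative assumptions, not the zoo's actual choices:

```python
import random

# Hypothetical candidate architectures; the zoo's actual search space
# lives in utils/hyperparams_opt.py of rl-baselines-zoo.
NET_ARCH = {"small": [64, 64], "medium": [256, 256], "big": [400, 300]}

def sample_sac_params(rng):
    """Sample one trial's hyperparameters, including the network size.
    Commenting out the architecture choice would fix the network across
    trials, which is when CustomSACPolicy becomes a valid alternative."""
    arch_name = rng.choice(sorted(NET_ARCH))
    return {
        "gamma": rng.choice([0.98, 0.99, 0.999]),
        "policy_kwargs": dict(layers=NET_ARCH[arch_name]),
    }

params = sample_sac_params(random.Random(0))
```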
Ok thanks a lot for your help, I'm closing this issue now.
Describe the bug
Running hyperparameter tuning with SAC and CustomSACPolicy returns an error.
Note: hyperparameter tuning works fine with MlpPolicy, and normal training works fine with CustomSACPolicy. The issue seems to come from Tensorflow.
Code example
After a recent git clone, using the default hyperparameters in hyperparameters/sac.yml:
Full traceback:
System Info