Closed jarlva closed 4 years ago
hello,
You mean optimizing the model architecture?
Yes, it is possible: you need to change the sampler script a bit and pass policy_kwargs=dict(net_arch=[64, 64]) (or the layers argument for SAC/DQN) to the constructor (cf. the doc).
Thanks Antonin,
Yes, optimizing the model architecture (tensors, layers, etc.). I'm new to SB and tried a few things (https://stable-baselines.readthedocs.io/en/master/guide/custom_policy.html), yet it's not clear how exactly to tune the model (via Optuna, I assume). Would it be possible to get a simple example (like CartPole)?
Much appreciated! Jake
is this what you want?
import gym
import numpy as np
import optuna
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpLnLstmPolicy
from stable_baselines.common.vec_env import SubprocVecEnv

n_cpu = 4


def optimize_ppo2(trial):
    """Learning hyperparameters we want to optimise."""
    return {
        'n_steps': int(trial.suggest_loguniform('n_steps', 16, 2048)),
        'gamma': trial.suggest_loguniform('gamma', 0.9, 0.9999),
        'learning_rate': trial.suggest_loguniform('learning_rate', 1e-5, 1.),
        'ent_coef': trial.suggest_loguniform('ent_coef', 1e-8, 1e-1),
        'cliprange': trial.suggest_uniform('cliprange', 0.1, 0.4),
        'noptepochs': int(trial.suggest_loguniform('noptepochs', 1, 48)),
        'lam': trial.suggest_uniform('lam', 0.8, 1.)
    }


def optimize_agent(trial):
    """Train the model and return the value to optimise.

    Optuna minimises the objective by default, so we
    negate the mean reward here.
    """
    model_params = optimize_ppo2(trial)
    env = SubprocVecEnv([lambda: gym.make('CartPole-v1') for i in range(n_cpu)])
    model = PPO2(MlpLnLstmPolicy, env, verbose=0, nminibatches=1, **model_params)
    model.learn(10000)

    rewards = []
    n_episodes, reward_sum = 0, np.zeros(n_cpu)
    obs = env.reset()
    while n_episodes < 4:
        action, _ = model.predict(obs)
        # A vectorized env returns one reward and one done flag per
        # sub-environment and resets finished sub-environments itself.
        obs, reward, done, _ = env.step(action)
        reward_sum += reward
        for i in np.where(done)[0]:
            rewards.append(reward_sum[i])
            reward_sum[i] = 0.0
            n_episodes += 1
    env.close()  # stop the worker processes before the next trial

    last_reward = np.mean(rewards)
    trial.report(-1 * last_reward)
    return -1 * last_reward


if __name__ == '__main__':
    study = optuna.create_study(study_name='cartpol_optuna', storage='sqlite:///params.db', load_if_exists=True)
    study.optimize(optimize_agent, n_trials=1000, n_jobs=1)
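A note on the evaluation loop: a vectorized environment returns one reward and one done flag per sub-environment at every step, so episode returns have to be tracked per environment rather than in a single scalar. Below is a minimal, library-free sketch of that bookkeeping; the function name and the fake step data are hypothetical and only stand in for what a vectorized env.step() would yield.

```python
def collect_episode_returns(steps, n_envs):
    """Accumulate rewards per environment and emit a return when an episode ends.

    `steps` is an iterable of (rewards, dones) pairs, each of length n_envs,
    mimicking the per-step output of a vectorized environment.
    """
    running = [0.0] * n_envs   # reward accumulated so far in each sub-environment
    finished = []              # returns of completed episodes, in completion order
    for rewards, dones in steps:
        for i in range(n_envs):
            running[i] += rewards[i]
            if dones[i]:
                finished.append(running[i])
                running[i] = 0.0  # that sub-environment starts a new episode
    return finished


# Hypothetical data: two sub-environments, three steps, +1 reward per step.
fake_steps = [
    ([1.0, 1.0], [False, False]),
    ([1.0, 1.0], [False, True]),
    ([1.0, 1.0], [True, False]),
]
episode_returns = collect_episode_returns(fake_steps, n_envs=2)
```

The mean of the collected returns is what the objective would then negate before handing it back to Optuna.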
Thanks for the script Eunomia! That has been very helpful!
Is there a place to define and tune the TensorFlow model layers/tensors? For example, in Keras the model is defined by:
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))
The equivalent in TensorFlow is a bit less simple. Optimizing the model (tensors/layers and activation) for a specific problem can yield remarkable results or speed-ups. To that end, Google came up with AdaNet AutoML, a way to automatically find and tune the best TensorFlow model (I'm not sure how to apply it in RL). Is there a way to tune the model's tensors/layers/activation (maybe by modifying the script above) via Optuna (or maybe AdaNet)?
@jheffez
The code you are looking for (and that @eunomiadev wrote) is here.
"Is there a place to define and tune the tensorflow model layers/tensors?"
Please read the documentation for that (especially "custom policy" part). A quick example:
model = PPO2('MlpPolicy', 'CartPole-v1', policy_kwargs=dict(net_arch=[256, 256]))
with optuna:
def optimize_ppo2(trial):
    """Learning hyperparameters we want to optimise."""
    net_arch = trial.suggest_categorical('net_arch', ['small', 'medium'])
    # Map the sampled label to separate policy (pi) and value (vf) networks
    net_arch = {
        'small': [dict(pi=[64, 64], vf=[64, 64])],
        'medium': [dict(pi=[256, 256], vf=[256, 256])],
    }[net_arch]
    return {
        'policy_kwargs': dict(net_arch=net_arch),
    }
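The same idea extends from a fixed menu of architectures to sampling the depth and the width of each layer. The sketch below is one possible way to do it, not something taken from the zoo scripts: sample_net_arch, max_layers, widths and the parameter names n_layers / layer_{i}_units are all hypothetical. It only relies on the trial object exposing suggest_int and suggest_categorical, as Optuna trials do.

```python
def sample_net_arch(trial, max_layers=3, widths=(32, 64, 128, 256)):
    """Sample a network depth and one width per layer from an Optuna-style trial.

    Returns a dict that can be merged into the model parameters, in the same
    shape as the optimize_ppo2 example above.
    """
    # Number of hidden layers (assumed search range: 1..max_layers)
    n_layers = trial.suggest_int('n_layers', 1, max_layers)
    # One categorical width per layer; the parameter names are illustrative
    layers = [
        trial.suggest_categorical('layer_{}_units'.format(i), list(widths))
        for i in range(n_layers)
    ]
    # Separate policy (pi) and value (vf) networks with identical shapes
    net_arch = [dict(pi=list(layers), vf=list(layers))]
    return {'policy_kwargs': dict(net_arch=net_arch)}
```

Inside optimize_ppo2 this could be used as params.update(sample_net_arch(trial)) before returning params. The activation function can be searched in the same way, by sampling a name with suggest_categorical and mapping it to the matching tf.nn function passed as act_fun in policy_kwargs.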
I also recommend reading the Optuna documentation; you should find answers to your questions there ;)
Thanks again! I'll check it out.
I understand that there is a way to tune hyperparameters. Is there a way to tune the actual model (number of layers and tensors)? If not, is it possible to integrate something like AdaNet?