Closed: danielstankw closed this issue 2 years ago
Hello,
Why are you not using an EvalCallback? (recommended way, see the doc; it is also included in the RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo)
Otherwise, if you cannot have an evaluation env, you can also retrieve the reward via the logger (rollout/ep_rew_mean key).
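For reference, here is a minimal sketch of that recommended setup with a separate single evaluation env (the env id, paths and frequencies below are just placeholders, not taken from this issue):

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.env_util import make_vec_env

# Separate single-env environment used only for evaluation
eval_env = make_vec_env("CartPole-v1", n_envs=1)  # placeholder env id
eval_callback = EvalCallback(
    eval_env,
    best_model_save_path="./logs/best_model/",  # where the best model is saved
    log_path="./logs/eval/",                    # where evaluation results are logged
    eval_freq=5000,                             # evaluate every 5000 calls to env.step()
    deterministic=True,
)
model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=50_000, callback=eval_callback)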
Will try it out, thx
I want to reopen this issue
import torch
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.vec_env import SubprocVecEnv

# make_robosuite_env, env_id, env_options, seed_val, num_proc, log_dir and n_steps are defined elsewhere
env = SubprocVecEnv([make_robosuite_env(env_id, env_options, i, seed_val) for i in range(num_proc)])
# Note: the multi-process training env is also passed as the evaluation env here
eval_callback = EvalCallback(env,
                             best_model_save_path=log_dir,
                             log_path=log_dir,
                             eval_freq=3,
                             deterministic=False,
                             render=False)
policy_kwargs = dict(activation_fn=torch.nn.LeakyReLU, net_arch=[32, 32])
model = PPO('MlpPolicy', env, verbose=1, policy_kwargs=policy_kwargs, n_steps=int(n_steps / num_proc),
            tensorboard_log="./learning_log/ppo_tensorboard/", seed=4)
model.learn(total_timesteps=10000, tb_log_name="learning", callback=eval_callback, reset_num_timesteps=True)
I tried the above and got an error: assert eval_env.num_envs == 1, "You must pass only one environment for evaluation"
Please upgrade your SB3 version (see the issue template; you need to give your config along with your issue).
Ok, gotcha.
Thank you very much.
Do you know if the custom callback given in the examples, SaveOnBestTrainingRewardCallback, also supports multiple envs?
It should work, but please don't use it; it is mainly meant as a demo of what you can do with callbacks.
It is better to use CheckpointCallback if you cannot evaluate at the same time as training.
I think I will update the doc.
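For example, a minimal sketch of that (the save path, frequency and env id below are placeholders):

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

# Save a checkpoint every `save_freq` calls to env.step()
checkpoint_callback = CheckpointCallback(
    save_freq=10_000,
    save_path="./logs/checkpoints/",  # placeholder path
    name_prefix="ppo_model",
)
model = PPO("MlpPolicy", "CartPole-v1", verbose=1)  # placeholder env id
model.learn(total_timesteps=100_000, callback=checkpoint_callback)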
Thanks. I would like to use the model that gave me the highest reward, so why do you say it's not good to use the SaveOnBestTrainingRewardCallback? CheckpointCallback doesn't give me the functionality I want.
Regarding "I would like to use the model that gave me the highest reward": SaveOnBestTrainingRewardCallback only gives you information about a proxy, the mean episodic return of the training agent over the last n training episodes. But the agent changes between episodes, so its true performance at time t can only be known by evaluating it on a separate env for multiple episodes (that is what you can do with EvalCallback, or as a post-processing step with CheckpointCallback).
If you are doing continuous control, the controller that you use in the end should be deterministic, and that is not the one used for collecting data during training.
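As an illustration, a rough sketch of evaluating a saved checkpoint deterministically as a post-processing step (the checkpoint path and env id are placeholders):

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy

eval_env = make_vec_env("Pendulum-v1", n_envs=1)  # placeholder continuous-control env id
model = PPO.load("./logs/checkpoints/ppo_model_100000_steps")  # placeholder checkpoint path
# Deterministic evaluation over several episodes estimates the true performance of that checkpoint
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")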
Thanks a lot for the explanation!
Important Note: We do not do technical support or consulting and do not answer personal questions per email. Please post your question on the RL Discord, Reddit or Stack Overflow in that case.
Question
I want to use the SaveOnBestTrainingRewardCallback given in the Stable Baselines examples, but when using SubprocVecEnv with more than 1 env, the callback given in the example is not suitable for multiple envs running simultaneously. Did anyone modify it by any chance and would be willing to share a version that works for multiple envs?
Additional context
...
Checklist