DLR-RM / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
https://rl-baselines3-zoo.readthedocs.io
MIT License

[Question] Custom Eval Callback for train/optimize #434

Closed · kingjin94 closed this 5 months ago

kingjin94 commented 5 months ago

❓ Question

What is the suggested way to customize the (Trial)EvalCallback in the RL Zoo? There are several parameters I would like to adapt for my custom environment, especially deterministic evaluation: at the moment it defaults to false for non-Atari/non-MiniGrid environments, which makes it an undocumented hyperparameter of the tuned algorithms. Resampling the SDE noise distribution also seems to improve my trained agent's evaluation performance, and it would be great if such a change could be exposed. As far as I can tell, I would currently have to alter exp_manager so that it accepts a reference to a (Trial)EvalCallback class whose constructor takes eval_env (and, for optimization, trial), best_model_path, log_path, n_eval_episodes, and eval_freq; a rough sketch of what I have in mind is given below.
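For illustration, something along these lines (a hypothetical sketch, not part of the zoo; the class name is a placeholder and the constructor simply mirrors EvalCallback's existing arguments):

```python
from stable_baselines3.common.callbacks import EvalCallback

class DeterministicEvalCallback(EvalCallback):
    """Hypothetical callback that forces deterministic evaluation."""

    def __init__(self, eval_env, best_model_save_path=None, log_path=None,
                 n_eval_episodes=5, eval_freq=10000, **kwargs):
        # Force deterministic evaluation regardless of the zoo's env-type default.
        kwargs["deterministic"] = True
        super().__init__(eval_env, best_model_save_path=best_model_save_path,
                         log_path=log_path, n_eval_episodes=n_eval_episodes,
                         eval_freq=eval_freq, **kwargs)
```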


araffin commented 5 months ago

Hello,

> What is the suggested way for customizing the (Trial)EvalCallback in sb3_zoo?

The current recommended way is to fork the RL Zoo, especially if you want to do something really custom.
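As a hedged illustration of what such a fork-level change could look like (the import path and the deterministic_eval attribute are assumptions that depend on the zoo version, and the subclass would have to be used from a custom entry script rather than the stock train.py):

```python
from rl_zoo3.exp_manager import ExperimentManager  # older versions: utils.exp_manager

class MyExperimentManager(ExperimentManager):
    """Hypothetical manager that always evaluates deterministically."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Override the env-type based default before the eval callback is created.
        self.deterministic_eval = True
```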

kingjin94 commented 5 months ago

For future reference: I found it easier to monkey patch the evaluate_policy function for now, i.e., to define my own evaluation function in the Python configuration file passed via train's conf-file argument and overwrite evaluate_policy with it, like so:

```python
import stable_baselines3.common.callbacks

def own_evaluate_policy(model, env, n_eval_episodes=10, deterministic=True,
                        render=False, callback=None, **kwargs):
    # <individual code>  (must keep evaluate_policy's return convention)
    ...

# EvalCallback resolves evaluate_policy via the callbacks module,
# so rebinding the name there swaps in the custom evaluation.
stable_baselines3.common.callbacks.evaluate_policy = own_evaluate_policy
```
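Training is then launched as usual with that configuration file; depending on the zoo version this looks roughly like `python train.py --algo sac --env MyCustomEnv-v0 --conf-file my_conf.py` (the algorithm, environment id, and file name here are placeholders).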