DLR-RM / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
https://rl-baselines3-zoo.readthedocs.io
MIT License

[Question] Custom Eval Callback for train/optimize #434

Closed · kingjin94 closed this 5 months ago

kingjin94 commented 5 months ago

❓ Question

What is the suggested way to customize the (Trial)EvalCallback in the RL Zoo? There are several parameters I would like to adapt for my custom environment, especially deterministic evaluation: at the moment it defaults to false for non-Atari/non-MiniGrid environments, which makes it an undocumented hyperparameter of the tuned algorithms. Resampling the SDE noise distribution also seems to improve my trained agent's evaluation performance, and it would be great if such a change could be exposed. As far as I can tell, I would currently have to alter exp_manager so that it accepts a reference to a (Trial)EvalCallback class whose constructor takes eval_env (and, for optimization, trial), best_model_path, log_path, n_eval_episodes, and eval_freq; a rough sketch of what I have in mind is given below.
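For illustration, something along these lines (a hypothetical sketch, not part of the zoo; the class name is a placeholder and the constructor simply mirrors EvalCallback's existing arguments):

```python
from stable_baselines3.common.callbacks import EvalCallback

class DeterministicEvalCallback(EvalCallback):
    """Hypothetical callback that forces deterministic evaluation."""

    def __init__(self, eval_env, best_model_save_path=None, log_path=None,
                 n_eval_episodes=5, eval_freq=10000, **kwargs):
        # Force deterministic evaluation regardless of the zoo's env-type default.
        kwargs["deterministic"] = True
        super().__init__(eval_env, best_model_save_path=best_model_save_path,
                         log_path=log_path, n_eval_episodes=n_eval_episodes,
                         eval_freq=eval_freq, **kwargs)
```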


araffin commented 5 months ago

Hello,

> What is the suggested way for customizing the (Trial)EvalCallback in sb3_zoo?

The current recommended way is to fork the RL Zoo, especially if you want to do something really custom.
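As a hedged illustration of what such a fork-level change could look like (the import path and the deterministic_eval attribute are assumptions that depend on the zoo version, and the subclass would have to be used from a custom entry script rather than the stock train.py):

```python
from rl_zoo3.exp_manager import ExperimentManager  # older versions: utils.exp_manager

class MyExperimentManager(ExperimentManager):
    """Hypothetical manager that always evaluates deterministically."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Override the env-type based default before the eval callback is created.
        self.deterministic_eval = True
```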

kingjin94 commented 5 months ago

For future reference: I found it easier to monkey patch the evaluate_policy function for now, i.e., to define my own evaluation function in the Python configuration file passed via train's conf-file argument and overwrite evaluate_policy with it, like so:

```python
import stable_baselines3.common.callbacks

def own_evaluate_policy(model, env, n_eval_episodes=10, deterministic=True,
                        render=False, callback=None, **kwargs):
    # <individual code>  (must keep evaluate_policy's return convention)
    ...

# EvalCallback resolves evaluate_policy via the callbacks module,
# so rebinding the name there swaps in the custom evaluation.
stable_baselines3.common.callbacks.evaluate_policy = own_evaluate_policy
```
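Training is then launched as usual with that configuration file; depending on the zoo version this looks roughly like `python train.py --algo sac --env MyCustomEnv-v0 --conf-file my_conf.py` (the algorithm, environment id, and file name here are placeholders).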