blurLake closed this issue 3 years ago
Hello, the best approach is to take a look at the code: https://github.com/araffin/rl-baselines-zoo/blob/master/utils/callbacks.py#L29
The trial value is the mean episodic reward over n evaluation episodes.
You can find a simple example in the optuna repo: https://github.com/optuna/optuna/blob/master/examples/rl/sb3_simple.py
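To make "mean episodic reward over n evaluation episodes" concrete, here is a minimal pure-Python sketch. It is only an illustration under stated assumptions: `ToyEnv`, `constant_policy`, and `mean_episodic_reward` are hypothetical names invented for this example and are not part of rl-baselines-zoo or Optuna. The callback linked above remains the reference for what the zoo actually does.

```python
import random
from statistics import mean


class ToyEnv:
    """Hypothetical episodic environment: every episode lasts `horizon` steps."""

    def __init__(self, horizon=5, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        # Reward is the action plus a little noise (an assumption for the demo).
        reward = action + self.rng.uniform(-0.1, 0.1)
        done = self.t >= self.horizon
        return 0.0, reward, done


def mean_episodic_reward(env, policy, n_eval_episodes):
    """Run full episodes and average their summed (undiscounted) rewards."""
    episode_returns = []
    for _ in range(n_eval_episodes):
        obs = env.reset()
        done = False
        total = 0.0
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
        episode_returns.append(total)
    return mean(episode_returns), episode_returns


def constant_policy(obs):
    """Hypothetical stand-in for a trained agent: always outputs 1.0."""
    return 1.0


trial_value, returns = mean_episodic_reward(
    ToyEnv(), constant_policy, n_eval_episodes=10
)
print(f"trial value (mean episodic reward): {trial_value:.3f}")
```

In a hyperparameter search, a number computed along these lines would be what gets reported as the value of a finished trial.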
Hi, I have a question about how the trial value (or perhaps the validation score) is calculated, e.g.:
Trial 323 finished with value: 57.2591552734375 and parameters: {'gamma': 0.05, 'lr': 0.0002252244861681433, 'learning_starts': 100, 'batch_size': 100, 'buffer_size': 10000, 'train_freq': 1, 'tau': 0.1, 'policy_delay': 2, 'noise_type': 'ornstein-uhlenbeck', 'noise_std': 0.28100749015027093, 'net_arch': 'medium'}.
In particular, how is it calculated for an episodic RL task? Is there any documentation explaining this? Thank you!