araffin / rl-baselines-zoo

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.
https://stable-baselines.readthedocs.io/
MIT License

How is the trial value calculated for RL jobs? #104

Closed: blurLake closed this issue 3 years ago

blurLake commented 3 years ago

Hi, I have a question about how the trial value (or perhaps the validation score) is calculated, e.g., Trial 323 finished with value: 57.2591552734375 and parameters: {'gamma': 0.05, 'lr': 0.0002252244861681433, 'learning_starts': 100, 'batch_size': 100, 'buffer_size': 10000, 'train_freq': 1, 'tau': 0.1, 'policy_delay': 2, 'noise_type': 'ornstein-uhlenbeck', 'noise_std': 0.28100749015027093, 'net_arch': 'medium'}. In particular, how is it calculated for an episodic RL job? Is there any documentation elaborating on this?

Thank you!

araffin commented 3 years ago

Hello, it is best to take a look at the code: https://github.com/araffin/rl-baselines-zoo/blob/master/utils/callbacks.py#L29

It is the mean episodic reward over n evaluation episodes.

You can find a simple example in the optuna repo: https://github.com/optuna/optuna/blob/master/examples/rl/sb3_simple.py
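For illustration, here is a minimal sketch of that idea: train a model with hyperparameters sampled by Optuna, run n evaluation episodes, and return the mean episodic reward as the trial value. This is not the zoo's actual `TrialEvalCallback` (which, per the linked `callbacks.py`, evaluates periodically during training and can report intermediate values for pruning); the choice of TD3, `Pendulum-v0`, the sampled hyperparameters, and the `evaluate` helper are placeholders for illustration.

```python
import gym
import numpy as np
import optuna
from stable_baselines import TD3


def evaluate(model, env, n_eval_episodes=5):
    """Return the mean episodic reward over n_eval_episodes (the trial value)."""
    episode_rewards = []
    for _ in range(n_eval_episodes):
        obs, done, total_reward = env.reset(), False, 0.0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, _ = env.step(action)
            total_reward += reward
        episode_rewards.append(total_reward)
    return np.mean(episode_rewards)


def objective(trial):
    # Sample a couple of hyperparameters (the zoo samples many more,
    # e.g. buffer size, train_freq, noise type, network architecture).
    gamma = trial.suggest_categorical("gamma", [0.95, 0.99, 0.999])
    learning_rate = trial.suggest_loguniform("lr", 1e-5, 1e-3)

    env = gym.make("Pendulum-v0")
    model = TD3("MlpPolicy", env, gamma=gamma, learning_rate=learning_rate, verbose=0)
    model.learn(total_timesteps=10000)

    # The value Optuna reports for the trial is the mean episodic reward
    # on the evaluation episodes.
    return evaluate(model, env, n_eval_episodes=5)


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=10)
```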