araffin / rl-baselines-zoo

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.
https://stable-baselines.readthedocs.io/
MIT License

How is the trial value calculated for RL jobs? #104

Closed: blurLake closed this issue 3 years ago

blurLake commented 3 years ago

Hi, I have a question about how the trial value (or perhaps the validation score) is calculated, e.g., Trial 323 finished with value: 57.2591552734375 and parameters: {'gamma': 0.05, 'lr': 0.0002252244861681433, 'learning_starts': 100, 'batch_size': 100, 'buffer_size': 10000, 'train_freq': 1, 'tau': 0.1, 'policy_delay': 2, 'noise_type': 'ornstein-uhlenbeck', 'noise_std': 0.28100749015027093, 'net_arch': 'medium'}. In particular, how is it calculated for an episodic RL job? Is there any documentation elaborating on this?

Thank you!

araffin commented 3 years ago

Hello, it is best to take a look at the code: https://github.com/araffin/rl-baselines-zoo/blob/master/utils/callbacks.py#L29

It is the mean episodic reward over n evaluation episodes.

You can find a simple example in the optuna repo: https://github.com/optuna/optuna/blob/master/examples/rl/sb3_simple.py
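For illustration, here is a minimal sketch of that idea: train a model with hyperparameters sampled by Optuna, run n evaluation episodes, and return the mean episodic reward as the trial value. This is not the zoo's actual `TrialEvalCallback` (which, per the linked `callbacks.py`, evaluates periodically during training and can report intermediate values for pruning); the choice of TD3, `Pendulum-v0`, the sampled hyperparameters, and the `evaluate` helper are placeholders for illustration.

```python
import gym
import numpy as np
import optuna
from stable_baselines import TD3


def evaluate(model, env, n_eval_episodes=5):
    """Return the mean episodic reward over n_eval_episodes (the trial value)."""
    episode_rewards = []
    for _ in range(n_eval_episodes):
        obs, done, total_reward = env.reset(), False, 0.0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, _ = env.step(action)
            total_reward += reward
        episode_rewards.append(total_reward)
    return np.mean(episode_rewards)


def objective(trial):
    # Sample a couple of hyperparameters (the zoo samples many more,
    # e.g. buffer size, train_freq, noise type, network architecture).
    gamma = trial.suggest_categorical("gamma", [0.95, 0.99, 0.999])
    learning_rate = trial.suggest_loguniform("lr", 1e-5, 1e-3)

    env = gym.make("Pendulum-v0")
    model = TD3("MlpPolicy", env, gamma=gamma, learning_rate=learning_rate, verbose=0)
    model.learn(total_timesteps=10000)

    # The value Optuna reports for the trial is the mean episodic reward
    # on the evaluation episodes.
    return evaluate(model, env, n_eval_episodes=5)


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=10)
```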