araffin / rl-baselines-zoo

A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included.
https://stable-baselines.readthedocs.io/

HER success_rate output [question] #102

Closed nndei closed 3 years ago

nndei commented 3 years ago

Hello,

I was comparing OpenAI Baselines and rl-baselines-zoo, and when using HER I see that the latter outputs just one success rate. Baselines, on the other hand, outputs two success rates, train and test. If I understand correctly, the train success rate is measured during training and is therefore likely to be lower than the test one, because of the noise added for exploration.

May I ask, therefore, what the success_rate in this output corresponds to? Also, is there a resource I can study to confirm my understanding of the other logged quantities?
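To make sure I am comparing like with like, this is roughly how I estimate a noise-free ("test") success rate myself: roll out the saved agent with `deterministic=True` and count the `is_success` flag. It is just a quick sketch, and the env id and checkpoint path are placeholders for my setup:

```python
# Quick sketch (not the zoo's evaluation code): estimate a deterministic
# "test" success rate for a saved HER+DDPG agent.
import gym
from stable_baselines import HER

env = gym.make("MyEnv-v0")                       # my custom goal-based env
model = HER.load("her_ddpg_myenv.zip", env=env)  # placeholder checkpoint path

n_episodes, successes = 100, 0.0
for _ in range(n_episodes):
    obs, done, info = env.reset(), False, {}
    while not done:
        # deterministic=True disables the exploration noise used during training
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action)
    successes += float(info.get("is_success", 0.0))

print("test success rate:", successes / n_episodes)
```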

The following is an example of the output. I should also ask: why doesn't the number of epochs grow? This should be the 26th epoch.


```
----------------------------------------
| obs_rms_mean             | -0.0829   |
| obs_rms_std              | 0.384     |
| reference_Q_mean         | -8.74     |
| reference_Q_std          | 7.05      |
| reference_action_mean    | -0.232    |
| reference_action_std     | 0.925     |
| reference_actor_Q_mean   | -8.5      |
| reference_actor_Q_std    | 7.12      |
| rollout/Q_mean           | -8.01     |
| rollout/actions_mean     | -0.04     |
| rollout/actions_std      | 0.704     |
| rollout/episode_steps    | 150       |
| rollout/episodes         | 1.73e+03  |
| rollout/return           | -105      |
| rollout/return_history   | -82.7     |
| success rate             | 0.86      |
| total/duration           | 5.09e+04  |
| total/episodes           | 1.73e+03  |
| total/epochs             | 1         |
| total/steps              | 259998    |
| total/steps_per_second   | 5.1       |
| train/loss_actor         | 4.76      |
| train/loss_critic        | 0.0953    |
| train/param_noise_di...  | 0         |
----------------------------------------
```

Used these hyperparameters:

```yaml
MyEnv-v0:
  n_timesteps: !!float 20000
  policy: 'MlpPolicy'
  model_class: 'ddpg'
  n_sampled_goal: 4
  goal_selection_strategy: 'future'
  buffer_size: 1000000
  batch_size: 256
  gamma: 0.95
  random_exploration: 0.3
  actor_lr: !!float 1e-3
  critic_lr: !!float 1e-3
  noise_type: 'normal'
  noise_std: 0.2
  normalize_observations: true
  normalize_returns: false
  policy_kwargs: "dict(layers=[256, 256, 256])"
```
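For my own understanding, this is roughly how I read that entry in plain Stable Baselines terms: DDPG wrapped in HER, with `noise_type`/`noise_std` turned into an action-noise object. This is a hand-written sketch of my reading, not the zoo's actual code, and `MyEnv-v0` is my custom env:

```python
# Rough, hand-written equivalent of the YAML entry above (not the zoo's code).
import gym
import numpy as np
from stable_baselines import HER, DDPG
from stable_baselines.ddpg import NormalActionNoise

env = gym.make("MyEnv-v0")  # my custom goal-based env
n_actions = env.action_space.shape[0]

model = HER(
    "MlpPolicy", env, model_class=DDPG,
    n_sampled_goal=4, goal_selection_strategy="future",
    buffer_size=1000000, batch_size=256, gamma=0.95,
    random_exploration=0.3, actor_lr=1e-3, critic_lr=1e-3,
    # noise_type: 'normal' / noise_std: 0.2 from the YAML entry
    action_noise=NormalActionNoise(mean=np.zeros(n_actions),
                                   sigma=0.2 * np.ones(n_actions)),
    normalize_observations=True, normalize_returns=False,
    policy_kwargs=dict(layers=[256, 256, 256]),
    verbose=1,
)
model.learn(total_timesteps=20000)
```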

Best regards