Dear all,
Thank you for the framework. Please see the output below from hyperparameter tuning with an SB3 algorithm. Why does the reward not change across episodes? What is the problem? (I copied only three outputs.) The reward is the same in every study, for every hyperparameter configuration.

| time/              |          |
|    episodes        | 172      |
|    fps             | 92       |
|    time_elapsed    | 1014     |
|    total_timesteps | 93568    |
| train/             |          |
|    actor_loss      | 5.13     |
|    critic_loss     | 7.43     |
|    ent_coef        | 0.00201  |
|    learning_rate   | 0.0148   |
|    n_updates       | 93447    |
|    reward          | -3.8389  |

| time/              |          |
|    episodes        | 176      |
|    fps             | 92       |
|    time_elapsed    | 1039     |
|    total_timesteps | 95744    |
| train/             |          |
|    actor_loss      | 4.84     |
|    critic_loss     | 39.8     |
|    ent_coef        | 0.00201  |
|    learning_rate   | 0.0148   |
|    n_updates       | 95623    |
|    reward          | -3.8389  |

| time/              |          |
|    episodes        | 180      |
|    fps             | 92       |
|    time_elapsed    | 1062     |
|    total_timesteps | 97920    |
| train/             |          |
|    actor_loss      | 4.44     |
|    critic_loss     | 3.66     |
|    ent_coef        | 0.00201  |
|    learning_rate   | 0.0148   |
|    n_updates       | 97799    |
|    reward          | -3.8389  |
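
To make the symptom concrete, here is a minimal sketch that takes the `train/` values copied from the three logged blocks and lists which of them never change. The helper name `constant_keys` is hypothetical and not part of SB3. It only summarizes the logs and does not diagnose the cause.

```python
# train/ values copied by hand from the three logged blocks in this post.
logs = [
    {"actor_loss": 5.13, "critic_loss": 7.43, "ent_coef": 0.00201,
     "learning_rate": 0.0148, "n_updates": 93447, "reward": -3.8389},
    {"actor_loss": 4.84, "critic_loss": 39.8, "ent_coef": 0.00201,
     "learning_rate": 0.0148, "n_updates": 95623, "reward": -3.8389},
    {"actor_loss": 4.44, "critic_loss": 3.66, "ent_coef": 0.00201,
     "learning_rate": 0.0148, "n_updates": 97799, "reward": -3.8389},
]


def constant_keys(records):
    """Return the sorted keys whose value is identical in every record.

    Hypothetical helper for illustration only; not an SB3 function.
    """
    return sorted(
        key for key in records[0]
        if len({record[key] for record in records}) == 1
    )


# Lists the metrics that stay fixed while the others move.
print(constant_keys(logs))
```

In these three blocks the losses and `n_updates` move, while `reward`, `ent_coef`, and `learning_rate` stay fixed.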