Closed qihuazhong closed 1 year ago
Hello,
I need to double check, but if I recall, `n_warmup_steps` refers to the number of evaluations reported to Optuna so far, not the `n_timesteps`, which is internal to SB3 (and not known by Optuna).
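A minimal sketch of why this distinction matters (my own illustration, not the zoo's actual callback code): Optuna's `MedianPruner` only considers pruning once the step passed to `trial.report()` reaches `n_warmup_steps`, and in the zoo's evaluation callback that step is the evaluation index, not SB3's internal timestep counter.

```python
# Sketch of the warmup check inside a median-style pruner. The "step" here
# is whatever the caller passes to trial.report() -- in the zoo's callback
# that is the evaluation index, not SB3's internal n_timesteps.

def should_consider_pruning(reported_step: int, n_warmup_steps: int) -> bool:
    """Pruning is only considered once the reported step reaches the warmup."""
    return reported_step >= n_warmup_steps

# With n_evaluations = 20 and n_warmup_steps = 20 // 3 = 6,
# the first 6 reported evaluations can never trigger pruning.
n_evaluations = 20
n_warmup_steps = n_evaluations // 3
prunable = [should_consider_pruning(step, n_warmup_steps) for step in range(n_evaluations)]
print(prunable.count(False))  # → 6 (evaluation indices 0..5 are protected)
```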
After reading into the code, I now understand what you are talking about... So the original calculation `n_warmup_steps = self.n_evaluations // 3` is fine. The culprit of my optimization not working was not here.
On a related topic though, should we force the first evaluation to only start after the `learning_starts` steps for algos like DQN and TD3?
> should we force the first evaluation to only start after the `learning_starts` steps for algos like DQN and TD3?
This can't be done, as `learning_starts` is usually itself one of the optimized hyperparameters.
Anyway, the pruner will quickly prune the trials where `learning_starts` is too high compared to the total budget (and those trials are cheap to run).
Closing as the original question was answered.
🐛 Bug
The current calculation of `n_warmup_steps` is to divide `n_evaluations` by 3, which does not really make sense. According to the usage/definition, `n_evaluations` is a very small number (the default is 1 evaluation per 100k timesteps) compared to relevant variables, for example `learning_starts=50000` of the DQN algo. Algos with a large `learning_starts`, such as DQN, are likely affected the most. I believe this results in DQN models being pruned before learning even starts in the default setting.
The intended calculation should probably be:
I could submit a PR if the change is agreed.
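To put concrete numbers on this, here is a hypothetical worked example; the 1M-timestep budget is an assumption for illustration, while the 100k evaluation frequency and `learning_starts=50000` come from the defaults cited above.

```python
# Hypothetical worked numbers relating the warmup window (measured in
# evaluations) back to timesteps. Only eval_freq and learning_starts
# correspond to the defaults cited in the report; the budget is assumed.
n_timesteps = 1_000_000      # assumed total training budget
eval_freq = 100_000          # default: one evaluation per 100k timesteps
n_evaluations = n_timesteps // eval_freq    # 10 reported evaluations
n_warmup_steps = n_evaluations // 3         # 3 evaluations protected from pruning
warmup_in_timesteps = n_warmup_steps * eval_freq  # 300_000 timesteps
learning_starts = 50_000     # DQN default cited above
print(warmup_in_timesteps, warmup_in_timesteps > learning_starts)
```

Under this particular budget the warmup window already extends well past `learning_starts`, which is consistent with the discussion above that the original calculation turned out to be fine; with a much smaller budget the warmup window shrinks accordingly.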
To Reproduce
Relevant log output / Error message
No response
System Info
No response
Checklist