Closed: AMR-aa1405465 closed this issue 1 year ago
Hello,
you are probably missing the reset_num_timesteps=False parameter (see the documentation).
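For reference, a minimal sketch of continuing training with that flag (illustrative only; the save path "dqn_cartpole" and the timestep budget are made-up values):

```python
import gymnasium as gym
from stable_baselines3 import DQN

# Reload the saved agent and attach an environment to continue training.
model = DQN.load("dqn_cartpole", env=gym.make("CartPole-v1"))

# reset_num_timesteps=False keeps the existing timestep counter, so the
# exploration schedule (and logging) continue from where the previous run
# stopped instead of restarting with epsilon back at 1.0.
model.learn(total_timesteps=100_000, reset_num_timesteps=False)
```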
Also related, on why changing the exploration rate alone doesn't work (you need to change the schedule): https://github.com/DLR-RM/stable-baselines3/issues/735#issuecomment-1047638011
Also related: https://github.com/DLR-RM/stable-baselines3/issues/529
Looks like a duplicate of https://github.com/DLR-RM/stable-baselines3/issues/597#issuecomment-937207471. Also related (for calling learn() multiple times): https://github.com/DLR-RM/stable-baselines3/issues/957
Yup, reset_num_timesteps=False did the trick =D
Thanks for the help, mate!
🐛 Bug
Hello everyone, I appreciate your work. I have a slightly embarrassing problem =|
I have recently encountered an issue while attempting to train, save, and reload my DQN model in a Gymnasium environment (a single environment, no vectorization). The problem lies in transferring what was learned from the saved model to the reloaded one. During the initial training phase on CartPole-v1, I successfully reached a reward of 200 after 100k timesteps.
However, when I reload the model for further training, I expect to start with a reward of around 200 and retain the hyperparameters set during the initial training (e.g., exploration rate). Unfortunately, this is not happening as expected.
In my case, learning seems to start from the beginning: I only see small rewards, and on top of that the exploration rate goes back to 1.0 instead of the 0.05 it had reached before.
Unlike with DQN, I have tried this workflow before with PPO and it worked fine.
To fix this issue, I tried the following, without effect (roughly as in the sketch after this list):
- model.set_env() with a DummyVecEnv and with a normal env
- model.set_parameters() instead of model.load()
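Roughly, those attempts looked like this (a reconstruction for illustration, not the exact code; the save path "dqn_cartpole" and timestep counts are made up):

```python
import gymnasium as gym
from stable_baselines3 import DQN
from stable_baselines3.common.vec_env import DummyVecEnv

# Attempt 1: re-attach the environment explicitly before continuing training
# (tried both a DummyVecEnv and a plain Gymnasium env).
model = DQN.load("dqn_cartpole")
model.set_env(DummyVecEnv([lambda: gym.make("CartPole-v1")]))
model.learn(total_timesteps=100_000)

# Attempt 2: build a fresh model and load only the saved weights,
# instead of reloading the whole model with DQN.load().
model = DQN("MlpPolicy", gym.make("CartPole-v1"))
model.set_parameters("dqn_cartpole")
model.learn(total_timesteps=100_000)
```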
To Reproduce
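No script was attached to the issue; an approximate reconstruction of the workflow described above (hyperparameters and the save path are guesses) would be:

```python
import gymnasium as gym
from stable_baselines3 import DQN

# Initial training: reward reaches ~200 after 100k steps, epsilon decays to 0.05.
env = gym.make("CartPole-v1")
model = DQN("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("dqn_cartpole")

# Reload and continue training: rewards start small again and the exploration
# rate is back at 1.0 (reset_num_timesteps defaults to True).
model = DQN.load("dqn_cartpole", env=gym.make("CartPole-v1"))
model.learn(total_timesteps=100_000)
```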
Relevant log output / Error message
No response
System Info
Checklist