Denys88 / rl_games

RL implementations
MIT License

Continuing training from checkpoint #173

Closed cvoelcker closed 2 years ago

cvoelcker commented 2 years ago

Hi, on the computing infrastructure I am using I need to continue interrupted training regularly. I have been trying to use the checkpointing utility (for PPO, but I think these issues appear for all algorithms) to reload the checkpoints, but training does not actually continue from those checkpoints. I believe that is because other important parameters, such as the optimizer state, are not stored in the checkpoints (please correct me if I am wrong).

In the image below, I interrupted two runs with the same seed at two different points and continued training from the latest checkpoint. (screenshot attached)

Would it be possible to checkpoint all components of the algorithms to enable continuing training from a checkpoint?
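For reference, here is a minimal sketch of what a "resume-everything" checkpoint could look like in plain PyTorch. The function names and dictionary keys below are illustrative only and are not rl_games' actual checkpoint format; the point is that exact resumption needs the optimizer state, the current (possibly adaptive) learning rate, and any running normalization statistics in addition to the model weights.

```python
import torch

# Hypothetical sketch (not rl_games' actual API): a checkpoint that captures
# everything needed to resume training, not just the network weights.
def save_full_checkpoint(path, model, optimizer, epoch, running_mean_std=None, last_lr=None):
    state = {
        "model": model.state_dict(),          # network weights
        "optimizer": optimizer.state_dict(),  # e.g. Adam moments, step counts, per-group LR
        "epoch": epoch,                       # training progress
    }
    if running_mean_std is not None:
        state["running_mean_std"] = running_mean_std.state_dict()  # obs/value normalization stats
    if last_lr is not None:
        state["last_lr"] = last_lr            # current value of an adaptive LR schedule
    torch.save(state, path)

def load_full_checkpoint(path, model, optimizer, running_mean_std=None):
    state = torch.load(path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    if running_mean_std is not None and "running_mean_std" in state:
        running_mean_std.load_state_dict(state["running_mean_std"])
    return state.get("epoch", 0), state.get("last_lr")
```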

Denys88 commented 2 years ago

I save the optimizer state, but the adaptive LR could break something; I need to double-check whether I save the latest LR. It could also be related to how I gather statistics: once you restart training, the first envs that report done=True have low scores because they finished their episodes earlier. I can save the statistics state, and it may improve the reported results, but when you restart training all envs start playing from scratch, which might impact rewards anyway.
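For illustration only (assuming a PyTorch optimizer and a hypothetical `last_lr` entry in the checkpoint, not rl_games' actual format): loading the optimizer's `state_dict` restores the per-group LR that was saved with it, but an adaptive LR tracked outside the optimizer has to be re-applied explicitly after loading, roughly like this:

```python
import torch

def restore_adaptive_lr(optimizer, checkpoint_path):
    # Illustrative only, not rl_games code: the "last_lr" key is hypothetical.
    checkpoint = torch.load(checkpoint_path)
    optimizer.load_state_dict(checkpoint["optimizer"])
    # load_state_dict restores the LR saved inside the optimizer state, but an
    # adaptive LR kept outside the optimizer must be re-applied by hand.
    last_lr = checkpoint.get("last_lr")
    if last_lr is not None:
        for group in optimizer.param_groups:
            group["lr"] = last_lr
    return last_lr
```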

cvoelcker commented 2 years ago

Ah, thanks for the clarification. I am going to add an independent "test" run every couple of iterations to see whether this is indeed only a reporting artifact. However, the performance collapses several times even though I only interrupted once, so I think something else might also be going on?
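A rough sketch of what such an independent test run could look like, assuming a classic gym-style env and a hypothetical `policy.act()` method (this is not rl_games' built-in player/evaluation API):

```python
# Illustrative sketch: evaluate the current policy in fresh envs every few
# training iterations, so the reported return is not mixed with the partial
# episodes of the restarted training envs.
def evaluate(policy, make_env, episodes=10):
    env = make_env()
    returns = []
    for _ in range(episodes):
        obs = env.reset()
        done, total = False, 0.0
        while not done:
            action = policy.act(obs, deterministic=True)  # hypothetical API
            obs, reward, done, _ = env.step(action)       # classic gym 4-tuple step
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)
```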

Denys88 commented 2 years ago

You can try it with a learning rate of 0. It should return to the same numbers pretty quickly.
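As a concrete sanity check (illustrative, assuming you have access to the resumed PyTorch optimizer): zero out the learning rate after loading the checkpoint and watch whether the reported returns climb back to the pre-interrupt numbers; if they do, the drop is a statistics/reporting artifact rather than a real loss of the learned policy.

```python
# Illustrative only: freeze learning after resuming, so any recovery of the
# reported return reflects the statistics warming back up, not new learning.
# "optimizer" is the resumed PyTorch optimizer from the loaded checkpoint.
for group in optimizer.param_groups:
    group["lr"] = 0.0
```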