If you construct the Lightning training module and then immediately load a checkpoint, it crashes because `self.storage` is `None` until the first `training_step` call. This PR creates the event storage in `on_load_checkpoint` as well. I'm not sure whether this is the best solution or whether it should go in `__init__` instead; the comment in `training_step` suggests it should not go there.
In `training_step`, several things are set up when `self.storage` is `None`. I split out the `self.writers` setup, but possibly `iteration_timer` needs similar handling as well. My current use case does not resume training from a checkpoint, so I haven't tested that path.
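To illustrate the change, here is a minimal sketch of the lazy-initialization pattern described above. `EventStorage` and `TrainingModule` are simplified stand-ins for the real detectron2 / Lightning classes, and `_ensure_storage` is a hypothetical helper name; this only demonstrates the pattern, not the actual diff.

```python
class EventStorage:
    """Minimal stand-in for detectron2's EventStorage."""
    def __init__(self, start_iter=0):
        self.iter = start_iter


class TrainingModule:
    def __init__(self):
        # Deliberately NOT created here; the comment in training_step
        # indicates storage must not be set up in __init__.
        self.storage = None

    def _ensure_storage(self):
        # Shared by both hooks, so whichever runs first creates it.
        if self.storage is None:
            self.storage = EventStorage(0)

    def on_load_checkpoint(self, checkpoint):
        # Creating storage here prevents the crash when a checkpoint
        # is loaded before the first training_step call.
        self._ensure_storage()

    def training_step(self, batch, batch_idx):
        self._ensure_storage()
        return batch


module = TrainingModule()
module.on_load_checkpoint({})  # no longer crashes: storage exists now
assert module.storage is not None
```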