Would you be interested in working on a PR to fix it?
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
For anyone encountering this, a quick hack to get it to work would be to replace https://github.com/Lightning-AI/lightning/blob/28e18881a9ad2298169c78ad9ae109191e201c2f/examples/pl_loops/kfold.py#L208 with:
```python
from pytorch_lightning.trainer.states import TrainerFn

old_state = self.trainer.state.fn  # HACK: remember the current state
self.trainer.state.fn = TrainerFn.FITTING  # pretend to be fitting so the optimizers get rebuilt
self.trainer.strategy.setup_optimizers(self.trainer)  # https://github.com/Lightning-AI/lightning/issues/12409
self.trainer.state.fn = old_state  # restore the previous state
```
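For context, here is roughly where the hack lands inside the example's `on_advance_end` (a sketch following the structure of `examples/pl_loops/kfold.py`; the surrounding calls are paraphrased from the example, not quoted verbatim):

```python
import os.path as osp

from pytorch_lightning.loops.fit_loop import FitLoop
from pytorch_lightning.trainer.states import TrainerFn


def on_advance_end(self) -> None:
    """Save the weights of the current fold and reset the module for the next one."""
    self.trainer.save_checkpoint(osp.join(self.export_path, f"model.{self.current_fold}.pt"))
    # restore the original weights, then rebuild the optimizers and schedulers
    self.lightning_module.load_state_dict(self.lightning_module_state_dict)
    old_state = self.trainer.state.fn  # HACK
    self.trainer.state.fn = TrainerFn.FITTING  # setup_optimizers() is a no-op outside of fitting
    self.trainer.strategy.setup_optimizers(self.trainer)
    self.trainer.state.fn = old_state
    self.replace(fit_loop=FitLoop)
```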
Might be a good idea to reset the model in `on_advance_start` instead of in `on_advance_end` (so we don't rely on the previous iteration finishing correctly). Calling `_reset_fitting` in `on_advance_start` doesn't work, because the dataloaders are reset (and I don't fully understand the intricacies of the internal callbacks).
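For concreteness, the failing variant described above would look something like this (a hypothetical sketch; `_reset_fitting` is the helper from the k-fold example):

```python
from typing import Any


def on_advance_start(self, *args: Any, **kwargs: Any) -> None:
    """Reset the model before each fold instead of after -- this does NOT work."""
    self.lightning_module.load_state_dict(self.lightning_module_state_dict)
    self._reset_fitting()  # also resets the dataloaders, which breaks the per-fold setup
    super().on_advance_start(*args, **kwargs)
```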
Using pytorch-lightning==1.7.5
Another bug I encountered was that the callbacks were not being reinitialized, so I added a way of restoring the state of the callbacks:
```python
from copy import deepcopy
from typing import Any

from pytorch_lightning.loops import Loop


class KFoldLoop(Loop):
    # ...

    def on_run_start(self, *args: Any, **kwargs: Any) -> None:
        # ...
        # snapshot the initial state of every callback before the first fold
        self.callback_state_dicts = [deepcopy(callback.state_dict()) for callback in self.trainer.callbacks]

    def on_advance_end(self) -> None:
        # ...
        # restore the callbacks
        for callback, state_dict in zip(self.trainer.callbacks, self.callback_state_dicts):
            callback.load_state_dict(deepcopy(state_dict))  # deepcopy because ``ModelCheckpoint`` assigns without copying
```
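For reference, wiring the loop into the Trainer follows the same pattern as the original example (paraphrased from `examples/pl_loops/kfold.py`; the model and datamodule classes are placeholders):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

model = ImageClassifier()  # any LightningModule
datamodule = MNISTKFoldDataModule()  # a datamodule exposing the per-fold setup
trainer = Trainer(max_epochs=10, callbacks=[ModelCheckpoint(dirpath="./checkpoints")])

# swap the default fit loop for the k-fold loop, keeping the original as the inner loop
internal_fit_loop = trainer.fit_loop
trainer.fit_loop = KFoldLoop(num_folds=5, export_path="./")
trainer.fit_loop.connect(internal_fit_loop)
trainer.fit(model, datamodule)
```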
We now have a new k-fold example where a new optimizer is created for each split: https://github.com/Lightning-AI/lightning/tree/master/examples/fabric/kfold_cv
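The gist of that approach is to build everything fresh per fold, so no trainer state needs to be juggled (a hedged sketch, not the example verbatim; `Net` and `dataset` are placeholders):

```python
import torch
from lightning.fabric import Fabric
from sklearn.model_selection import KFold

fabric = Fabric()
fabric.launch()
kfold = KFold(n_splits=5)
for fold, (train_ids, val_ids) in enumerate(kfold.split(dataset)):
    # a fresh model and optimizer per fold -- nothing leaks from the previous fold
    model = Net()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    model, optimizer = fabric.setup(model, optimizer)
    # ... build fold-specific dataloaders from train_ids / val_ids and train ...
```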
I'm closing the issue due to its age, but of course any further feedback is appreciated!
🐛 Bug
When `self.trainer.strategy.setup_optimizers(self.trainer)` is called at the end of each fold in `on_advance_end`, the value of `self.trainer.state.fn` is still `TrainerFn.TESTING`.

https://github.com/PyTorchLightning/pytorch-lightning/blob/5d156f4ff62e02a94b0ffe497472e7035149e9a4/pl_examples/loop_examples/kfold.py#L201-L207
Since `self.trainer.state.fn` is `TrainerFn.TESTING`, `_init_optimizers_and_lr_schedulers` is never called (as determined by the guard in `setup_optimizers()`), so the optimizer and scheduler are never reset across folds.

https://github.com/PyTorchLightning/pytorch-lightning/blob/5d156f4ff62e02a94b0ffe497472e7035149e9a4/pytorch_lightning/strategies/strategy.py#L121-L131
To Reproduce

Run the kfold.py example and check how the learning rate changes using the `LearningRateMonitor` callback.
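For example, assuming the example's Trainer construction (a hypothetical snippet; the other Trainer arguments don't matter here):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import LearningRateMonitor

# log the learning rate every step so the missing per-fold reset shows up in the logs
trainer = Trainer(max_epochs=10, callbacks=[LearningRateMonitor(logging_interval="step")])
```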
Expected behavior

The optimizer and scheduler are expected to be reset at the end of every fold.