Open ben-da6 opened 4 months ago
This condition here is meant to prevent iter() from getting called a second time, because in this case restarting should be True.
But it isn't. The problem is that the fit loop sets restarting=False
even though we are resuming, due to the logic here:
This is tricky to solve @carmocca. The logic probably needs to be lifted up into the fit loop before epoch_loop.run(), with a different condition that does not rely on restarting.
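To make the failure mode concrete, here is a minimal toy sketch of a guard keyed on restarting versus one keyed on whether an iterator already exists. ToyLoop, resume(), and run_epoch() are hypothetical names made up for illustration; this is not Lightning's FitLoop code, and the "iterator already exists" check is only one possible alternative condition, assumed here for the sake of the example.

```python
class ToyLoop:
    """Hypothetical stand-in for a training loop; not Lightning's FitLoop."""

    def __init__(self, data, guard_on_restarting):
        self.data = data
        self.guard_on_restarting = guard_on_restarting
        self.restarting = False
        self.iter_calls = 0
        self._iterator = None

    def _make_iterator(self):
        # Every call here corresponds to one iter() on the dataloader.
        self.iter_calls += 1
        self._iterator = iter(self.data)

    def resume(self):
        # Assumed setup step on resume: the iterator is created once here.
        self._make_iterator()
        # Models the report: the flag ends up False even though we are resuming.
        self.restarting = False

    def run_epoch(self):
        if self.guard_on_restarting:
            # Guard that trusts the flag (breaks when the flag is wrong).
            already_prepared = self.restarting
        else:
            # Assumed alternative: check the state the guard actually protects.
            already_prepared = self._iterator is not None
        if not already_prepared:
            self._make_iterator()
        batches = list(self._iterator)
        self._iterator = None  # the next epoch needs a fresh iterator
        return batches


def iter_calls_after_resume(guard_on_restarting):
    """Resume, run one epoch, and report how often iter() was called."""
    loop = ToyLoop([0, 1, 2], guard_on_restarting)
    loop.resume()
    loop.run_epoch()
    return loop.iter_calls
```

In this toy model the flag-based guard creates the iterator twice for the first resumed epoch, while the state-based guard creates it once. Whether the same condition is workable in the real loops depends on details not shown in this thread.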
I didn't look too deeply. Couldn't we check restarting too for the FitLoop's iter call? We have a lot of tests around this, so if a solution passes them we should be good.
The problem in the restarting property is that self._iteration_based_training() is False.
Also, since this has appeared twice now and it's the sort of bug that is hard to track down, could we add a test like my example?
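As a rough idea of what such a regression test could look like, here is a minimal sketch that wraps a loader and counts iter() calls. CountingLoader, check_iter_called_once_per_epoch, and the two run_* functions are hypothetical helpers written for this illustration; they are not part of Lightning, and the real test would need to drive an actual resumed fit instead of the stand-in functions.

```python
class CountingLoader:
    """Wraps any iterable and counts how often iter() is requested on it."""

    def __init__(self, inner):
        self.inner = inner
        self.iter_calls = 0

    def __iter__(self):
        self.iter_calls += 1
        return iter(self.inner)

    def __len__(self):
        return len(self.inner)


def check_iter_called_once_per_epoch(run_training, loader, num_epochs):
    """Fail if `run_training` requests more (or fewer) iterators than epochs.

    `run_training` is a hypothetical callable taking (loader, num_epochs).
    """
    counting = CountingLoader(loader)
    run_training(counting, num_epochs)
    assert counting.iter_calls == num_epochs, (
        f"expected {num_epochs} iter() calls, got {counting.iter_calls}"
    )
    return counting.iter_calls


def run_well_behaved(loader, num_epochs):
    # Stand-in training loop: one iterator per epoch.
    for _ in range(num_epochs):
        for _batch in loader:
            pass


def run_with_stray_iter(loader, num_epochs):
    # Stand-in for the bug: an extra iter() whose result is thrown away.
    for _ in range(num_epochs):
        iter(loader)
        for _batch in loader:
            pass
```

With these stand-ins, run_well_behaved passes the check and run_with_stray_iter trips the assertion. In an actual test the callable would resume training from a checkpoint, which is the case where the extra call was reported.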
Bug description
This bug has reappeared: https://github.com/Lightning-AI/pytorch-lightning/issues/18414
We now call iter() twice in different places:
What version are you seeing the problem on?
v2.1
How to reproduce the bug
Error messages and logs
relevant logs are:
Environment
lightning==2.1.4
More info
No response
cc @justusschock @awaelchli @carmocca