Closed otherman16 closed 4 years ago
Looks interesting, maybe @Ditwoo could help with it. Meanwhile, @otherman16, have you tried to investigate the issue yourself? Maybe you already found the solution)) Could you please note the last version without this bug?
I have tried to investigate this bug. I've found:

```python
...
for callback in callbacks.values():
    if isinstance(callback, CheckpointCallback):
        if callback.load_on_stage_start is None:
            callback.load_on_stage_start = "best"
        if (
            isinstance(callback.load_on_stage_start, dict)
            and "model" not in callback.load_on_stage_start
        ):
            callback.load_on_stage_start["model"] = "best"
...
```
```python
...
if self.load_on_stage_start is not None and checkpoint_exists:
    self._load_runner(
        runner,
        mapping=self.load_on_stage_start,
        load_full=need_load_full,
    )
...
```
I can't understand why the `catalyst.core.callbacks.CheckpointCallback.load_on_stage_start` flag is NOT set.
In catalyst v20.04 autoresume works.
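For reference, the defaulting logic quoted above can be reproduced in isolation. This is a minimal sketch with a simplified stand-in class, not Catalyst's real `CheckpointCallback`; it only shows how a `None` flag gets normalized to `"best"` and how a dict mapping gets a default `"model"` entry:

```python
class CheckpointCallback:
    # Simplified stand-in for catalyst.core.callbacks.CheckpointCallback,
    # keeping only the attribute relevant to this issue.
    def __init__(self, load_on_stage_start=None):
        self.load_on_stage_start = load_on_stage_start


def normalize(callbacks):
    # Mirrors the snippet from the issue: a missing flag defaults to "best",
    # and a dict mapping without a "model" key gets "model": "best".
    for callback in callbacks.values():
        if isinstance(callback, CheckpointCallback):
            if callback.load_on_stage_start is None:
                callback.load_on_stage_start = "best"
            if (
                isinstance(callback.load_on_stage_start, dict)
                and "model" not in callback.load_on_stage_start
            ):
                callback.load_on_stage_start["model"] = "best"


callbacks = {"checkpoint": CheckpointCallback()}
normalize(callbacks)
print(callbacks["checkpoint"].load_on_stage_start)  # prints: best
```

So after this normalization the flag should never be `None`, which is why the `self.load_on_stage_start is not None` guard in `_load_runner`'s caller would normally pass.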
looks like it's fixed :)
🐛 Bug Report

I am trying to resume training from the last epoch (for example, from the 60th epoch of 120) with

```bash
catalyst-dl run --autoresume last
```

In the previous version this worked fine, but now catalyst loads only the best checkpoint and starts training from the beginning.

```bash
catalyst-dl run --resume=/path/to/checkpoint.pth
```

doesn't work either.

Environment