ashleve / lightning-hydra-template

PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡
4.2k stars 652 forks

Add ability to resume training from latest checkpoint without specifying path #112

Open ashleve opened 3 years ago

ashleve commented 3 years ago

Add a method that recursively walks everything under logs/ and finds the most recently saved checkpoint (by date saved). Add a config flag for resuming training from that checkpoint:

resume_latest: True

Useful when we want to quickly resume our latest run without specifying ckpt path.

Should be added as an enhancement to utils.extras().

Could also automatically override the whole config with the matching one from the .hydra folder.
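A minimal sketch of what such a lookup might look like, not an implementation from the template. The helper names `find_latest_checkpoint` and `find_hydra_config` are hypothetical, and the sketch assumes checkpoints end in `.ckpt` and that Hydra has written `.hydra/config.yaml` somewhere above the checkpoint in the run's output directory:

```python
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(logs_dir: str = "logs", pattern: str = "*.ckpt") -> Optional[Path]:
    """Return the most recently modified checkpoint under logs_dir, or None."""
    root = Path(logs_dir)
    if not root.is_dir():
        return None
    # rglob walks the directory tree recursively.
    checkpoints = [p for p in root.rglob(pattern) if p.is_file()]
    if not checkpoints:
        return None
    # "Latest" here means the newest modification time on disk.
    return max(checkpoints, key=lambda p: p.stat().st_mtime)


def find_hydra_config(ckpt_path: Path) -> Optional[Path]:
    """Walk up from the checkpoint until a .hydra/config.yaml is found."""
    for parent in ckpt_path.parents:
        candidate = parent / ".hydra" / "config.yaml"
        if candidate.is_file():
            return candidate
    return None
```

If `resume_latest` is set, the returned path could be passed as `ckpt_path` to Lightning's `Trainer.fit`. Sorting by modification time is an approximation of "date saved": copying or touching old checkpoints would change the result.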

turian commented 2 years ago

@ashleve This is cool but what if your artifacts are stored in wandb?

ashleve commented 2 years ago

@turian Checkpoints are always available in output dirs. Uploading them as artifacts through the wandb logger doesn't change that.

If you really need it despite that, perhaps you could write a function that retrieves the latest wandb checkpoint from the current project through their API.

Supporting individual logger use cases is out of the scope of this template though, so I'm not planning on introducing anything like that.
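For the wandb case mentioned above, a rough sketch of such a function follows. It is not part of the template. The name `latest_wandb_checkpoint` is hypothetical, and the sketch assumes that checkpoints were logged as artifacts of type `"model"` and that each artifact's `created_at` value sorts chronologically. The `api` argument is expected to be a `wandb.Api()` instance:

```python
from typing import Optional


def latest_wandb_checkpoint(
    api,
    project_path: str,
    artifact_type: str = "model",  # assumption: checkpoints were logged under this type
    root: Optional[str] = None,
) -> Optional[str]:
    """Download the newest checkpoint artifact of the most recent run that has one.

    api is expected to be a wandb.Api() instance and project_path an
    "entity/project" string. Returns the local download directory, or None.
    """
    # Ask for runs newest first, so the first run with a matching artifact wins.
    for run in api.runs(project_path, order="-created_at"):
        candidates = [a for a in run.logged_artifacts() if a.type == artifact_type]
        if candidates:
            # Assumption: created_at values compare in chronological order.
            newest = max(candidates, key=lambda a: a.created_at)
            return newest.download(root=root)
    return None


# Usage (requires `import wandb` and a logged-in account):
# ckpt_dir = latest_wandb_checkpoint(wandb.Api(), "my-entity/my-project")
```

Passing the API object in, rather than creating it inside the function, keeps the function easy to test without network access.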