ashleve / lightning-hydra-template

PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡

How do we resume from a previous experiment? #411

Open marsggbo opened 2 years ago

marsggbo commented 2 years ago

Suppose the results of a previous experiment are saved in ./logs/runs/exp1. To resume from exp1, we can set trainer.resume_from_checkpoint=./logs/runs/exp1/checkpoint/best.ckpt to load the checkpoint.

However, I have two questions:

  1. Can we reuse the path of exp1, i.e., have the new checkpoints saved in ./logs/runs/exp1/ instead of a new path such as ./logs/runs/exp2?
  2. I tried setting --config-file=./logs/runs/exp1/.hydra, but Hydra seems to load only config.yaml and ignore hydra.yaml. How can we load hydra.yaml as well, given that I modified Hydra's save pattern (hydra.run.dir)?

ashleve commented 2 years ago

  1. Yes, you can change the path in the model_checkpoint callback to an absolute path here: https://github.com/ashleve/lightning-hydra-template/blob/8987b23b7f991a3de3f043058abd7ba4f63ea13f/configs/callbacks/default.yaml#L8-L9

  2. There is no straightforward solution here. See this issue for an overview: https://github.com/facebookresearch/hydra/issues/1805

There is also an experimental Hydra rerun callback that pickles configs so they can be reloaded later: https://deploy-preview-2098--hydra-preview.netlify.app/docs/next/experimental/rerun/
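
For reference, one way to approximate "resume in place" without reloading hydra.yaml is to pass the old run directory back to Hydra as command-line overrides. The sketch below only builds those override strings. The helper name `build_resume_overrides` is hypothetical, the `checkpoint/best.ckpt` layout is taken from the question, and the `callbacks.model_checkpoint.dirpath` key and the `train.py` entry point are assumptions about the template's config that may not match your version.

```python
from pathlib import Path
from typing import List


def build_resume_overrides(run_dir: str, ckpt_name: str = "best.ckpt") -> List[str]:
    """Build Hydra command-line overrides that point a new run at an old run directory.

    Hypothetical helper. The layout <run_dir>/checkpoint/<ckpt_name> follows the
    example path in the question and may differ in your setup.
    """
    # Use an absolute path so it stays valid after Hydra changes the working directory.
    run_path = Path(run_dir).resolve()
    ckpt_dir = run_path / "checkpoint"
    return [
        # Reuse the old output directory instead of letting Hydra create a new one.
        f"hydra.run.dir={run_path}",
        # Assumed config key for the ModelCheckpoint callback's output directory.
        f"callbacks.model_checkpoint.dirpath={ckpt_dir}",
        # Checkpoint to resume from, as in the question.
        f"trainer.resume_from_checkpoint={ckpt_dir / ckpt_name}",
    ]


if __name__ == "__main__":
    overrides = build_resume_overrides("./logs/runs/exp1")
    # Assumed entry point; adjust to wherever your training script lives.
    print("python train.py " + " ".join(overrides))
```

Reusing the directory means new logs and checkpoints may overwrite existing files there. In newer Lightning releases the Trainer argument `resume_from_checkpoint` was replaced by the `ckpt_path` argument of `trainer.fit`, so the last override may need adapting.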