Open · samgelman opened this issue 2 years ago
Wondering what the issues would be with only using relative paths, i.e. not changing paths at all.
I think we could change it. `ModelCheckpoint` started using `realpath` in https://github.com/Lightning-AI/lightning/pull/2153 with the intent of fixing a logic bug when `filepath=a_local_file` was passed, but the `filepath` argument does not exist anymore. It would still be a breaking change, as it's possible somebody relies on it being a realpath.
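For context on why this is breaking: `os.path.realpath` does more than absolutize a path, it also resolves symlinks, which is one way existing code could depend on the current behavior. A minimal stdlib sketch (the directories are throwaway temp dirs; creating symlinks may require extra privileges on Windows):

```python
import os
import tempfile

# Throwaway directories standing in for a real target and a symlink to it.
target = tempfile.mkdtemp()
link = os.path.join(tempfile.mkdtemp(), "checkpoints")
os.symlink(target, link)

# realpath canonicalizes through the symlink, so the stored path is
# `target`, not `link` -- callers may rely on exactly this.
print(os.path.realpath(link))
```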
🚀 Feature
It would be great if `ModelCheckpoint`'s internal state supported relative paths.

Motivation
Currently, if you specify a relative path for `dirpath`, `ModelCheckpoint` converts it to an absolute path under the hood. This makes it hard to resume training if the log directory is moved, or if training resumes on a different server with a different directory structure.

For example, I specify the relative path `a/b/c`, and `ModelCheckpoint` converts it to `cwd()/a/b/c`. The model trains correctly for a while. Then HTCondor reschedules my job on a different server. Now the `cwd()` is different, even though the relative path `a/b/c` is still the same, so the job is unable to resume from the checkpoint.
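A minimal sketch of the failure mode using only the standard library (the temp directories stand in for the two servers' working directories):

```python
import os
import tempfile

# Stand-ins for the working directories on the original and the new server.
server_a = tempfile.mkdtemp()
server_b = tempfile.mkdtemp()

os.chdir(server_a)
print(os.path.realpath("a/b/c"))  # <server_a>/a/b/c -- baked into the checkpoint

# HTCondor reschedules the job: the cwd changes, the relative path does not.
os.chdir(server_b)
print(os.path.realpath("a/b/c"))  # <server_b>/a/b/c -- no longer matches
```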
Pitch

Create an argument `relative_paths=True` that would allow `ModelCheckpoint` to use relative paths in its internal state.
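Hypothetical usage of the proposed flag (`relative_paths` is not an existing `ModelCheckpoint` parameter):

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# `relative_paths=True` is the proposed (hypothetical) argument: `dirpath`
# would be stored as given instead of being resolved against the cwd.
checkpoint_callback = ModelCheckpoint(dirpath="a/b/c", relative_paths=True)
```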
Alternatives

Users can create their own checkpoint callback that supports relative paths, but it would be much nicer if Lightning supported it :-)
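A minimal sketch of such a callback, assuming the only place the base class absolutizes `dirpath` is during `__init__` (it may resolve paths elsewhere too, so this is illustrative rather than a complete solution):

```python
import os

from pytorch_lightning.callbacks import ModelCheckpoint


class RelativeModelCheckpoint(ModelCheckpoint):
    """Checkpoint callback that keeps `dirpath` relative to the cwd."""

    def __init__(self, dirpath=None, **kwargs):
        super().__init__(dirpath=dirpath, **kwargs)
        if self.dirpath is not None:
            # Undo the realpath conversion applied by the base class so the
            # stored path stays portable across working directories.
            self.dirpath = os.path.relpath(self.dirpath, os.getcwd())
```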
Additional context
If you enjoy Lightning, check out our other projects! ⚡
- **Metrics**: Machine learning metrics for distributed, scalable PyTorch applications.
- **Lite**: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
- **Flash**: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
- **Bolts**: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
- **Lightning Transformers**: Flexible interface for high-performance research using SOTA Transformers, leveraging PyTorch Lightning, Transformers, and Hydra.
cc @borda @carmocca @awaelchli @ninginthecloud @jjenniferdai @rohitgr7 @akihironitta