Open schmidt-ai opened 1 year ago
@schmidt-ai It isn't documented, so it's considered not supported and so it's not a bug. I'm happy to convert it to a feature request though.
Our goal here was to build minimally and get the checkpoint loading and saving done well for FSDP. Limiting the scope to the local filesystem is not totally unreasonable in my opinion. It should be possible to extend the logic to make use of fsspec to cover the other cases.
@awaelchli totally makes sense! I was debating whether to file this as a feature request or bug, my apologies!
No need to apologize. The request is very good. We know we have some gaps in Lightning in general regarding remote filesystem support and it's good to document them. This reminds me that I need to get back to this one https://github.com/Lightning-AI/lightning/pull/18252 where I got hung up on working on test cases and then other priorities came up. So in fact, it's me who needs to apologize 😄
PyTorch is also considering supporting fsspec paths automatically: https://github.com/pytorch/pytorch/issues/118036
PyTorch is also considering supporting fsspec paths automatically: pytorch/pytorch#118036
@awaelchli Do you think we need to extend checkpoint loading & saving for remote filesystem within PyTorch Lightning (if so, I'm happy to open a PR)? Or we just waiting PyTorch to support ?
Bug description
FSDPStrategy.load_checkpoint
castscheckpoint_path
to apathlib.Path
here. This will bork URIs, such as cloud checkpoint paths, e.g.s3://...
.Example:
NOTE: I am reporting this merely by looking at the source code; I have yet to confirm this with a test.
What version are you seeing the problem on?
master
How to reproduce the bug
Error messages and logs
I believe this exception will be raised.
Environment
Current environment
``` #- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow): #- PyTorch Lightning Version (e.g., 1.5.0): #- Lightning App Version (e.g., 0.5.2): #- PyTorch Version (e.g., 2.0): #- Python version (e.g., 3.9): #- OS (e.g., Linux): #- CUDA/cuDNN version: #- GPU models and configuration: #- How you installed Lightning(`conda`, `pip`, source): #- Running environment of LightningApp (e.g. local, cloud): ```More info
No response
cc @borda @awaelchli @carmocca