Cloud checkpoints are cool! But once you use the WandbLogger, no cloud checkpoints (or anything, really) are saved to `trainer.default_root_dir`. The model is checkpointed as a Wandb artifact, which is cool, but I also want it in the S3 bucket that `trainer.default_root_dir` points to.
The reasons I want this:
Wandb checkpoints are good if you want to go back and find something from six months ago.
However, they are a pain to use if you are in a back-to-back experimental cycle, compared with just remembering the S3 location and using it. Additionally, Wandb checkpointing is incompatible with @skypilot-org storage, which is a much cleaner idiom / pattern.
Related bug Lightning-AI/pytorch-lightning#16196 . See 'More info' at the bottom of this issue.
Here is a Google Colab that replicates this and a related bug. I share the code for both in one notebook because that makes it easier to configure the AWS credentials and see both bugs simultaneously.
Copying and pasting the most important bit (but see the colab for a full minimal replication):
### Error messages and logs
There is no error message, but `{BORING_BUCKET}/wandbtest/` (an S3 location) is empty, and the checkpoint is only in Wandb.
### Environment
OS: Linux
processor: x86_64
python: 3.8.16
version: #1 SMP Fri Aug 26 08:44:51 UTC 2022
### More info
What I really want for Christmas this year, all working together in one run:
I have a CSVLogger that persists to s3.
I have a WandbLogger that saves checkpoints to Wandb.
I have an S3 trainer.default_root_dir that also saves checkpoints to s3.
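To make the wish list concrete, here is a rough sketch of the setup I have in mind. It is not the code from the colab. The bucket layout, the `run_locations` and `build_trainer` helper names, and the project name are all made up for illustration. Passing an explicit `dirpath` to `ModelCheckpoint` is only my guess at a workaround, and I have not verified that it, or an `s3://` `save_dir` on `CSVLogger`, behaves correctly on the version from this report.

```python
from typing import Dict


def run_locations(bucket: str, run_name: str) -> Dict[str, str]:
    """Derive the S3 locations for one run (hypothetical layout, not a Lightning convention)."""
    root = f"{bucket.rstrip('/')}/{run_name}"
    return {
        "root": root,
        "csv_logs": f"{root}/csv_logs",
        "checkpoints": f"{root}/checkpoints",
    }


def build_trainer(bucket: str, run_name: str, project: str):
    """Build a Trainer with all three wishes configured at once (untested sketch)."""
    # Imported lazily so the path helper above works without Lightning installed.
    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import ModelCheckpoint
    from pytorch_lightning.loggers import CSVLogger, WandbLogger

    paths = run_locations(bucket, run_name)

    # Wish 1: CSV metrics persisted to S3.
    csv_logger = CSVLogger(save_dir=paths["csv_logs"])
    # Wish 2: checkpoints also uploaded to Wandb as artifacts.
    wandb_logger = WandbLogger(project=project, log_model=True)
    # Wish 3: checkpoints written to S3. The explicit dirpath is an assumption:
    # it is meant to stop the checkpoint location being derived from a logger.
    checkpoint = ModelCheckpoint(dirpath=paths["checkpoints"])

    return Trainer(
        default_root_dir=paths["root"],
        logger=[csv_logger, wandb_logger],
        callbacks=[checkpoint],
        max_epochs=1,
    )
```

With something like `build_trainer("s3://my-bucket", "wandbtest", "my-project")`, I would expect to find checkpoints both in Wandb and under `s3://my-bucket/wandbtest/checkpoints`.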
cc @awaelchli @morganmcg1 @borisdayma @scottire @parambharat @manangoel99
