What
Checkpoints are not uploaded to wandb during a non-dry run started from a notebook.
How to reproduce
Configure a non-dry run in a notebook
Start the run
Look at the files in the wandb interface. There are no checkpoints.
Expected
Checkpoints should be uploaded to wandb whenever there is a better one available during training.
Additional context
I thought I had fixed the wandb upload, but it seems that I don't understand wandb's symlinking model. Apparently checkpoints need to live under the project root? But that would mean you can't run multiple experiments at the same time.
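One possible direction, sketched below and not yet verified against this bug: give each run its own checkpoint directory and pass `base_path` to `wandb.save`, so the files do not have to sit under the project root. The helper names `checkpoint_dir_for_run` and `upload_checkpoint`, the `root` and `run_id` parameters, and the directory layout are all hypothetical; only `wandb.save` with its `base_path` and `policy` arguments comes from wandb itself.

```python
import os


def checkpoint_dir_for_run(root, run_id):
    """Return (and create) a per-run checkpoint directory.

    Hypothetical layout: <root>/<run_id>/checkpoints. Keeping one directory
    per run means concurrent experiments do not overwrite each other.
    """
    path = os.path.join(root, run_id, "checkpoints")
    os.makedirs(path, exist_ok=True)
    return path


def upload_checkpoint(ckpt_path, base_path):
    """Ask wandb to sync one checkpoint file.

    Assumes wandb.init() has already been called in this process.
    base_path tells wandb which directory the uploaded file's relative
    path is computed from, so ckpt_path must be located under base_path.
    """
    import wandb  # imported lazily so the path helper works without wandb

    # policy="live" re-uploads the file whenever it changes on disk.
    wandb.save(ckpt_path, base_path=base_path, policy="live")
```

In the training loop this would be called each time a better checkpoint is written, for example `upload_checkpoint(os.path.join(ckpt_dir, "best.ckpt"), base_path=ckpt_dir)`, where `best.ckpt` is a placeholder file name. Another option that may be worth trying is writing checkpoints directly into `wandb.run.dir`, which is already unique per run.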