Hey, I think the problem is that these keys in the config.yaml are not allowed:
seed_everything_default: null
log_dir: /cluster/dir/to/log
They don't match any Trainer argument.
Perhaps they should be:
seed_everything: false
default_root_dir: "/cluster/dir/to/log"
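In case it helps, here is a rough sketch of where I would expect those two keys to sit in a LightningCLI config. This assumes a recent Lightning version, where `seed_everything` is a top-level key and `default_root_dir` is a Trainer argument nested under `trainer:`. The path is the one from your config and the script name in the comment is hypothetical, so please check the layout against your own setup.

```yaml
# Sketch only: the nesting is an assumption about the LightningCLI config layout.
# To see the keys your own script accepts, print its full config, e.g.
#   python your_script.py fit --print_config
# (your_script.py is a placeholder name).
seed_everything: false
trainer:
  default_root_dir: "/cluster/dir/to/log"
```

Any key that does not appear in the printed config is likely to be rejected when the CLI parses the file.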
I tried to help here. Did you find out what the problem was? Please let me know.
Yes, sorry, I forgot to answer. I had somehow messed up a lot of the config keys, so you were right. Thank you for your help.
Thanks for confirming that it worked. Happy this was helpful.
Bug description
Hi, I'm trying to run a simple PyTorch Lightning model training on MNIST data using the PyTorch Lightning CLI (with a YAML config) as a Slurm job.
How to reproduce the bug
I'm starting the Slurm job using:
sbatch file:
Error messages and logs
slurm-9842342.out (file where stdout is printed)
Current environment
