✨ Allow the user to specify the GPU of choice. This allows two embarrassingly parallel models to be trained at the same time on LS6 (one per A100 GPU).
✨ Allow the user to specify `latest` instead of a specific model name. This allows multiple jobs to be queued via Slurm for long runs and avoids having to manually update the restart file name and relaunch the training every 48 hours.
🔨 Make `device` non-global, which is needed to select the device manually via a command-line argument.
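
For reviewers, here is a minimal sketch of how the GPU selection and the non-global device could fit together. It is not the code in this PR: the `--gpu` flag name, the helper names, and the `cuda:<index>` device string (PyTorch-style) are all assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Training launcher (illustrative)")
    # Hypothetical flag name; the actual argument in this PR may be spelled differently.
    parser.add_argument("--gpu", type=int, default=0,
                        help="index of the GPU to train on")
    return parser.parse_args(argv)


def device_string(gpu_index):
    """Build a device string such as 'cuda:1' for the chosen GPU."""
    if gpu_index < 0:
        raise ValueError("GPU index must be non-negative")
    return f"cuda:{gpu_index}"


def train(device):
    # The device arrives as a parameter instead of being read from a
    # module-level global, so each process can target a different GPU.
    # In a PyTorch code base this string could be handed to torch.device(...).
    return f"training on {device}"


def main(argv=None):
    args = parse_args(argv)
    return train(device_string(args.gpu))
```

With something like this, one job could be started with `--gpu 0` and a second with `--gpu 1` on the same node, and neither would depend on shared module state to find its device.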
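
The `latest` option could be resolved along the following lines. This is a sketch under stated assumptions, not the PR's implementation: the function name, the `*.pt` file pattern, and the choice of "most recently modified file" as the definition of latest are all hypothetical.

```python
from pathlib import Path


def resolve_restart_file(name, checkpoint_dir, pattern="*.pt"):
    """Return the checkpoint path to restart from, or None if there is none.

    A concrete file name is used as given. The special value "latest"
    picks the most recently modified file matching `pattern`.
    """
    checkpoint_dir = Path(checkpoint_dir)
    if name != "latest":
        return checkpoint_dir / name

    # Sort candidates by modification time, oldest first.
    candidates = sorted(checkpoint_dir.glob(pattern),
                        key=lambda p: p.stat().st_mtime)
    if not candidates:
        return None  # nothing saved yet, so the caller can start from scratch
    return candidates[-1]
```

Because every queued job passes the same `latest` argument, the submissions can be identical and chained (for example with Slurm job dependencies), with each job picking up whatever checkpoint the previous one wrote.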