If needed, run through the environment setup before doing this:
https://github.com/CIGLR-ai-lab/GreatLakes-TempSensors/blob/main/ENVIRONMENT_SETUP.md
@DaniJonesOcean Prior to this issue I attended an HPC workshop, where I picked up the basics of the U-M Great Lakes HPC system and downloaded resources such as the user guide and "cheat sheet". For this project I started with Jupyter notebooks via on-demand jobs, which taught me how to activate a virtual environment, request memory, use different partitions, and work with module commands. Building on both of these experiences, along with the sample job script provided, I have been able to successfully submit Slurm jobs. Using nano and sbatch, I submitted a script that imports DeepSensor and found that the package was not installed in the environment I had been using. After installing the required packages, I have started running an expensive script, where I have (expectedly) run into some memory issues to work through.
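For anyone following along, a cheap way to catch this kind of missing-package problem before launching anything expensive is a short sanity-check job. A minimal sketch, assuming a virtual environment at a placeholder path (the account name and environment path below are hypothetical):

```bash
#!/bin/bash
#SBATCH --job-name=import_check
#SBATCH --account=your_account       # placeholder: your Great Lakes account
#SBATCH --time=00:05:00
#SBATCH --mem=4G
#SBATCH --output=%x-%j.log

# Activate the environment under test (path is a placeholder)
source ~/venvs/deepsensor-env/bin/activate

# Fails fast if DeepSensor is missing from this environment
python -c "import deepsensor; print('deepsensor import OK')"
```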
@eredding02 Looks good! Nice work. Feel free to close this issue when you're ready.
Issue Description: We are moving away from on-demand jobs, which have to be monitored constantly, to batch jobs that are submitted to a queue and managed by the scheduler.
The Great Lakes HPC system uses the Slurm Workload Manager for job management and scheduling. A solid understanding of Slurm and its batch scripts will make job management easier (partly by reducing the need to "babysit" jobs!).
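For reference, the basic submit-and-monitor cycle uses a handful of Slurm commands (the script name below is hypothetical):

```bash
sbatch run_deepsensor.sbat   # submit the batch script; Slurm prints the job ID
squeue -u $USER              # list your pending and running jobs
sacct -j <jobid>             # check status and resources of a completed job
scancel <jobid>              # cancel a job if something goes wrong
```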
Key Objectives:
- Familiarize yourself with the job submission process, using the `sbatch` command to submit a basic job script; a starter script sketch is given after this list.
- Modify the script with necessary adjustments for your GPU environment, ensuring the virtual environment setup is incorporated.
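A minimal sketch of such a starter script, assuming a GPU partition and a pre-built virtual environment (the account, partition, module, and path names are placeholders to adapt for Great Lakes):

```bash
#!/bin/bash
#SBATCH --job-name=deepsensor_train
#SBATCH --account=your_account       # placeholder: your Great Lakes account
#SBATCH --partition=gpu              # assumption: GPU partition name
#SBATCH --gres=gpu:1                 # request one GPU
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=01:00:00
#SBATCH --output=%x-%j.log

# Load a Python module and activate the project environment
# (module and path names are placeholders)
module load python
source ~/venvs/deepsensor-env/bin/activate

python train_model.py
```

Keeping the resource requests as `#SBATCH` directives inside the script, rather than passing them as command-line flags, makes each run self-documenting and reproducible.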
Deliverables:
Feel free to reach out if you need clarification on any steps!