CIGLR-ai-lab / GreatLakes-TempSensors

Collaborative repository for optimizing the placement of temperature sensors in the Great Lakes using the DeepSensor machine learning framework. The aim is to improve the quantitative understanding of surface temperature variability for better environmental monitoring and decision-making.
MIT License

Getting started with batch job submissions on U-M HPC Great Lakes #23

Closed DaniJonesOcean closed 2 months ago

DaniJonesOcean commented 2 months ago

Issue Description: We are moving away from on-demand jobs, which have to be monitored constantly, to batch jobs submitted and managed on a queue.

The Great Lakes HPC system uses the Slurm Workload Manager for job management and scheduling. A solid understanding of Slurm and its batch scripts will make job management easier (partly by reducing the need to "babysit" the jobs!).
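For anyone new to batch scripts, here is a rough sketch of what one contains, written as a small Python helper that assembles the script text. The `#SBATCH` options shown are standard Slurm directives, but the helper name `build_batch_script`, the account name `example_account`, the command `python train.py`, and the resource values are all placeholders for illustration, not settings taken from this project.

```python
def build_batch_script(job_name, account, command, partition="standard",
                       time_limit="01:00:00", memory="8G", cpus=1):
    """Return the text of a minimal Slurm batch script.

    All argument values are supplied by the caller; nothing here is
    specific to this repository.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",      # name shown in the queue
        f"#SBATCH --account={account}",        # account charged for the job
        f"#SBATCH --partition={partition}",    # partition (queue) to run in
        f"#SBATCH --time={time_limit}",        # wall-clock limit, HH:MM:SS
        f"#SBATCH --mem={memory}",             # memory per node
        f"#SBATCH --cpus-per-task={cpus}",     # CPU cores for the task
        "#SBATCH --output=%x-%j.log",          # log file: job name and job ID
        "",
        command,                               # the work the job actually runs
    ]
    return "\n".join(lines) + "\n"


# Hypothetical values, for illustration only.
script = build_batch_script(
    job_name="deepsensor_test",
    account="example_account",
    command="python train.py",
    memory="16G",
)
print(script)
```

Saving the printed text to a file (for example `job.sh`, a name chosen here only as an example) and running `sbatch job.sh` places the job on the queue, and `squeue -u $USER` shows its status, so nobody has to watch it run.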

Key Objectives:

Deliverables:

Feel free to reach out if you need clarification on any steps!

DaniJonesOcean commented 2 months ago

If needed, run through the environment setup before doing this:

https://github.com/CIGLR-ai-lab/GreatLakes-TempSensors/blob/main/ENVIRONMENT_SETUP.md

eredding02 commented 2 months ago

@DaniJonesOcean Before this issue, I attended an HPC workshop, where I got a basic understanding of U-M HPC Great Lakes and downloaded resources such as the user guide and "cheat sheet". For this project, I had been running Jupyter notebooks through on-demand jobs, which taught me how to access a virtual environment, request memory, use different partitions, and use module commands. With both of these experiences, along with the sample job script provided, I have been able to submit Slurm jobs successfully. I wrote a script with nano and submitted it with sbatch; it imported DeepSensor, which showed that the package was not installed in the environment I had been using. After installing the required packages, I started running a more expensive script and, as expected, ran into some memory issues that I am now working on.
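A missing package like the one described above can be caught before a job reaches the queue. The following is a minimal sketch, not part of the original workflow: the helper name `missing_packages` is made up for illustration, and it assumes the packages are checked by their top-level import names (for example `deepsensor`).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found
    in the currently active Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The package list is an example; adjust it to what the job script imports.
missing = missing_packages(["deepsensor", "numpy"])
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

Running this in the same environment that the batch script activates, before calling sbatch, avoids waiting in the queue only for the job to fail at the import step.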

DaniJonesOcean commented 2 months ago

@eredding02 Looks good! Nice work. Feel free to close this issue when you're ready.