This pull request adds a cluster execution model to RunModel. The user provides a cluster- and scheduler-specific script that uses the appropriate commands to launch the computationally intensive portions of the simulation in parallel on the cluster. The example provided in the documentation for this pull request shows how multi-core jobs can be tiled over available resources using a bash for-loop.
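As a rough illustration of the tiling idea, here is a minimal sketch of a bash for-loop that launches multi-core jobs in groups sized to the available cores. It is not the script from the documentation. The variable names (`TOTAL_CORES`, `CORES_PER_JOB`, `N_JOBS`) and the `run_job` placeholder are assumptions made for this example; on a real cluster `run_job` would wrap the scheduler-specific launch command.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: tile multi-core jobs over the cores of one allocation.
# All names below are illustrative assumptions, not part of RunModel.

TOTAL_CORES=8      # cores available in the allocation (assumed value)
CORES_PER_JOB=2    # cores each simulation needs (assumed value)
N_JOBS=6           # number of simulations to run (assumed value)

# Number of jobs that fit side by side on the available cores.
TILE_SIZE=$(( TOTAL_CORES / CORES_PER_JOB ))

# Placeholder for the real, cluster-specific launch command.
run_job() {
    echo "job $1 started on $CORES_PER_JOB cores"
}

for (( start = 0; start < N_JOBS; start += TILE_SIZE )); do
    # Launch one tile of jobs in the background.
    for (( i = start; i < start + TILE_SIZE && i < N_JOBS; i++ )); do
        run_job "$i" &
    done
    # Block until every job in this tile has finished before starting the next.
    wait
done
```

With the assumed values, four jobs run concurrently in the first tile and the remaining two in the second. Waiting on a whole tile is the simplest policy; it can leave cores idle when job run times differ widely.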
Related Issue
Please link to the issue here: #200
Motivation and Context
This capability gives users more freedom in how resources are leveraged on different HPC systems, and it substantially broadens the scale of analyses that can be performed within the RunModel module.
How Has This Been Tested?
This new execution model was tested both locally on Ubuntu 20.04 and on the Savio Condo Cluster at UC Berkeley. Savio uses the Slurm scheduler, so the Python script that initiates the RunModel workflow is called from the Slurm batch script.
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
Go over all the following points, and put an x in all the boxes that apply.
If you're unsure about any of these, don't hesitate to ask. We're here to help!
[x] My code follows the code style of this project.
[x] My change requires a change to the documentation.
Tiled parallel jobs in RunModel