E3SM-Project / polaris

Testing and analysis for OMEGA, MPAS-Ocean, MALI and MPAS-Seaice
BSD 3-Clause "New" or "Revised" License

Change algorithm for constraining resources #79

Closed altheaden closed 1 year ago

altheaden commented 1 year ago

This is a port from changes made in Compass: https://github.com/MPAS-Dev/compass/pull/573

Previously, nothing prevented cpus_per_task from exceeding the number of cores on a node. This is a problem because using more Python threads than there are cores on a node leads to unsafe or poorly performing behavior.

This merge changes the way resources are constrained. First, cpus_per_task is constrained to be no more than the smaller of the number of cores on a node and the total number of cores available. Then, we check whether cpus_per_task is smaller than the minimum allowed for the given step. Next, we give every task the same cpus_per_task and constrain the number of tasks so that the total does not exceed the available resources. Finally, we check that the number of tasks is not below the allowed minimum for the step.
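The four steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from this merge: the function name `constrain_resources`, its argument names, and the choice to raise `ValueError` when a minimum cannot be met are all assumptions, and all inputs are assumed to be positive integers.

```python
def constrain_resources(cpus_per_task, min_cpus_per_task, ntasks, min_tasks,
                        available_cores, cores_per_node):
    """Hypothetical sketch: shrink a step's request to fit the resources.

    Returns the constrained (ntasks, cpus_per_task).
    """
    # 1. A task cannot use more cores than one node has, nor more than
    #    the total number of cores available.
    cpus_per_task = min(cpus_per_task, cores_per_node, available_cores)

    # 2. The step may need a minimum number of cpus per task.
    if cpus_per_task < min_cpus_per_task:
        raise ValueError(
            f'only {cpus_per_task} cpus per task available, but the step '
            f'requires at least {min_cpus_per_task}')

    # 3. Every task gets the same cpus_per_task, so this is the largest
    #    number of tasks that fits in the available cores.
    max_tasks = available_cores // cpus_per_task
    ntasks = min(ntasks, max_tasks)

    # 4. The step may also need a minimum number of tasks.
    if ntasks < min_tasks:
        raise ValueError(
            f'only {ntasks} tasks available, but the step requires at '
            f'least {min_tasks}')

    return ntasks, cpus_per_task


# A step asking for 64 tasks with 4 cpus each on 128 available cores
# (64 cores per node) is reduced to 32 tasks of 4 cpus each.
print(constrain_resources(cpus_per_task=4, min_cpus_per_task=1,
                          ntasks=64, min_tasks=8,
                          available_cores=128, cores_per_node=64))
```

Constraining cpus_per_task before ntasks means a threaded step keeps as many cores per task as it can, and only then gives up tasks to fit within the total.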

There are 4 typical resource requirements for steps:

- serial (1 task and 1 cpu per task)
- threading or multiprocessing on a node (1 task, multiple cpus per task)
- MPI jobs without threading (multiple tasks, 1 cpu per task)
- MPI jobs with threading (multiple tasks, a few cpus per task)

This algorithm should handle all of these without difficulty.
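To make the four categories concrete, here is a small sketch that maps a (tasks, cpus per task) pair onto them. The helper `classify_step` is hypothetical and exists only for illustration; it is not part of polaris or Compass.

```python
def classify_step(ntasks, cpus_per_task):
    """Hypothetical helper: name the resource pattern of a step."""
    if ntasks == 1 and cpus_per_task == 1:
        return 'serial'
    if ntasks == 1:
        return 'threading or multiprocessing on a node'
    if cpus_per_task == 1:
        return 'MPI without threading'
    return 'MPI with threading'


# One example request for each of the four patterns
for ntasks, cpus_per_task in [(1, 1), (1, 16), (128, 1), (32, 4)]:
    print(ntasks, cpus_per_task, classify_step(ntasks, cpus_per_task))
```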

Checklist

altheaden commented 1 year ago

I have tested this just now with the most recent version of main. I ran the pr suite against a baseline and everything passed as expected.

altheaden commented 1 year ago

@xylar the dev guide documentation changes didn't make it through the rebase (the filename was different and git got confused), so please double check that those changes are how you want them to be.

altheaden commented 1 year ago

OK, I just built the documentation locally and I can see some formatting errors. I'll fix those now.

altheaden commented 1 year ago

Formatting is fixed now. There were just some issues from converting rst to markdown that I didn't notice before.