LLNL / ATS

ATS - Automated Testing System - is an open-source, Python-based tool for automating the running of tests of an application across a broad range of high performance computers.
BSD 3-Clause "New" or "Revised" License

v7.0.1: Update LSF JSRUN options #12

Closed dawson6 closed 3 years ago

dawson6 commented 3 years ago

Remove the jsrun_nn / nn option on blueos. This was a command line option to override or set the number of nodes for each test. The option is not needed, and it led to confusion.

If nn (number of nodes) is not specified for an individual test run with jsrun, it is left unset when np (number of MPI ranks) is < 40. This allows jsrun to run concurrent jobs across the nodes in the allocation. If a test uses > 40 MPI ranks, nn is set as needed to obtain enough nodes for the job. In that case, because the job spans multiple nodes, the test is given exclusive access to those nodes via a resource set file.
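The rule above can be sketched roughly as follows. This is an illustrative sketch only, not the ATS implementation: the function names, the CORES_PER_NODE constant, and the ceiling-division node count are assumptions, and the behavior at exactly 40 ranks (which the description does not spell out) is treated here as "leave nn unset".

```python
import math
from typing import Optional

# Assumption: 40 usable cores per node, matching the threshold in the description.
CORES_PER_NODE = 40


def default_nn(np: int, nn: Optional[int] = None,
               cores_per_node: int = CORES_PER_NODE) -> Optional[int]:
    """Return the node count to use for a test, or None to leave it unset.

    Hypothetical helper: a per-test nn always wins; otherwise nn is only
    filled in when the ranks cannot fit on a single node.
    """
    if nn is not None:
        return nn  # the test specified nn itself
    if np > cores_per_node:
        # Enough nodes to hold all ranks (assumed: simple ceiling division).
        return math.ceil(np / cores_per_node)
    return None  # small job: leave unset so jsrun can pack jobs concurrently


def needs_exclusive_nodes(np: int,
                          cores_per_node: int = CORES_PER_NODE) -> bool:
    """Multi-node jobs get exclusive node access via a resource set file."""
    return np > cores_per_node
```

Under these assumptions, a 16-rank test leaves nn unset and can share nodes with other tests, while an 80-rank test is assigned 2 nodes and runs with exclusive access to them.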

Side Note: If a user wants to schedule jobs with > 40 MPI ranks that span multiple nodes and do not use LSF resource sets, they should run ATS with the 'lrun' option. lrun already handles this case, so there is no sense in making jsrun under ATS duplicate what lrun already knows how to do.

However, there are definitely scenarios where using jsrun and resource sets allows for more varied testing than can easily be done with lrun. This includes hybrid test runs where a user wants to, say, allocate 10 CPUs and 4 GPUs for a resource set. With lrun, the -g option specifies the number of GPUs per MPI rank, so it does not easily allow for that scenario in a 'packed' (concurrent job) run. jsrun does allow for it, because the -g option given to jsrun directly is the number of GPUs for the resource set, not for the MPI rank.

Note that in the lrun help, -g is documented as a per-MPI-rank resource:

-g Required GPUs per MPI task (--pack uses for placement)

In the jsrun help, -g is documented as a per-resource-set resource:

-g, --gpu_per_rs=<# | ALL_GPUS>
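The practical effect of the two -g meanings can be illustrated with a little arithmetic. This is a hedged sketch, not ATS code: the function names are hypothetical, and the jsrun argument list uses the long option names --nrs, --tasks_per_rs, --cpu_per_rs, and --gpu_per_rs as I understand them from the jsrun help; check them against the help output on your system.

```python
from typing import List


def gpus_with_per_task_g(ntasks: int, g: int) -> int:
    """lrun-style -g: GPUs are counted per MPI task."""
    return ntasks * g


def gpus_with_per_rs_g(nrs: int, g: int) -> int:
    """jsrun-style -g: GPUs are counted per resource set."""
    return nrs * g


def jsrun_args(nrs: int, tasks_per_rs: int,
               cpu_per_rs: int, gpu_per_rs: int) -> List[str]:
    """Build an illustrative jsrun argument list (hypothetical helper)."""
    return [
        "jsrun",
        f"--nrs={nrs}",
        f"--tasks_per_rs={tasks_per_rs}",
        f"--cpu_per_rs={cpu_per_rs}",
        f"--gpu_per_rs={gpu_per_rs}",
    ]


# The hybrid case from the description: one resource set with
# 10 CPUs (one per MPI rank) and 4 GPUs shared by those ranks.
args = jsrun_args(nrs=1, tasks_per_rs=10, cpu_per_rs=10, gpu_per_rs=4)
print(" ".join(args))
print(gpus_with_per_rs_g(nrs=1, g=4))       # GPUs for the whole resource set
print(gpus_with_per_task_g(ntasks=10, g=4))  # same -g value read per task
```

With the per-resource-set reading, -g 4 means 4 GPUs in total for the 10 ranks. With the per-task reading, the same value would ask for 4 GPUs for each of the 10 ranks, which is why the 10 CPU / 4 GPU layout is awkward to express with lrun in a packed run.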

While this difference in -g between lrun and jsrun may be confusing to the end user, it does allow for more varied runs. Either jsrun or lrun may be the better choice, depending on testing needs.