ATS - Automated Testing System - is an open-source, Python-based tool for automating the running of tests of an application across a broad range of high performance computers.
Remove jsrun_nn / nn option on blueos. This was a command
line option to override or set the number of nodes for each
test. This option is not needed, and it led to confusion.
If nn (number of nodes) is not specified for an individual
test run with jsrun, it will be left unset
if np (number of MPI ranks) is < 40. This allows
jsrun to run concurrent jobs across the nodes in the
allocation. If a test uses > 40 MPI ranks, then
nn will be set as necessary to get enough nodes for the
job. In this case, as the job will span multiple nodes,
the test will be given exclusive access to those
nodes via a resource set file.
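
The rule above can be sketched in a few lines of Python. This is
only an illustration, not the ATS source: the function name
resolve_nn and the constant CORES_PER_NODE are hypothetical, the
sketch assumes the 40-rank threshold is the per-node core count,
and it assumes "as necessary" means rounding up to whole nodes.
The note does not say what happens at exactly 40 ranks; the sketch
leaves nn unset in that case.

```python
import math

# Assumption: the 40-rank threshold in the note is the usable core count per node.
CORES_PER_NODE = 40


def resolve_nn(np, nn=None):
    """Return the node count to request, or None to leave nn unset.

    Hypothetical helper mirroring the rule described in the note.
    """
    if nn is not None:
        # A per-test nn value is used as given.
        return nn
    if np > CORES_PER_NODE:
        # The job spans nodes: request enough whole nodes for all ranks.
        return math.ceil(np / CORES_PER_NODE)
    # Small job: leave nn unset so jsrun can pack concurrent jobs.
    return None


print(resolve_nn(8))        # None
print(resolve_nn(100))      # 3
print(resolve_nn(8, nn=2))  # 2
```
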
Side Note: If a user wants to schedule jobs with > 40 MPI ranks
which span multiple nodes and do not use LSF resource sets,
they should run ATS with the 'lrun' option. lrun has the
smarts to do this. There is no sense in my making jsrun
under ATS do what lrun already knows how to do.
However, there are definitely scenarios where using jsrun
and resource sets allows for more varied testing than can
be easily done with lrun. This includes hybrid test runs
where a user wants to allocate, say, 10 CPUs and 4 GPUs for
a resource set. With lrun, the -g option specifies
the number of GPUs per MPI rank, so it does not easily allow
for that scenario in a 'packed' (concurrent job) run.
jsrun does allow for this, as the -g option given
to jsrun directly is the number of GPUs for the resource
set, not for the MPI rank.
Note that in the lrun help, -g is documented as a per MPI
rank resource:
    -g Required GPUs per MPI task (--pack uses for placement)
And in the jsrun help, -g is documented as per resource set:
    -g, --gpu_per_rs=<# | ALL_GPUS>
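
To make the two meanings of -g concrete, here is a small
arithmetic sketch. It is not part of ATS, lrun, or jsrun: both
function names are hypothetical, and the sketch assumes one MPI
rank per CPU in the hybrid example (10 ranks in a resource set
with 10 CPUs and 4 GPUs).

```python
def gpus_requested_lrun(mpi_ranks, gpus_per_rank):
    """GPUs implied by lrun's -g, which counts GPUs per MPI task."""
    return mpi_ranks * gpus_per_rank


def gpus_requested_jsrun(resource_sets, gpus_per_rs):
    """GPUs implied by jsrun's -g, which counts GPUs per resource set."""
    return resource_sets * gpus_per_rs


# Hybrid case from the note: one resource set with 10 CPUs and 4 GPUs.
# Assumption: one MPI rank per CPU, so 10 ranks share the 4 GPUs.
ranks = 10

# jsrun: -g 4 on a single resource set asks for 4 GPUs in total.
print(gpus_requested_jsrun(resource_sets=1, gpus_per_rs=4))  # 4

# lrun: the smallest nonzero per-rank value, -g 1, already asks for 10 GPUs.
print(gpus_requested_lrun(mpi_ranks=ranks, gpus_per_rank=1))  # 10
```

Under this reading of the help text, a whole-number GPUs-per-rank
value cannot express 4 GPUs shared among 10 ranks, which is why
the resource-set form is the easier fit for this kind of test.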
While this difference in -g between lrun and jsrun may be
confusing to the end user, it does allow for
more varied runs. Either jsrun or lrun may be the better
choice, depending on testing needs.