Closed TingLei-NOAA closed 6 months ago
Probably the better approach is to run all the tests on a single allocation rather than have each test go on the queue.
With salloc
(or sbatch
) you could then make sure you have enough nodes and limit the number of tasks per node.
salloc -q <queue> -N 2 --ntasks-per-node=18 ctest
In case ctest
now calls e.g. srun -n 36 hostname
I can see that the output is 18 tasks each on their own node.
@wdeconinck Yes. Your suggested method does work! That could be used as a work-around, at least, for being now. The problem is that this ctest could be run among other ctests which don't need those extra nodes. So, a test-specific control is desired. If to add processing for such extra options in the ecbuild is not feasible or not regarded as necessary, I can replace the current executable with a wrapping script in which various manipulation could be done.
ecbuild_add_test
goes through this logic:
https://github.com/ecmwf/ecbuild/blob/develop/cmake/ecbuild_add_test.cmake#L419C1-L432C85
MPI_ARGS
could be user-defined, and was intended for e.g. openmpi's "--oversubscribe" flag.
It would be up to you to set the to -N 2 --ntasks-per-node
just for that test, and only in the case of slurm
and make sure it is set to original value after ecbuild_add_test
.
Note I did not test following, but conceptually:
if (MPI_ARGS)
set(RESTORE_MPI_ARGS ${MPI_ARGS})
endif()
if ( <slurm detected> )
set( MPI_ARGS "${MPI_ARGS} -N 2 --ntasks-per-node=18")
endif()
ecbuild_add_test( ... MPI 36 ... )
if( RESTORE_MPI_ARGS )
set(MPI_ARGS ${RESTORE_MPI_ARGS})
endif()
@wdeconinck , Did you mean I need to do changes (you suggested) to ecbuild_add_test to use MPI_ARGS ( and, yes, just to use MPA_ARGS as one argument didn't work in my tries )?
@TingLei-NOAA no I meant it like in the code snippet above, to set MPI_ARGS as a normal CMake variable. It gets picked up by ecbuild_add_test
.
Its original intention is as a global variable via cmake configuration "-DMPI_ARGS=..." but then you're in the same undesired situation where it gets applied to each MPI test. That's why I suggested to do the "RESTORE_MPI_ARGS ... " dance.
@wdeconinck Sure. Changing of MPI_ARG before and after the ecbuild_add_test seems a solution. I will try it and come back with an update here.
So many thanks!
The changes suggested by @wdeconinck does work. Thank you @wdeconinck for your help! I would close this issue!
Is your feature request related to a problem? Please describe.
A user ctest is to be run with a fixed mpi task numbers (say, -n 36). however in another machine, this job has to be run on two nodes to avoid memory related failure.
Describe the solution you'd like
The easiest way for me is to add srun comand parameters as " -N 2 --ntasks-per-node=18" in ecbuild_add_test. But digging in ecbuild_add_test.cmake and doing "try and error" , I failed to find a way to do so with ecbuild_add_test
Describe alternatives you've considered
So, I am wondering if ecbuild experts could clarify on this. Did I miss any existing ecbuild functions to do this? Thanks.
Additional context
No response
Organisation
LYNKER--EMC/NCEP/NOAA