easybuilders / easybuild-framework

EasyBuild is a software installation framework in Python that allows you to install software in a structured and robust way.
https://easybuild.io
GNU General Public License v2.0
150 stars 202 forks

Define a reasonable API for MPI tests submissions #747

Open fgeorgatos opened 10 years ago

fgeorgatos commented 10 years ago

Hi,

Eventually we need to implement continuous-integration-style MPI submission tests, such as http://centers.hpc.mil/MPI_TESTS/index.html or @besserox's CDASH-based http://my.cdash.org/index.php?project=HPC-OpenMPI (never mind the errors shown there, just capture the idea).

This implies that we would like to have a uniform API for submitting MPI jobs. Yes, this is a difficult target; what are the possible options here?

A first iteration of the shopping list:

Any other offers? (don't rush to vote, let's just make a candidates collection first)

Disclaimer: Yes, everybody has some form of customization in this respect; is there anything that can do 80% of the job in a sane manner?

boegel commented 10 years ago

@fgeorgatos: Can you clarify how this ties in with EasyBuild exactly?

fgeorgatos commented 10 years ago

As far as I remember, the need came up while discussing a test step for the performance tools. Related to that: if the objective is feasible, the same applies to MPI stacks.

I can see, though, that you may deem this high-hanging fruit and may wish to treat it as an external issue.

berndmohr commented 10 years ago

Well, there are two parts to it:

1) The platform (e.g. a large SMP node) allows interactive execution of MPI commands. Then you "just" need a portable wrapper around the different ways to start an MPI job (mpirun, mpiexec, runjob, etc.).

2) The cluster has to be used with a batch system. Here you need a component which knows how to generate job scripts for all the batch schedulers out there (PBS, LSF, LoadLeveler, GridEngine, etc.).

   Case A) The batch system allows interactive jobs. Create an interactive batch job and use the portable MPI wrapper from 1) to execute your tests.

   Case B) Only real batch jobs are allowed. Create the necessary batch job and wait for it.

Ideally, for large tests (or when doing more than one package test at once), you want to create many jobs and then manage/wait for them all.
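To make the two parts concrete, here is a minimal Python sketch of what such a layer might look like. It is only an illustration under stated assumptions, not an existing EasyBuild API: the names `LAUNCHERS`, `SCHEDULERS`, `mpi_command` and `job_script` are hypothetical, only two launchers and two schedulers are covered, and the PBS resource line assumes all ranks fit on a single node. Real sites will usually need extra flags and directives.

```python
import shlex

# Assumed launcher prefixes (part 1); the rank count is appended after them.
# Real installations often need additional, site-specific flags.
LAUNCHERS = {
    "mpirun": ["mpirun", "-np"],
    "mpiexec": ["mpiexec", "-n"],
}

# Assumed directive templates (part 2) for two of the schedulers named above.
# The PBS line assumes that all ranks run on one node.
SCHEDULERS = {
    "pbs": [
        "#PBS -N {name}",
        "#PBS -l nodes=1:ppn={ranks}",
        "#PBS -l walltime={hh:02d}:{mm:02d}:00",
    ],
    "lsf": [
        "#BSUB -J {name}",
        "#BSUB -n {ranks}",
        "#BSUB -W {hh:02d}:{mm:02d}",
    ],
}


def mpi_command(launcher, ranks, program, args=()):
    """Return the argv list that starts `program` on `ranks` MPI processes."""
    if launcher not in LAUNCHERS:
        raise ValueError("unknown launcher: %s" % launcher)
    return LAUNCHERS[launcher] + [str(ranks), program] + list(args)


def job_script(scheduler, launcher, ranks, program, name="mpi_test", minutes=10):
    """Return the text of a batch job script that runs one MPI test."""
    if scheduler not in SCHEDULERS:
        raise ValueError("unknown scheduler: %s" % scheduler)
    hh, mm = divmod(minutes, 60)
    header = [
        line.format(name=name, ranks=ranks, hh=hh, mm=mm)
        for line in SCHEDULERS[scheduler]
    ]
    # Quote each argument so the command survives the shell unchanged.
    argv = mpi_command(launcher, ranks, program)
    command = " ".join(shlex.quote(part) for part in argv)
    return "\n".join(["#!/bin/sh"] + header + [command]) + "\n"


if __name__ == "__main__":
    print(mpi_command("mpirun", 4, "./hello"))
    print(job_script("pbs", "mpiexec", 4, "./hello"))
```

On an interactive platform, or inside an interactive batch job (case A), the list from `mpi_command` could be passed directly to `subprocess.run`. For case B the generated script would still have to be handed to the scheduler's submit command and polled for completion, which this sketch deliberately leaves out.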