flux-framework / flux-test-collective

Holistic system testing and CI for multiple flux-framework projects
GNU Lesser General Public License v3.0

mpi: support for MPI testing on LC hardware #7

Closed wihobbs closed 9 months ago

wihobbs commented 10 months ago

This PR is a stab at supporting MPI testing on LC resources.

We want MPI testing to be easily extensible in three major ways:

  1. MPI implementation and compiler being tested. These vary by machine.
    • For this simple example, a new MPI implementation and compiler pair can be covered by adding a new command-line call in .gitlab/mpi-test.gitlab-ci.yml.
  2. Test code. This initial PR starts with just “hello, world.”
    • Currently, a new test could be added via a new function in mpi/mpi_tests.c. A call in the main function gathering the return code would also be required.
  3. Machines being tested. As El Cap and other machines become available, we want to add them (and replace the Tioga/EAS systems).
    • A new machine is probably the most involved addition. A script covering the MPI implementations and compilers for that machine would have to be added to .gitlab/mpi-test.gitlab-ci.yml, and three things would need to be added to the main .gitlab-ci.yml file: the machine specifications, a reference wrapper that builds flux and executes the MPI tests, and a test for GitLab to run. See .corona, .test-core-mpi-corona, and corona-mpi-test, respectively, for examples of this.
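To make the first point concrete, the per-implementation commands that an entry in .gitlab/mpi-test.gitlab-ci.yml runs would look roughly like the following sketch. The module names and node/task counts here are placeholders, not the actual values used in this PR.

```shell
# Hypothetical sketch of the commands a .gitlab/mpi-test.gitlab-ci.yml entry
# might run for one compiler + MPI pair; module names are placeholders.
module load gcc mvapich2               # select the compiler + MPI under test
mpicc -o mpi_tests mpi/mpi_tests.c     # build the test program
flux run -N2 -n4 ./mpi_tests           # run it across two nodes under Flux
```

Adding another implementation would then mean duplicating this stanza with a different `module load` line.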
wihobbs commented 10 months ago

The MPI hello test should probably be named hello or similar instead of the more generic mpi_tests.c. The reason is that we may want to add other simple MPI-based tests in the future (e.g., in flux-core we have hello, abort, and version tests).

My original reason for doing this was that I thought we could add the abort and version tests as additional functions to mpi_tests.c, which would mean only having to compile and link one piece of code. I'm guessing you want these to be separately compiled and run?
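For reference, a standalone hello test of the kind being discussed could be as small as the following MPI C program. This is a sketch (the actual mpi_tests.c in this PR may differ), and building/running it requires an MPI toolchain, e.g. `mpicc -o hello hello.c && flux run -N2 -n4 ./hello`.

```c
/* Minimal standalone MPI "hello, world" test (sketch). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
        fprintf(stderr, "MPI_Init failed\n");
        return 1;
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```

Keeping each test this small is what makes the drop-in approach grondo describes below attractive: a new test is a new file, not a new function wired into a shared main.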

Eventually, we might want to move the MPI testing driver (currently the script in mpi-test.gitlab-ci.yml) to a script so it is easier to update the set of MPI implementations and compilers that are tested on each cluster. (We may want to add a config file for example so the list is easily updated and all in one place.)

I like the idea of a config file that could compile and run tests and standardize this across multiple machines. The implementation of this is still a little nebulous in my mind. I'll see if I can hammer out an example...

I think moving the flux run ./src/cmd/flux call to the script would be a good start. We could probably trash the mpi-test.gitlab-ci.yml file if we did this (and just call the script instead).
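One possible shape for the config-file idea: a plain-text list of compiler/MPI pairs per cluster, with a driver loop that reads it. The file format and names below are made up for illustration; this dry-run sketch only prints the commands it would hand to flux run.

```shell
# Sketch of a per-cluster config file listing compiler/MPI pairs to test.
# The format and filename are hypothetical, not part of this PR.
cat > corona.conf <<'EOF'
gcc mvapich2
clang openmpi
EOF

# Dry-run driver: print the command that would be run for each pair.
while read -r compiler mpi; do
    echo "would run: module load $compiler $mpi && flux run -N2 ./mpi/hello"
done < corona.conf
```

Updating the set of implementations tested on a cluster then becomes a one-line edit to that cluster's config file.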

grondo commented 10 months ago

My original reason for doing this was that I thought we could add the abort and version tests as additional functions to mpi_tests.c, which would mean only having to compile and link one piece of code. I'm guessing you want these to be separately compiled and run?

This is not a bad idea, but I think it will result in more complexity in the long term (plus if we have a test or benchmark from elsewhere, it will be more work to integrate it into the test program than it would be to just drop in the new test).

I think moving the flux run ./src/cmd/flux call to the script would be a good start. We could probably trash the mpi-test.gitlab-ci.yml file if we did this (and just call the script instead).

That sounds good. I think eventually we'll be submitting a suite of tests to the CI flux instance. The script can eventually handle this submission, monitoring of tests, and collection of results from all jobs.

wihobbs commented 10 months ago

I think eventually we'll be submitting a suite of tests to the CI flux instance.

What you're describing sounds to me like we'll be creating one Flux instance in CI (probably 2 full nodes) and then submitting many different MPI jobs utilizing different compilers to it, rather than creating many small instances (say, 2 nodes, 1 core on each) for each individual MPI job. Am I tracking correctly?

grondo commented 10 months ago

What you're describing sounds to me like we'll be creating one Flux instance in CI (probably 2 full nodes) and then submitting many different MPI jobs utilizing different compilers to it, rather than creating many small instances (say, 2 nodes, 1 core on each) for each individual MPI job. Am I tracking correctly?

I think there's a small bit of design work that needs to be done here. I haven't thought about this in detail, so I apologize if my thoughts are not well-formed, but it seems like each MPI+compiler test is composed of the following steps (this is just my first thought, so I'm happy to discuss further):

  1. Load the compiler + MPI environment
  2. Build the test code in some kind of scratch directory
  3. Submit a defined suite of MPI tests as jobs
  4. Wait for all jobs
  5. Collect results (steps 4 and 5 could perhaps be done continuously)

These steps seem to naturally compose what we'd think of as a batch job. The batch script would handle these steps, including compilation of the MPI tests with the defined compiler and MPI, then would submit the suite of jobs and collect and report results (implementation TBD). An outer script would submit a batch job for each test MPI and compiler that we're targeting to the CI instance. That way, the more resources the CI Flux instance has, the faster we'll run through these tests.
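The steps above could be sketched as a batch script like the following. Module names, paths, and the job shape are placeholders; note that `flux job wait` requires jobs submitted with the waitable flag.

```shell
#!/bin/sh
# Sketch of a batch script implementing steps 1-5 for one compiler+MPI pair;
# submitted to the CI instance with, e.g., flux batch -N2 ./this-script.sh

# 1. Load compiler + MPI environment (placeholder module names)
module load gcc mvapich2

# 2. Build test code in a scratch directory
scratch=$(mktemp -d)
mpicc -o "$scratch/hello" mpi/mpi_tests.c

# 3. Submit a defined suite of MPI tests as jobs
flux submit --flags=waitable -N2 -n4 "$scratch/hello"

# 4. Wait for all jobs
flux job wait --all

# 5. Collect results
flux jobs -a
```

The outer script would then just loop over the targeted compiler/MPI pairs, submitting one such batch job per pair.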

Does that make any sense?

grondo commented 10 months ago

The batch script would handle these steps including compilation of the MPI tests with the defined compiler and MPI, then would submit the suite of jobs and collect and report results (implementation TBD)

I'll note one drawback to doing the compilation in the batch job is that cores in the allocation will go idle during this stage since no jobs can be run until the compilation completes. An optimization might be to submit the compile step as one single-node job, and the tests as a batch job with a dependency on the compile job. However, this feels like a premature optimization at this point.

Hm, we could also submit all of the compile and MPI tests as jobs to the CI instance with appropriate dependencies (no nested batch jobs). This would allow more flexibility in the size of MPI test jobs and would perhaps be more efficient scheduling. It also may be easier to collect the results since all the jobs are submitted at one level :thinking:
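The flat, dependency-based scheme could be sketched as below: compile jobs and test jobs are all submitted at one level to the CI instance, with each test depending on its compile job via flux's `--dependency` submission option. The job sizes here are placeholders.

```shell
# Sketch: submit compile and test jobs flat, no nested batch jobs.
# Each MPI test declares a dependency on its compile job.
compile_id=$(flux submit -N1 mpicc -o hello mpi/mpi_tests.c)
flux submit --dependency=afterok:"$compile_id" -N2 -n4 ./hello
```

Because everything lives at one level, a single `flux jobs -a` in the enclosing instance sees all of the results.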

wihobbs commented 10 months ago

An outer script would submit a batch job for each test mpi and compiler that we're targeting to the CI instance. That way the more resources the CI Flux instance has, the faster we'll run through these tests.

The outer script you described is a major piece this PR is missing. The bare bones of steps 1-5 you described composing a batch job are prototyped in de1ce16; however, steps 3-5 need improvement.

Hm, we could also submit all of the compile and MPI tests as jobs to the CI instance with appropriate dependencies (no nested batch jobs).

I think we're on the same page. If we're requesting a 2-node instance for testing interconnects, we could submit all of the compilation and run batch jobs to the enclosing instance (each requesting 2 nodes and n cores, where n >= 2) and let Flux sort out what runs when.

then would submit the suite of jobs and collect and report resuls (implementation TBD).

This is on my todo list, not only for the MPI work but for aggregating results from the testsuite runs as well. One thing I have noticed when running MPI jobs is that stderr sometimes contains output we may want to collect: messages that don't cause a nonzero return code but do say things we should look at.
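A first pass at that stderr-collection idea might be a small filter that flags suspicious lines even when the job exited 0. The pattern list below is a made-up placeholder, not the set of patterns the collective actually uses.

```python
import re

# Patterns worth a human look even when the job exits 0.
# This list is a placeholder for illustration only.
SUSPECT = re.compile(r"(warn|error|retry|timed? ?out)", re.IGNORECASE)

def flag_stderr(stderr_text):
    """Return the lines of a job's stderr that deserve attention."""
    return [line for line in stderr_text.splitlines() if SUSPECT.search(line)]

# Example: exit code was 0, but stderr still mentions a retry.
log = "libfabric: retrying endpoint setup\nhello from rank 0\n"
print(flag_stderr(log))
```

A collector script could run each failed-or-flagged job's stderr through a filter like this and fold the hits into the aggregated report.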

All excellent thoughts, thanks so much @grondo. I think we're making a lot of progress here, or at least I'm starting to grasp what this could look like. As a first step, I'll look into the "outer script" you described, and we can reason more from there.

wihobbs commented 10 months ago

@grondo Let me know if this is closer to the target. Note that, for debugging purposes, it currently outputs the stdout of all completed jobs. I imagine that in the future we could have a debug=True or -d flag that did this, and the normal behavior would be to output only the failed jobs, as we discussed.
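That non-debug behavior could be sketched with flux's job-listing filters: replay output only for jobs that failed. This assumes it runs inside the CI Flux instance; the loop body is illustrative.

```shell
# Sketch: report output only for failed jobs (the proposed non-debug default).
# -f failed filters by result; -no "{id}" prints bare job IDs, no header.
for id in $(flux jobs -f failed -no "{id}"); do
    echo "=== output of failed job $id ==="
    flux job attach "$id"
done
```

The debug mode would simply widen the filter to all completed jobs.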

Here's how I've been running for testing:

flux alloc -N2
cd ~/flux-test-collective
MPI_TESTS_DIRECTORY=$(pwd)/mpi FTC_DIRECTORY=$(pwd) flux run -N2 ../flux-core/src/cmd/flux start ./mpi/outer_script.sh 
wihobbs commented 10 months ago

I'll also add the logfile here on corona in case you wanted to see it.

wihobbs commented 9 months ago

@grondo This is ready for another review. Some notable changes:

wihobbs commented 9 months ago

Oh, and another GitLab logfile that might be helpful.

wihobbs commented 9 months ago

Thanks @grondo for the feedback! I believe I've addressed all your comments.

wihobbs commented 9 months ago

Merging. Thank you @grondo, I know this one took a lot of work and iterations to review.