flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

testsuite: add MPI tests #2192

Closed: garlick closed this issue 5 years ago

garlick commented 5 years ago

Add tests that validate that the exec system can launch MPI jobs.

garlick commented 5 years ago

Note that although MPI sharness tests were purged with wreck, we do still have

The following tests could be restored from the v0.11 series and ported to the new execution system:

dongahn commented 5 years ago

@garlick: Given various breakages we have had with OpenMPI, does it make sense to have CI coverage for that MPI?

dongahn commented 5 years ago

For example: https://github.com/flux-framework/flux-core/issues/2170

garlick commented 5 years ago

Good idea. Maybe we should try to have:

It would actually be really nice if we could find a way to get CI coverage for multiple versions of each MPI type...

dongahn commented 5 years ago

> It would actually be really nice if we could find a way to get CI coverage for multiple versions of each MPI type...

Yes, great idea. Can CI matrix support be leveraged?

> t3012-mpi-spectrum.t ditto spectrum mpi

Not sure if this is possible, i.e. whether IBM's tarball can be run in CI, but we can certainly engage IBM about this.

grondo commented 5 years ago

This wouldn't be advisable under Travis, but buildbot is good at these combinatorial testing patterns. Maybe, though, we could install mpich in half of our existing Travis builders and openmpi in the others.
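To make the tradeoff between the two options concrete, here is a minimal sketch comparing a full combinatorial matrix with a round-robin split across existing builders. The builder names and MPI version labels are hypothetical placeholders, not flux-core's actual CI configuration; the point is only how the build count scales in each approach.

```python
from itertools import cycle, product

# Hypothetical builder names and MPI version labels (assumptions for
# illustration only; not taken from flux-core's real CI setup).
BUILDERS = ["builder1", "builder2", "builder3", "builder4"]
MPI_VERSIONS = {
    "mpich": ["mpich-a", "mpich-b"],
    "openmpi": ["openmpi-a", "openmpi-b"],
}


def split_assignment(builders, mpi_types):
    """Assign one MPI implementation per existing builder, round-robin.

    This is the Travis-friendly option: no extra builds, and each MPI
    type ends up on roughly half of the builders.
    """
    return dict(zip(builders, cycle(mpi_types)))


def full_matrix(builders, mpi_versions):
    """Pair every builder with every version of every MPI type.

    This is the combinatorial pattern better suited to buildbot; the
    number of builds grows multiplicatively.
    """
    all_versions = [v for versions in mpi_versions.values() for v in versions]
    return list(product(builders, all_versions))


if __name__ == "__main__":
    print(split_assignment(BUILDERS, list(MPI_VERSIONS)))
    print(len(full_matrix(BUILDERS, MPI_VERSIONS)), "builds in the full matrix")
```

With four builders and two versions each of two MPI types, the split keeps the build count at four while the full matrix needs 4 × 4 = 16 builds, which is why the full matrix fits a dedicated system like buildbot better than Travis.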

It would be OK if Spectrum couldn't be used in Travis; at least on systems with Spectrum installed, make check would catch any errors.

Another idea (not necessarily better) would be to make a full MPI test suite as a separate project under flux-framework, with a different test strategy.

dongahn commented 5 years ago

> Another idea (not necessarily better) would be to make a full MPI test suite as a separate project under flux-framework, with a different test strategy.

I sort of like this idea, since such a project could run not only on Travis but also directly on other platforms, including LC systems...

SteVwonder commented 5 years ago

> Another idea (not necessarily better) would be to make a full MPI test suite as a separate project under flux-framework, with a different test strategy.

If we mirror it to GitLab (or vice versa), then we can leverage the ECP CI project and run the regression tests on actual LC systems via the GitLab runner. The one downside is that if it lives on GitHub and is mirrored to GitLab, the tests will not run on PRs (so no CI), but we would still benefit from periodic regression tests (nightly/weekly).

garlick commented 5 years ago

Revisiting this since it's on our project board that's due July 31.

The PMI tests (kvs and info) were incorporated into t2601-job-shell-standalone.t and t2602-job-shell.t, so that's covered.

It's a relatively simple matter to bring back t3000-mpi-basic.t once #2246 (output handling) is merged. In fact I'll tack that on to #2246.

We don't have MPI "Personality" support yet; that will come in the next project board (complete execution system). So I propose that bringing back t3000 close this issue. I'll open another issue to capture the MPI testing we'll want for the next project board, and reference this one.