FLAMEGPU / FLAMEGPU2

FLAME GPU 2 is a GPU accelerated agent based modelling framework for CUDA C++ and Python
https://flamegpu.com
MIT License

MPI Testing with multiple ranks but only 1 GPU #1146

Open ptheywood opened 8 months ago

ptheywood commented 8 months ago

As of #1090, MPI-backed distributed ensembles will be implemented, with an MPI-only test suite, which will strictly only test the use of MPI in a multi-GPU scenario.

However, Google Test is not MPI-aware, so there are a number of limitations on this test suite:

  1. Each MPI rank prints its test output by default, making results hard to interpret.
  2. The final result of the test suite reported by MPI (i.e. the exit code) will be that of rank 0. For example, if rank 0 passes but rank 1 fails, mythical CI would report it as a success.
  3. Telemetry is issued from each rank.
  4. Deadlocks might cause issues, but hopefully we won't hit those...
  5. Death tests are not possible with MPI (we should probably split our death tests out to CMake tests anyway, given they are not multithreading safe and CUDA implicitly spawns a few threads).
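
To make points 1 and 2 concrete, here is a minimal sketch of the logic a custom MPI-aware `main` would need. It assumes each rank's result is the return value of `RUN_ALL_TESTS()` (0 on success, non-zero on failure). The helpers `combinedExitCode` and `rankShouldPrint` are hypothetical names for illustration, not part of FLAME GPU 2 or Google Test, and the reduction is emulated with a plain loop so the sketch runs without an MPI installation. With real MPI it would be a single `MPI_Allreduce` with `MPI_MAX`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: given the RUN_ALL_TESTS() result from every rank
// (0 = all tests passed, non-zero = at least one failure), compute the exit
// code that every rank should return so a failure on any rank is visible.
//
// In a real MPI main this would be one collective call, e.g.
//   MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
// The loop below emulates that max-reduction without requiring MPI.
int combinedExitCode(const std::vector<int>& perRankResults) {
    assert(!perRankResults.empty() && "at least one rank is required");
    int worst = 0;
    for (int result : perRankResults) {
        // Normalise each rank's result to 0 (pass) or 1 (fail) before reducing.
        worst = std::max(worst, result != 0 ? 1 : 0);
    }
    return worst;
}

// Hypothetical helper: only rank 0 keeps its test output, so results are
// printed once instead of once per rank.
bool rankShouldPrint(int rank) {
    return rank == 0;
}
```

With this, results of `{0, 1}` (rank 0 passes, rank 1 fails) reduce to a failing exit code rather than rank 0's success. The sketch does nothing for telemetry, deadlocks or death tests (points 3 to 5).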

There are a number of stale, out-of-date Google Test + MPI repos on GitHub we could investigate to resolve these issues, or we can roll some custom MPI in main.cu, which would deal with some of them (but not all), and not very well without a lot of effort.

See :


Some quick and dirty improvements could be (with big downsides):