greschd opened this issue 4 years ago

The code is currently tested with (and designed for) a direct, non-MPI calculation only. We should test whether it also works with schedulers and MPI.
A scheduler testing framework will hopefully soon be in place as well, which should make testing this a bit easier: https://github.com/aiidateam/aiida-core/issues/3805.
As far as I can see, MPI currently cannot work, because the mock code executes `bash submit_script` (where the executable name in the submit script has been replaced), i.e. it ends up running mpirun inside mpirun:
https://github.com/aiidateam/aiida-testing/blob/8fd5b7e4fc59256770d2c66690edd8820b36a3b1/aiida_testing/mock_code/_cli.py#L49-L50

I have opened https://github.com/aiidateam/aiida-testing/pull/47, which simply replaces these two lines with a direct call to the remote executable. This makes it possible to run executables through mpirun (`withmpi=True`), but only on a single MPI process.
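To illustrate the direction of that change (a sketch, not the actual diff in the PR; the executable path is a made-up placeholder):

```python
import subprocess
import sys

# Hypothetical placeholder for the real executable that mock-code resolves
# for this calculation (in the actual code this comes from its configuration).
executable_path = "/path/to/real/executable"

# Before (simplified, as described above): re-run the submit script, which
# itself contains the `mpirun ... <code>` line, so mpirun gets nested:
#
#     subprocess.call(["bash", "submit_script"])

# After: call the real executable directly with the arguments the mock code
# received, leaving the parallelisation to the mpirun that already wraps it.
sys.exit(subprocess.call([executable_path] + sys.argv[1:]))
```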
In order for the same approach to work with multiple processes, `mock-code` would need to become a parallel program itself, with MPI rank 0 doing all the hashing and storing, and the remaining MPI ranks taking part only in running the executable.
I played around a bit with `mpi4py` to do this (see this branch), but my impression is that this makes things significantly more complicated and difficult to test.
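For reference, this is roughly the kind of coordination I mean (a sketch assuming `mpi4py`, not the code from the branch above; the cache helpers are hypothetical stubs for the existing hashing/caching logic):

```python
import subprocess
import sys

from mpi4py import MPI


def hash_inputs(path):
    """Hypothetical stand-in for the existing input-hashing logic."""
    ...


def load_from_cache(input_hash, path):
    """Hypothetical stand-in: copy cached outputs, return True on a hit."""
    ...


def store_in_cache(input_hash, path):
    """Hypothetical stand-in for storing the new outputs."""
    ...


def run_mock(executable_path):
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        input_hash = hash_inputs(".")
        cache_hit = bool(load_from_cache(input_hash, "."))
    else:
        cache_hit = None

    # Rank 0 decides whether the executable needs to run at all; the other
    # ranks only learn about that decision.
    cache_hit = comm.bcast(cache_hit, root=0)

    if not cache_hit:
        # Every rank launches the real executable. Getting these child
        # processes to behave as a single MPI job (rather than N serial
        # runs) is the part that is genuinely tricky here.
        subprocess.call([executable_path] + sys.argv[1:])
        comm.Barrier()
        if rank == 0:
            store_in_cache(hash_inputs("."), ".")
```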
It's not clear to me whether it is necessary to introduce this complexity; perhaps there are easier ways, such as getting AiiDA to always run the mock code without MPI and forwarding the request to use MPI via an environment variable (see the sketch below).
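Roughly, that variant could look like this (the variable name `AIIDA_MOCK_MPI_COMMAND` and the executable path are made up for the sake of the sketch):

```python
import os
import shlex
import subprocess
import sys

# Hypothetical placeholder for the real executable.
executable_path = "/path/to/real/executable"

# AiiDA would run the mock code itself in serial and pass the MPI command
# (e.g. "mpirun -np 4") through an environment variable.
mpi_command = shlex.split(os.environ.get("AIIDA_MOCK_MPI_COMMAND", ""))

# ... hashing and cache lookup happen here, in a single serial process ...

# Only the real executable is launched in parallel, on a cache miss:
sys.exit(subprocess.call(mpi_command + [executable_path] + sys.argv[1:]))
```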
@greschd Let me know what you think.
Hmm, I think the best solution here depends on what comes out of our discussion about hooking into AiiDA before the code is actually called (in #43). If the mock-code doesn't actually need to be a standalone executable, these problems would go away.