ICRAR / daliuge

The DALiuGE Execution Engine
GNU Lesser General Public License v2.1
26 stars 7 forks source link

Mpi fixes #288

Closed awicenec closed 1 week ago

awicenec commented 1 month ago

This PR covers three main aspects:

1) Extending the MPI component to act similarly to a BASH component when it comes to constructing the command line. It now uses the same logic to identify and replace ports, environment variables and arguments. The relevant code has been copied into the MPI component, but eventually it should be re-factored out into a utility library.

2) Extending the deploy configurations to include our Hyades cluster. This revealed a short-coming in that the deploy functions up to now assumed that the username on the remote system are the same as local, which is not always the case. Now it is possible to specify the remote username on the command line, if not specified the old logic applies.

3) An additional parameter EXEC_PREFIX has been added to the configuration to allow the configuration to specify the very first part of the command to be used in the sbatch script. That used to be hard-coded to srun, but in some cases (currently on Hyades!) srun does not work or some other command needs to be used.

myxie commented 1 month ago

Hi @awicenec - it looks like these changes have been implemented on an older DALiuGE repo before the git surgery that you performed to remove the venv information, as there are >4,500 commits in the history. I think you might need to move the changes to a 'clean' local repo?

coveralls commented 1 month ago

Coverage Status

coverage: 79.638% (+0.005%) from 79.633% when pulling 7f6e3b8618a458f5f38293a9ad24a81877611001 on mpi_fixes into d075337a851f0577761d4e548170b08863eb54ae on master.

myxie commented 1 month ago

@awicenec I think you might have missed my first comment:

it looks like these changes have been implemented on an older DALiuGE repo before the git surgery that you performed to remove the venv information, as there are >4,500 commits in the history

image

This is similar to previous PRs that were based on the pre-surgery repository (e.g. https://github.com/ICRAR/daliuge/pull/265)

myxie commented 1 week ago

@awicenec I will close this now #289 has been merged.