VI4IO / io500-app

Development version of the new IO-500 Application
MIT License

can't launch a parallel external find #21

Closed mchaarawi closed 4 years ago

mchaarawi commented 4 years ago

With this commit: https://github.com/VI4IO/io500-app/commit/c155836d31c12b3dee2640e5001adfb32085b6a7 it is no longer possible to execute an external parallel find application. Using a serial find in the IO-500 workflow is unrealistic for any valid score.

Contrary to the comment made in the patch, this was working fine before the change landed, so we would appreciate it if the change were reverted.

JulianKunkel commented 4 years ago

The reason for the change is that MPI applications typically cannot start another MPI-parallel application themselves, and MPI spawn is not well supported. If you want to use an MPI-parallel find, either use the integrated one or provide one that implements the same interface. If your environment supports parallel execution the way it worked before, for example with srun, let us know. It is important to find out for which MPI implementations this works and for which it doesn't.

(Sent by email in reply to Mohamad Chaarawi's comment of Mon, 15 June 2020, 22:11, quoted in full above.)

mchaarawi commented 4 years ago

We are using the mpifileutils dfind, which is a well-known implementation used by many others. It has the same interface as the pfind tool, but like IOR/mdtest it requires additional arguments for the backend API.

I did not see any issues when testing with MPICH, MVAPICH, and the latest OpenMPI, so I'm not sure which version was not working for you.

adilger commented 4 years ago

Mohamad, is it possible to use external_extra_args to make this work instead of external_mpi_args? Are the extra arguments needed for MPI or for dfind? I don't have a strong opinion either way, but it sounds like the additional arguments you need should be for dfind.

mchaarawi commented 4 years ago

Andreas, external_mpi_args holds the MPI launcher arguments, not the dfind arguments. One would pass, for example: external_mpi_args = mpirun -np 10 --hostfile file --mca .. so your suggestion wouldn't work.
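
To make the distinction concrete, here is a minimal sketch of how a launcher prefix and tool arguments might be combined into one command line. It is not io500's actual code: the function name `build_find_command`, the ordering of the pieces, the dfind path, and the `EXTRA_TOOL_ARGS` placeholder are all assumptions for illustration. Only the `mpirun -np 10 --hostfile file` prefix comes from the comment above.

```python
import shlex


def build_find_command(mpi_args, script, extra_args):
    """Return an argv list: launcher prefix, then the find tool, then its arguments.

    Hypothetical helper; the real io500 option handling may differ.
    """
    return shlex.split(mpi_args) + [script] + shlex.split(extra_args)


# The launcher prefix plays the role of external_mpi_args in the discussion.
# The script path and extra arguments are placeholders, not real dfind options.
cmd = build_find_command(
    mpi_args="mpirun -np 10 --hostfile file",
    script="/path/to/dfind",
    extra_args="EXTRA_TOOL_ARGS",
)
print(shlex.join(cmd))
```

In this sketch the launcher arguments come before the executable and the tool arguments come after it, which is why arguments meant for `mpirun` cannot simply be moved into the setting that holds the find tool's own arguments.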

JulianKunkel commented 4 years ago

Ken had this issue, and I debugged it together with him. I could reproduce it with OpenMPI 4.0.3 on the new Ubuntu. It is a tricky problem. I added an extra note.