NOAA-GSL / ExascaleWorkflowSandbox

test_run_mpi_pi doesn't verify the tasks run concurrently #46

Closed by christopherwharrop-noaa 8 months ago

christopherwharrop-noaa commented 9 months ago

The test_run_mpi_pi test in test_parsl_flux_mpi_hello.py ensures that two runs of the mpi_pi.exe program do not share nodes. However, the test does not actually verify that the two runs execute concurrently.

The output of the tasks is written to parsl_flux_mpi_pi1_run.out and parsl_flux_mpi_pi2_run.out. This output includes time stamps for the start and end of each run. To prove that the two runs overlap in time, two additional assertions are needed: the start time of pi1 must not be later than the end time of pi2, and the start time of pi2 must not be later than the end time of pi1.
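As a rough sketch of those two assertions, assuming the start and end times have already been parsed into Python datetime objects (the variable names, the runs_overlap helper, and the sample values below are hypothetical, not taken from the existing test):

```python
from datetime import datetime


def runs_overlap(start1, end1, start2, end2):
    """Return True when the two closed time intervals share at least one instant."""
    return start1 <= end2 and start2 <= end1


# Hypothetical values; the real test would parse these from
# parsl_flux_mpi_pi1_run.out and parsl_flux_mpi_pi2_run.out.
pi1_start = datetime(2024, 1, 15, 12, 0, 0)
pi1_end = datetime(2024, 1, 15, 12, 0, 10)
pi2_start = datetime(2024, 1, 15, 12, 0, 2)
pi2_end = datetime(2024, 1, 15, 12, 0, 12)

# The two assertions described above, written out explicitly.
assert pi1_start <= pi2_end
assert pi2_start <= pi1_end
```

Both conditions together are the standard interval-overlap test: if either one fails, one run finished before the other started.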

An additional problem is that the time stamps only include HH:MM:SS. If the execution of the code spans midnight, then the validation of concurrent execution times will not work. The Fortran code needs to be updated to write out the full time stamp such that it includes the date as well as the time.
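To illustrate the midnight problem, here is a small sketch comparing time-only stamps with full date-and-time stamps. The "%Y-%m-%d %H:%M:%S" layout and the sample values are assumptions for illustration; the actual format written by the Fortran code may differ.

```python
from datetime import datetime

TIME_ONLY = "%H:%M:%S"            # what the output contains today
FULL_STAMP = "%Y-%m-%d %H:%M:%S"  # assumed format once the date is added


def overlap(stamps, fmt):
    """Parse (start1, end1, start2, end2) strings and apply the overlap check."""
    s1, e1, s2, e2 = (datetime.strptime(text, fmt) for text in stamps)
    return s1 <= e2 and s2 <= e1


# Two hypothetical runs that overlap while crossing midnight.
time_only = ("23:59:50", "00:00:05", "23:59:55", "00:00:10")
full = (
    "2024-01-15 23:59:50", "2024-01-16 00:00:05",
    "2024-01-15 23:59:55", "2024-01-16 00:00:10",
)

time_only_result = overlap(time_only, TIME_ONLY)  # False: end times sort before start times
full_result = overlap(full, FULL_STAMP)           # True: the date disambiguates the order
```

With only HH:MM:SS, the end times after midnight compare as earlier than the start times, so the check wrongly reports no overlap.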

NaureenBharwaniNOAA commented 8 months ago

A couple questions here:

  1. I'm not seeing any output files for parsl_flux_mpi_pi1_run.out or parsl_flux_mpi_pi2_run.out. Are these files automatically removed? I see in test_parsl_flux_mpi_hello.py in test_run_mpi_pi you are testing them against each other, but don't see them in the repository. I'm on the main branch. Is there a way to get to these files and see the contents?
  2. You mentioned that the timestamps need to be updated as well to include the full timestamp with dates in addition to HH:MM:SS. Could you point me to the Fortran code that needs to be updated? I think an appropriate place would be in mpi_pi.f90.

My idea would be to start with the timestamp: make sure each of parsl_flux_mpi_pi1_run.out and parsl_flux_mpi_pi2_run.out contains the correct timestamp with the date included. Then move on to confirming that the two runs execute concurrently.

christopherwharrop-noaa commented 8 months ago

The output files parsl_flux_mpi_pi1_run.out and parsl_flux_mpi_pi2_run.out should be produced when you run the tests.

The code that needs updating is mpi_pi.f90.

I agree. Fix the time stamp first because you have to have that before you can do anything else.