test_mpi_hello does not validate output

christopherwharrop-noaa commented 9 months ago

The test_mpi_hello test in tests/test_parsl_flux_mpi_hello.py only checks that the Parsl App completed with a status of 0 (meaning success). It does not actually check whether the output of the task, written to parsl_flux_mpi_hello_run.out is correct. This test needs to be updated to validate the output. This is not as straightforward as simply asserting the file contents are equal to a known string because the output will be slightly different depending on which platform is used to run the tests. Furthermore, on some platforms, the correct results will vary from run to run because of differences in the hosts acquired by the Parsl pilot job.

When running the test in the container (use the chiltepin.yaml configuration) the output will look something like this:

Hello world from host slurmnode1, rank 0 out of 6
Hello world from host slurmnode1, rank 1 out of 6
Hello world from host slurmnode2, rank 2 out of 6
Hello world from host slurmnode2, rank 3 out of 6
Hello world from host slurmnode3, rank 4 out of 6
Hello world from host slurmnode3, rank 5 out of 6

However, when running the test on an on-prem HPC, such as Hercules, the output will look something like this:

Hello world from host hercules-05-54, rank 0 out of 6
Hello world from host hercules-05-54, rank 1 out of 6
Hello world from host hercules-05-55, rank 2 out of 6
Hello world from host hercules-05-55, rank 3 out of 6
Hello world from host hercules-05-56, rank 4 out of 6
Hello world from host hercules-05-56, rank 5 out of 6

Note that the only differences are the hostnames in question. This is because the code runs on different hosts depending on the platform and depending on which hosts were acquired by the Parsl pilot job. The tests needs to be updated to validate the output while taking into consideration that the names of the hosts can vary.

NaureenBharwaniNOAA commented 8 months ago

The merge for the PR addressing Issue #44 didn't close as we liked it to. Does it take a bit for this to take affect or should this be instantaneous? I added the Closes Issue #44 later after the original creation of the PR.

christopherwharrop-noaa commented 8 months ago

I'm not sure. It should be pretty instantaneous. The could it be because you did "Closes Issue #44" instead of "Closes #44"? I don't know. I'll look into it. Meanwhile, go ahead and close this manually.

christopherwharrop-noaa commented 8 months ago

Yeah, I'm pretty sure you need "closes #44" and not "closes issue #44". You can also use "fixes" instead of "closes" if you want. If there are multiple issues, then you need "closes" before each one (i.e. closes #1, closes #2, closes #3).

NaureenBharwaniNOAA commented 8 months ago

Yeah, I'm pretty sure you need "closes #44" and not "closes issue #44". You can also use "fixes" instead of "closes" if you want. If there are multiple issues, then you need "closes" before each one (i.e. closes #1, closes #2, closes #3).

I'll try this on my next issue!

NOAA-GSL / ExascaleWorkflowSandbox

test_mpi_hello does not validate output #44