NOAA-GSL / ExascaleWorkflowSandbox

test_flux_resource_list does not validate output #43

Closed christopherwharrop-noaa closed 8 months ago

christopherwharrop-noaa commented 9 months ago

The test_flux_resource_list test in tests/test_parsl_flux_mpi_hello.py only checks that the Parsl App completed with a status of 0 (success). It does not check whether the output of the task, written to parsl_flux_resource_list.out, is actually correct. The test needs to be updated to validate that output. This is not as straightforward as asserting that the file contents equal a known string, because the output differs slightly depending on which platform is used to run the tests. Furthermore, on some platforms the correct results will vary from run to run because of differences in the hosts acquired by the Parsl pilot job.

When running the tests in the container (using the chiltepin.yaml configuration), the output will look something like this:

     STATE NNODES   NCORES    NGPUS NODELIST
      free      3       47        0 slurmnode[1-3]
 allocated      1        1        0 slurmnode3
      down      0        0        0 

However, when running on an on-prem HPC, such as Hercules, the output will look something like this:

     STATE NNODES   NCORES    NGPUS NODELIST
      free      3       59        0 hercules-05-[54-56]
 allocated      1        1        0 hercules-05-56
      down      0        0        0 

Note that two things vary from run to run and from platform to platform. The first is the number of free cores reported under NCORES (e.g. 47 vs. 59 above). The second is the names of the hosts (e.g. slurmnode* vs. hercules-*). The rest of the output is invariant across runs and platforms.

We need to update the test so that it validates the invariant portions of the output while allowing the varying portions to differ. This will make the validation portable across platforms and the test more rigorous.
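
One possible approach is sketched below: match each line of parsl_flux_resource_list.out against a pattern that pins the invariant fields and matches the varying ones (the free core count and the node names) loosely. The helper name validate_flux_resource_list, the exact patterns, and the assumption that the output always consists of the header plus the free/allocated/down rows are illustrative assumptions based on the samples above, not the actual test code.

    import re

    # Patterns for the "flux resource list" output. The free core count and the
    # node names vary by platform and by run, so they are matched loosely; the
    # remaining fields are pinned to the values shown in the examples above.
    # These invariants are an assumption and may need adjustment (e.g. if the
    # pilot job acquires a different number of nodes).
    RESOURCE_LIST_PATTERNS = [
        re.compile(r"^\s*STATE\s+NNODES\s+NCORES\s+NGPUS\s+NODELIST\s*$"),
        re.compile(r"^\s*free\s+\d+\s+\d+\s+0\s+\S+\s*$"),
        re.compile(r"^\s*allocated\s+1\s+1\s+0\s+\S+\s*$"),
        re.compile(r"^\s*down\s+0\s+0\s+0\s*$"),
    ]

    def validate_flux_resource_list(path="parsl_flux_resource_list.out"):
        # Read the output file, dropping blank lines and trailing whitespace.
        with open(path) as f:
            lines = [line.rstrip() for line in f if line.strip()]
        # The output should contain exactly the header plus the three state rows.
        assert len(lines) == len(RESOURCE_LIST_PATTERNS), (
            f"expected {len(RESOURCE_LIST_PATTERNS)} lines, got {len(lines)}"
        )
        # Each line must match its corresponding pattern.
        for line, pattern in zip(lines, RESOURCE_LIST_PATTERNS):
            assert pattern.match(line), f"unexpected resource list line: {line!r}"

The existing test could call a helper like this after asserting the Parsl App's return status. Splitting each line on whitespace and comparing fields individually would work equally well; the key point is that only the varying fields (the free core count and the NODELIST entries) are matched loosely while everything else is checked exactly.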