CFMIP / COSPv2.0

COSP - The CFMIP (Cloud Feedbacks Model Intercomparison Project) Observation Simulator Package

Python test script does not note occurrence of outputs that are not in the test dataset (Version check?) #43

Closed roj222 closed 4 years ago

roj222 commented 4 years ago

cosp2_test is currently producing outputs which are not in the test dataset (see issue #42). The python test script did not report the presence of these additional outputs. I suggest that it be modified to check that the number of variables in the reference and test_output are the same.

And perhaps it would be good to add some versioning check (i.e. have cosp2_test output a version number into the data file and thereby have the python comparison script check this number against what is in the reference dataset to make sure both are up to date).
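The two checks suggested above (matching variable sets and an optional version attribute) could look something like this sketch. The function and attribute names here are hypothetical, not part of the actual COSP test script:

```python
def check_outputs(ref_vars, test_vars, ref_version=None, test_version=None):
    """Compare variable sets and, optionally, a version attribute.

    ref_vars/test_vars are sets of variable names read from the
    reference and test NetCDF files. Returns True only if the test
    output is consistent with the reference.
    """
    ok = True
    missing = ref_vars - test_vars
    extra = test_vars - ref_vars
    if missing:
        # Variables in the reference that the test run did not produce.
        print("Missing from test output:", sorted(missing))
        ok = False
    if extra:
        # Variables produced by the test run but absent from the reference.
        print("Not in reference dataset:", sorted(extra))
        ok = False
    if ref_version is not None and ref_version != test_version:
        print("Version mismatch: reference=%s, test=%s" % (ref_version, test_version))
        ok = False
    return ok
```

A set comparison like this catches both directions of mismatch, whereas comparing only the variable counts could miss a case where one variable is added and another dropped.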

RobertPincus commented 4 years ago

@roj222 I agree that the comparison script and especially the continuous integration should report an error if the set of variables in the test output is not consistent with the reference output. @alejandrobodas is this something you might be able to manage?

@roj222 the reference data is version-controlled by virtue of being in the repository. What other versioning do you imagine?

alejandrobodas commented 4 years ago

This has been fixed by pull request #40, but only if the new python test script is used:

python cosp2_test_01.py data/outputs/UKMO/cosp2_output_um.nc data/outputs/UKMO/cosp2_output_um.ref.nc

You should see output similar to the step 'Compare output against KGO' in https://github.com/CFMIP/COSPv2.0/runs/645660411

@roj222 please can you confirm that you get that when you run the new script?

If this works as expected, then we should change the readme file to point users to the new script for their local tests. We also need to update the reference file cosp2_output_um.ref.nc so that it contains all the output variables. Given that we now have two reference datasets generated with two different compilers (ifort and gfortran), I propose to rename the ifort reference dataset to cosp2_output_um.ifort.kgo.nc.

RobertPincus commented 4 years ago

@alejandrobodas Why would we not simply replace the results-checking script with one that fails when the variables are not the same? Under what circumstances would this be acceptable?

With respect to different compilers, it seems to me that we could provide a single set of scripts using results from the CI, and suggest that people use a higher tolerance for failure when using other compilers.

alejandrobodas commented 4 years ago

@RobertPincus it's not entirely clear to me what you mean here. The new script already exits with a failure code when the variable sets differ, i.e. it fails the continuous integration test, although it still prints the comparison table. I think we need that behaviour: when you are adding a new diagnostic the test will fail, but you still want to know that the other variables are not broken. Or are you proposing something different? Perhaps the output needs to be more explicit so that it's easier to see when the test has failed.

Regarding your second point, I like your suggestion. We could check for strict bit-comparability in the CI test, and relax tolerance in local tests. That would simplify things. It would only involve a small change in the readme file.

RobertPincus commented 4 years ago

@alejandrobodas Basically, I’m wondering why you left the old script in place rather than replacing it with the new script. The difference, as I understand it, is that the old script compares variables present in both test and reference, while the new script fails if the list of variables is not the same. I can’t see why we need the first/old script any more, and it has the disadvantage that users (in this case @roj222) are using it accidentally.

alejandrobodas commented 4 years ago

@RobertPincus Thanks for the clarification, I fully agree. I left the old script to avoid changing too many things at once, but that has obviously caused more confusion! I'll create a pull request that addresses this.