Feature 2717 convert unit.pl to unit.py

natalieb-noaa commented 5 months ago

Expected Differences

[x] Do these changes introduce new tools, command line arguments, or configuration file options? [No]
If yes, please describe:
[x] Do these changes modify the structure of existing or add new output data types (e.g. statistic line types or NetCDF variables)? [No]
If yes, please describe:

Pull Request Testing

[x] Describe testing already performed for these changes:
Ran the tests from 15 unit test xml files and compared the results/outputs with those from running the same tests using the perl script. Also tested each of the non-default options on at least 1 test.
[x] Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:
The process/syntax for running tests with the python script is essentially the same as perl. You'll need to set env variables, e.g.:
```
export MET_BUILD_BASE=/d1/projects/MET/MET_regression/develop/NB20240428/MET-develop
export MET_TEST_BASE=$MET_BUILD_BASE/internal/test_unit
export MET_BASE=$MET_BUILD_BASE/share/met
export MET_TEST_INPUT=/d1/projects/MET/MET_test_data/unit_test
export MET_TEST_OUTPUT=/path/to/output/directory
export MET_TEST_RSCRIPT=/nrit/ral/bin/Rscript
```
And a test can be run using the command:
[python exec] /path/to/repo/MET/internal/test_unit/python/unit.py [options, -h for details]
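Since a missing variable is an easy mistake when setting these up by hand, here is a minimal sketch of a pre-flight check. It assumes all six variables from the export lines above are required; `missing_env_vars` is a hypothetical helper and is not part of unit.py.

```python
import os

# Names copied from the export lines above. Treating all six as
# required is an assumption made for this sketch.
REQUIRED_VARS = [
    "MET_BUILD_BASE",
    "MET_TEST_BASE",
    "MET_BASE",
    "MET_TEST_INPUT",
    "MET_TEST_OUTPUT",
    "MET_TEST_RSCRIPT",
]


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"MET_BASE": "/some/path"}
print("unset:", ", ".join(missing_env_vars(example_env)))
```

Calling `missing_env_vars()` with no arguments checks the real environment, so it could be run once before invoking unit.py.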
[x] Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [NA]
[x] Do these changes include sufficient testing updates? [NA]
[x] Will this PR result in changes to the MET test suite? [Yes]
This PR contains a new directory and new python module that are part of the test execution. However, this PR does not yet change how tests are run.
[x] Will this PR result in changes to existing METplus Use Cases? [No]
If yes, create a new Update Truth METplus issue to describe them.
[x] Do these changes introduce new SonarQube findings? [No]
If yes, please describe:
[ ] Please complete this pull request review by [Fill in date].

Pull Request Checklist

See the METplus Workflow for details.

[x] Review the source issue metadata (required labels, projects, and milestone).
[x] Complete the PR definition above.
[x] Ensure the PR title matches the feature or bugfix branch name.
[x] Define the PR metadata, as permissions allow. Select: Reviewer(s) and Development issue. Select: Milestone as the version that will include these changes. Select: Coordinated METplus-X.Y Support project for bugfix releases or MET-X.Y.Z Development project for official releases.
[ ] After submitting the PR, select the :gear: icon in the Development section of the right hand sidebar. Search for the issue that this PR will close and select it, if it is not already selected.
[ ] After the PR is approved, merge your changes. If permissions do not allow this, request that the reviewer do the merge.
[ ] Close the linked issue and delete your feature or bugfix branch from GitHub.

georgemccabe commented 5 months ago

I pushed a few fixes to your branch for errors I encountered. Some of the XML files had minor inconsistencies (exists used instead of exist, test_dir not included) that the perl version silently ignored but that crash the python version. This is good because it caught some bugs where tests weren't doing what they were supposed to do!

UPDATE: I found/fixed the issue. There was a tab at the end of the line before the output file line in the XML file.

I ran into a weird issue and can't figure out what is happening. When I run:

./python/unit.py ./xml/unit_ref_config_lead_12.xml

TEST: pcp_combine_wrf_3hr_09_12 fails. I see that the command is output correctly, but the log from MET shows that the output file path is not included.
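A stray trailing tab like the one mentioned in the UPDATE is hard to see in an editor. As a rough sketch, something like the following could flag lines that end in spaces or tabs. `find_trailing_whitespace` and the sample text are hypothetical and are not taken from the actual XML files or from unit.py.

```python
def find_trailing_whitespace(text):
    """Return (line_number, repr of trailing characters) for each offending line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.rstrip(" \t")
        if stripped != line:
            # repr() makes the invisible characters visible, e.g. '\t'.
            hits.append((number, repr(line[len(stripped):])))
    return hits


# Made-up sample: the second line ends with a tab.
sample = "first line\nsecond line\t\nthird line\n"
for number, trailing in find_trailing_whitespace(sample):
    print(f"line {number}: trailing {trailing}")
```

To check a real file, the same function could be given the result of reading that file as text.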

natalieb-noaa commented 5 months ago

This looks great! I made a few suggestions in-line. I have not yet run this locally, but I plan to do that shortly. Also, I think it would be good to update the automated tests to run using this new version so we can review any differences through the GitHub Actions workflow run for this PR. I can make those changes to this branch.

Thanks for the feedback, updates, and fixes! I'll take a look at your comments and commits...

And if it'll be quick for you to update the automated tests to use this, please go ahead and do that! (EDIT: just saw that you already did, thanks!)

JohnHalleyGotway commented 4 months ago

I'm sorry for the very long delay on this.

I tried out these commands today on seneca in /d1/projects/MET/MET_pull_requests/met-12.0.0/beta5/MET-feature_2717_convert_unit.pl_to_unit.py.

There's a very minor difference in logging: OLD with unit.pl reports status on the same line:
```
TEST: ascii2nc_TRMM_3hr            - pass -  13.912 sec
```
NEW with unit.py puts them on separate lines:
```
TEST: ascii2nc_TRMM_3hr
- pass -    13.983 sec
```
The seldom-used regression_runtimes.ksh script assumes the former. I looked into changing the logger terminator to omit the newline, but it's probably better to just update the logic of that runtimes script at some point or replace it with a Python version.
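If the runtimes script is eventually replaced with a Python version, a parser that accepts both layouts shown above might look roughly like this. It is only a sketch: `parse_runtimes` is a hypothetical name, the regular expressions are inferred from the two sample outputs in this comment, and any status word other than `pass` is an assumption.

```python
import re

# Patterns inferred from the two sample outputs above (assumption).
TEST_RE = re.compile(r"TEST:\s+(\S+)")
STATUS_RE = re.compile(r"-\s+(\w+)\s+-\s+([\d.]+)\s+sec")


def parse_runtimes(lines):
    """Return (test_name, status, seconds) tuples from unit test log lines.

    Handles the status on the same line as the test name (unit.pl style)
    and on the following line (unit.py style).
    """
    results = []
    current = None
    for line in lines:
        test_match = TEST_RE.search(line)
        if test_match:
            current = test_match.group(1)
        status_match = STATUS_RE.search(line)
        if status_match and current is not None:
            seconds = float(status_match.group(2))
            results.append((current, status_match.group(1), seconds))
            current = None
    return results


old_style = ["TEST: ascii2nc_TRMM_3hr            - pass -  13.912 sec"]
new_style = ["TEST: ascii2nc_TRMM_3hr", "- pass -    13.983 sec"]
print(parse_runtimes(old_style))
print(parse_runtimes(new_style))
```

Tracking the most recent test name, rather than requiring everything on one line, is what lets the same loop read logs from either script.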

The Perl version accepts multiple input XML files to run:

perl/unit.pl xml/unit_ascii2nc.xml xml/unit_pb2nc.xml

The Python version fails with a parsing error:

python/unit.py -log py.log xml/unit_ascii2nc.xml xml/unit_pb2nc.xml
usage: unit.py [-h] [-log log_file] [-cmd] [-memchk] [-callchk] [-noexit] test_xml
unit.py: error: unrecognized arguments: xml/unit_pb2nc.xml

Can this logic be updated to support multiple XML input files in a single call?
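One way to accept several XML files, assuming the script uses argparse (which the usage message suggests), is to give the positional argument `nargs="+"`. The sketch below mirrors the option names from the usage line above but is not the actual unit.py code; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser resembling the usage line above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="unit.py")
    parser.add_argument("-log", metavar="log_file")
    parser.add_argument("-cmd", action="store_true")
    parser.add_argument("-memchk", action="store_true")
    parser.add_argument("-callchk", action="store_true")
    parser.add_argument("-noexit", action="store_true")
    # nargs="+" collects one or more positional values into a list.
    parser.add_argument("test_xml", nargs="+")
    return parser


args = build_parser().parse_args(
    ["-log", "py.log", "xml/unit_ascii2nc.xml", "xml/unit_pb2nc.xml"]
)
for xml_file in args.test_xml:
    print("would run tests from", xml_file)
```

With `nargs="+"`, `args.test_xml` is always a list, so the rest of the script can loop over it even when only one file is given.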

I tested the -cmd, -memchk, -callchk, and -noexit options and they all work as I'd expect.
The -log option also works. However, I note that if the log file already exists, it is appended to rather than clobbered. I can't really decide which I prefer. I'm used to MET clobbering its output files, including log files, rather than appending to them.

How do you think this should behave? Should we be appending or replacing?

natalieb-noaa commented 4 months ago

Thanks for the feedback, @JohnHalleyGotway !

Right. I could use print statements instead of logging if we wanted the status output to look the same. Using logging feels like better programming practice to me (more robust and flexible) as this library continues to get updated, but I wasn't sure if the format change would be a problem for anyone. If this change only causes a problem for a seldom-used script, then I'd be inclined to keep it as is and just update that script, as you suggested.
Good catch! That didn't even occur to me. I should be able to update it to allow more than one input xml file.
I think overwriting the log files would be reasonable. I can't think of a reason to continually append, especially if that's not what you're used to. Also, if we keep appending, then we'll have to eventually handle the files getting too large. I'll make the change to overwrite the log files.
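For reference, with the standard `logging` module the append-versus-overwrite behavior comes down to the `mode` passed to `logging.FileHandler`, which defaults to `"a"`. Below is a minimal sketch of the overwrite variant; `make_logger` and `demo` are hypothetical names, and this is not necessarily how unit.py sets up its logger.

```python
import logging
import os
import tempfile


def make_logger(log_file, name="unit_sketch"):
    """Return a logger that truncates log_file instead of appending to it."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Drop handlers left over from an earlier call so messages are not duplicated.
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)
        old_handler.close()
    # mode="w" replaces an existing file; the default mode="a" appends.
    handler = logging.FileHandler(log_file, mode="w")
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def demo():
    """Log twice to the same path and return what the file ends up containing."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "py.log")
        make_logger(path).info("first run")
        logger = make_logger(path)
        logger.info("second run")
        for handler in list(logger.handlers):
            logger.removeHandler(handler)
            handler.close()
        with open(path) as stream:
            return stream.read()


print(demo())
```

Only the message from the second call survives, which matches the clobbering behavior described above for MET output files.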

natalieb-noaa commented 3 months ago

@JohnHalleyGotway I made the changes you suggested:

allow for >1 xml file to be passed when calling the script
if writing to an existing log file, replace rather than append
