dtcenter / MET

Model Evaluation Tools
https://dtcenter.org/community-code/model-evaluation-tools-met
Apache License 2.0
74 stars 22 forks source link

Feature #2395 TOTAL_DIR #2892

Closed JohnHalleyGotway closed 1 month ago

JohnHalleyGotway commented 1 month ago

Expected Differences

This PR adds the TOTAL_DIR column to the VL1L2, VAL1L2, and VCNT line types. It adds this column BEFORE all the stats previously added for #2395. Since these new stats have not been included in a public release, adding this in the current "middle" of the line type is fine. It also updates Stat-Analysis to use those counts when aggregating the stats from those line types that begin with DIR_.

Pull Request Testing

The code for this feature branch is compiled as the met_test user in seneca:/d1/projects/MET/MET_pull_requests/met-12.0.0/beta5/MET-feature_2395_TOTAL_DIR.

  1. Tested aggregating VL1L2 lines:

    bin/stat_analysis -lookin out/point_stat/point_stat_360000L_20070331_120000V_vl1l2.txt -vx_mask DTC165,DTC166 -interp_mthd NEAREST -job aggregate -line_type VL1L2 -dump_row dump_vl1l2.txt

    Input:

    TOTAL_DIR  DIR_ME  DIR_MAE    DIR_MSE
    685 -1.75317 47.69438 4272.77589
    2494 -4.9744  28.90333 1909.85557

    Output:

    TOTAL_DIR  DIR_ME  DIR_MAE    DIR_MSE
    3179 -4.2803 32.95236 2419.00952

    And these are the correct weighted averages.

  2. Tested printing summaries of these columns.

    bin/stat_analysis -lookin out/point_stat/point_stat_360000L_20070331_120000V.stat -vx_mask DTC165,DTC166 -interp_mthd NEAREST -job summary -line_type VCNT -column TOTAL,TOTAL_DIR,DIR_ME

    And it's parse the correct values from the correct columns.

The code for this feature branch is compiled as the met_test user in seneca:/d1/projects/MET/MET_pull_requests/met-12.0.0/beta5/MET-feature_2395_TOTAL_DIR.

Please review code changes, doc updates, and test as you see fit.

Updated the tables to describe the output line types.

I added to new jobs to an existing Stat-Analysis config file to demonstrate the processing of VL1L2 lines. This generates 2 new output files:

stat_analysis_ps/CONFIG_POINT_STAT_agg_vl1l2.stat
stat_analysis_ps/CONFIG_POINT_STAT_agg_stat_vl1l2_to_vcnt.stat

Differences are flagged in all VCNT, VL1L2, and VAL1L2 output lines.

I downloaded and inspected the diff and log artifacts from the GHA run.

  1. Confirmed there are no more warnings about 0-vectors.
  2. Diffs exist in 82 output files:
    • 48 .stat files and 34 .txt files.
    • 79 Point-Stat, 12 Grid-Stat, and 1 Stat-Analysis

I ran the commands listed below to confirm that all these diffs are limited to the VCNT, VL1L2, and VAL1L2 line types:

rm -f diff_line_types
for OUTPUT in `find ./ -type f | grep OUTPUT`; do
  TRUTH=`echo $OUTPUT | sed 's/OUTPUT/TRUTH/g'`;
  diff -b $TRUTH $OUTPUT | egrep "V12.0.0" | awk '{print $25}' | sort -u >> diff_line_types
done
cat diff_line_types | sort -u

Here's the result:

VAL1L2
VCNT
VL1L2

Differences are flagged in all VCNT, VL1L2, and VAL1L2 output lines. See this dtcenter/METplus#2592 issue to update the truth data.

The SonarQube quality gate flag failed in this run with 0 new bugs but 7 new code smells.

I did modify the code to address 4 of the 7 ones flagged (Global variables should be const. and Add the "static" storage specifier to this declaration).

But chose not to address the other 3 (Extract the assignment from this expression. and Refactor this function to reduce its Cognitive Complexity from XX to the 25 allowed.)

Pull Request Checklist

See the METplus Workflow for details.

JohnHalleyGotway commented 1 month ago

Thanks for taking a look @georgemccabe. I did just find/address a "false positive" match in the is_vector_dir_stat() function.

As for the diffs, yes, the rather brittle diff logic fails when the number of columns differ. Rather than enhancing the existing Rscript diff logic, I'd rather switch over to the Python-based one used by METplus soon.

JohnHalleyGotway commented 1 month ago

@bikegeek, FYI, I documented the METplus-Analysis impacts with dtcenter/METdataio#307 and dtcenter/METcalcpy#384.