dtcenter / MET

Model Evaluation Tools
https://dtcenter.org/community-code/model-evaluation-tools-met
Apache License 2.0
74 stars 22 forks source link

Feature 2842 ugrid config #2852

Closed hsoh-u closed 2 months ago

hsoh-u commented 3 months ago

Expected Differences

Changed -config option to -ugrid_config to specify that the configurations are for UGrid, not PointStat nor GridStat

Pull Request Testing

Added man unit test

export BEG_DS='-1800'
export CONFIG_DIR='${MET_TEST_BASE}/config'
export END_DS='1800'
export OUTPUT_PREFIX='UGRID_MPAS_CFG_OUT_TEMP'
/d1/personal/hsoh/git/features/feature_2842_ugrid_config/MET/share/met/../../bin/point_stat \
      /d1/projects/MET/MET_test_data/unit_test/ugrid_data/mpas/mpasout.2012-04-09_12.00.00_reduced.nc \
      /d1/personal/hsoh/MET/test_output/feature_2842_ugrid_config/pb2nc/gdas1.20120409.t12z.prepbufr.nc \
      /d1/personal/hsoh/git/features/feature_2842_ugrid_config/MET/internal/test_unit/config/PointStatConfig_ugrid_no_dataset \
      -ugrid_config /d1/personal/hsoh/git/features/feature_2842_ugrid_config/MET/share/met/config/UGridConfig_mpas \
      -outdir /d1/personal/hsoh/MET/test_output/feature_2842_ugrid_config/point_stat_ugrid -v 1
unset BEG_DS
unset CONFIG_DIR
unset END_DS
unset OUTPUT_PREFIX

It will done by #2748

An unittest is added

Difference 1. The distance checking is disabled because the max_distance_in_km was removed from the configuration files. So, no UGrid data are filtered by the distance. The output will be different.

8 More output files at $MET_TEST_OUTPUT/point_stat_ugrid directory.

point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_cnt.txt
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_ctc.txt
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_cts.txt
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_eclv.txt
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_fho.txt
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_mpr.txt
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V.stat
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_vcnt.txt

Maybe.

Pull Request Checklist

See the METplus Workflow for details.

JohnHalleyGotway commented 3 months ago

This PR has been assigned to the beta5 development cycle. Converting it to draft until work can resume on it.

DanielAdriaansen commented 3 months ago

@hsoh-u two questions: 1) Is the current UGRID implementation for forecast files only, or could a UGRID file be used for the observation file also? 2) Can you describe what attributes the auto-detection of UGRID looks for in the lfric and mpas case so I can document it?

hsoh-u commented 3 months ago

@hsoh-u two questions:

1. Is the current UGRID implementation for forecast files only, or could a UGRID file be used for the observation file also?

2. Can you describe what attributes the auto-detection of UGRID looks for in the `lfric` and `mpas` case so I can document it?

It's designed to be a fcst. But it's not blocked as an obs at grid_stat.

Auto detection: https://github.com/dtcenter/MET/blob/develop/src/libcode/vx_data2d_factory/is_netcdf_file.cc#L209

DanielAdriaansen commented 3 months ago

@hsoh-u are there any special requirements for the ugrid_coordinates_file? Does it have to be CF-compliant per MET's existing NETCDF support documentation here: https://met.readthedocs.io/en/latest/Users_Guide/data_io.html#requirements-for-cf-compliant-netcdf, or are these files different and need their own documentation describing what fields and information needs to be inside for MET UGRID to work properly? I think the ugrid_metadata_map relates to this coordinates_file, so maybe it's just simply enough to say in the documentation, that "each value in the ugrid_metadata_map must exist as a variable inside ugrid_coordinates_file"?

DanielAdriaansen commented 3 months ago

@hsoh-u I am still confused why ugrid_max_distance_km exists in these two files: https://github.com/dtcenter/MET/blob/feature_2842_ugrid_config/data/config/UGridConfig_mpas https://github.com/dtcenter/MET/blob/feature_2842_ugrid_config/data/config/UGridConfig_lfric

Since we are hard-coding the values, can we remove this? I don't think people are going to be editing these files, as they exist in a shared area. They will likely copy them to their own area if they need to modify them, and then use -ugrid_config, but I think they should instead override this item in their PointStat/GridStat config files instead, is that right? Recommended best practice for setting ugrid_max_distance_km is to do it in the user PointStat/GridStat config files?

hsoh-u commented 3 months ago

No, it does not check if it's a CF compliant NetCDF. MET gets dimension names and variable names from ugrid_metadata_map. If ugrid_coordinates_file is given, MET reads those variables from ugrid_coordinates_file only if they exist. MET does the same thing with the UGrid data file and overrides them. If coordinate variables (like, lat/lon/time) do not exist at the data file, they came from the ugrid_coordinates_file. If the time variable exists at the both files, MET uses the time variable from the data file, not from ugrid_coordinates_file. And MET checks the dimension of lat/lon from ugrid_coordinates_file and the dimension from the data variable.

DanielAdriaansen commented 3 months ago

No, it does not check if it's a CF compliant NetCDF.

Thanks. But it does have to be netcdf format, correct?

And MET checks the dimension of lat/lon from ugrid_coordinates_file and the dimension from the data variable.

Is this just to ensure the lat/lon dimensions are the same (size/shape) in both files? As a sanity check that the coordinates file matches the data file?

hsoh-u commented 3 months ago

@hsoh-u I am still confused why ugrid_max_distance_km exists in these two files: https://github.com/dtcenter/MET/blob/feature_2842_ugrid_config/data/config/UGridConfig_mpas https://github.com/dtcenter/MET/blob/feature_2842_ugrid_config/data/config/UGridConfig_lfric

Since we are hard-coding the values, can we remove this? I don't think people are going to be editing these files, as they exist in a shared area. They will likely copy them to their own area if they need to modify them, and then use -ugrid_config, but I think they should instead override this item in their PointStat/GridStat config files instead, is that right? Recommended best practice for setting ugrid_max_distance_km is to do it in the user PointStat/GridStat config files?

Yes, the best practice for setting ugrid_max_distance_km is to do it in the user PointStat/GridStat config files. Without it, two options: 1. give error message and stop or 2. apply the hard-coded max-distance and continue. The MET user can not change it. But a system admin can change the default max distance instead of re-building the MET.

hsoh-u commented 3 months ago

Thanks. But it does have to be netcdf format, correct?

Yes, only NetCDF format is accepted for ugrid_coordinates_file and data file, too. The checking is not required because of ugrid_metadata_map except auto detection (checks "Conventions" global attribute).

Is this just to ensure the lat/lon dimensions are the same (size/shape) in both files? As a sanity check that the coordinates file matches the data file?

Yes, that's correct.

DanielAdriaansen commented 3 months ago

Yes, the best practice for setting ugrid_max_distance_km is to do it in the user PointStat/GridStat config files. Without it, two options: 1. give error message and stop or 2. apply the hard-coded max-distance and continue. The MET user can not change it. But a system admin can change the default max distance instead of re-building the MET.

  1. Where in the code is ugrid_max_distance_km hard-corded?
  2. I think it is OK to use the default hard-coded value, but could we add a INFO debug level...5? message for the user saying this? Maybe we report what it is regardless of whether it's using the hard-coded value or not.
  3. Why does this need to exist in UgridConfig_mpas and UgridConfig_lfric? Is that just if we want to deliver a pre-configured value for those models? @willmayfield what do you think about this? Should there be a global value (hard-coded) that MET always uses no matter what, unless the user sets ugrid_max_distance_km in their config file? Or should we deliver a pre-configured value for mpas and lfric in those config files? I think I prefer to hard-code and document a default value, and only allow over-riding from the User PointStat/GridStat Config file. I don't think it should be set also in UGridConfig_mpas and UGridConfig_lfric, but I am not completely opposed. I just think the fewer places things live the better.
DanielAdriaansen commented 3 months ago

What do you mean by this @hsoh-u?

The checking is not required because of ugrid_metadata_map

Do you mean checking if the file type is netcdf? I dont follow how that relates to ugrid_metadata_map.

hsoh-u commented 3 months ago

What do you mean by this @hsoh-u?

The checking is not required because of ugrid_metadata_map

Do you mean checking if the file type is netcdf? I dont follow how that relates to ugrid_metadata_map.

Only CF compliant NetCDf files are accepted as UGrid input. MET has logic to find the coordinate dimensions and variables by using CF conventions. MET gets them from ugrid_metadata_map instead of finding them by using CF conventions.

DanielAdriaansen commented 3 months ago

What do you mean by this @hsoh-u?

The checking is not required because of ugrid_metadata_map

Do you mean checking if the file type is netcdf? I dont follow how that relates to ugrid_metadata_map.

Only CF compliant NetCDf files are accepted as UGrid input. MET has logic to find the coordinate dimensions and variables by using CF conventions. MET gets them from ugrid_metadata_map instead of finding them by using CF conventions.

OK I understand now, I thought you meant "checking for .nc file type" but you were talking about checking for CF-compliance specifically. Since we use the ugrid_metadata_map it just looks for those variables and doesn't need to expect any specific variables to be present for compliance. Thanks Howard!

hsoh-u commented 3 months ago

Correction: The CF compliant is required only for the auto detection of FileType, FILETYPE_UGRID. Any NetCDF files can be UGrid input by overriding the FileType to FILETYPE_UGRID with proper ugrid_metadata_map configuration.

DanielAdriaansen commented 2 months ago

Per Slack message, from Howard, hard-coded value for ugrid_max_distance_km is here, which is -9999

https://github.com/dtcenter/MET/blob/develop/src/libcode/vx_grid/unstructured_grid.cc#L339

DanielAdriaansen commented 2 months ago

Correction: The CF compliant is required only for the auto detection of FileType, FILETYPE_UGRID. Any NetCDF files can be UGrid input by overriding the FileType to FILETYPE_UGRID with proper ugrid_metadata_map configuration.

CF compliance is required for BOTH data file and coordinates file for auto-detection, correct? If one or both are NOT CF-compliant, the user must set FILETYPE_UGRID?

DanielAdriaansen commented 2 months ago

OK I think that ugrid_max_distance_km needs to be removed from UGridConfig_lfric and UGridConfig_mpas. Those files should only contain ugrid_metadata_map.

Here is an example:

DEBUG 6: UGridFile::read_config() configuration from /d1/personal/hsoh/git/pull_request/MET_feature_2842_ugrid_config/share/met/config/UGridConfig_lfric (/d1/personal/hsoh/git/pull_request/MET_feature_2842_ugrid_config/share/met/config/UGridConfig_lfric)
DEBUG 9: process_command_line() -> ugrid_coordinate_nc: /home/dadriaan/projects/ugrid/data/fcst/lfric/lfric_ver_20210519_0000.nc  ugrid_max_distance_km: -9999
DEBUG 9: UnstructuredGrid::latlon_to_xy() input=(-100.3499985, -0.8799999952) ==> (-1, 0) == (-100.546875, -0.6912475824) distance= 30.32702235km, 0.2727413423 degree, rejected

In this case, I did not set ugrid_max_distance_km in my User config file, it reported it was loading from UGridConfig_lfric, then it told me it was set to -9999, but then stations >= 30km away were filtered out.

This also tells me that currently the value of the max_distance in UGridConfig_lfric is not being stored before the DEBUG statement is printed, and the hard-coded value of -9999 is being printed instead of 30km. So if we DO decide that this setting should remain in UGridConfig_lfric, we must fix the code so it properly reports what it is using.

JohnHalleyGotway commented 2 months ago

@hsoh-u and @DanielAdriaansen, this PR caused the Nightly Build to fail when flagging these new output files added by this PR. Here's the list of files followed by a line count in each:

point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_cnt.txt 2
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_ctc.txt 2
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_cts.txt 2
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_eclv.txt 2
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_fho.txt 2
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_mpr.txt 990
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V.stat 995
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_vcnt.txt 1

I'd say what we're really doing here is demonstrating the ability to generate matched pairs. So all we really care about is the MPR and perhaps SL1L2 line types. There probably isn't anything in these changes that impacts the categorical output in any way - so no need to write those output line types, esp since the threshold value being used is NOT MEANINGFUL (e.g. cat_thresh = NA in the output!). To keep the number of new output files to a minimum, I'd recommend that we revise internal/test_unit/config/PointStatConfig_ugrid_no_dataset as follows:

output_flag = {
   fho    = NONE;
   ctc    = NONE;
   cts    = NONE;
   mctc   = NONE;
   mcts   = NONE;
   cnt    = NONE;
   sl1l2  = STAT;
   sal1l2 = NONE;
   vl1l2  = NONE;
   val1l2 = NONE;
   vcnt   = NONE;
   pct    = NONE;
   pstd   = NONE;
   pjc    = NONE;
   prc    = NONE;
   ecnt   = NONE;
   orank  = NONE;
   rps    = NONE;
   eclv   = NONE;
   mpr    = STAT;
   seeps  = NONE;
   seeps_mpr = NONE;
}

To just write MPR and SL1L2 line types to a single .stat output file.

Does that make sense?

DanielAdriaansen commented 2 months ago

@JohnHalleyGotway I concur we probably only need to write MPR/SL1L2 for the test. @hsoh-u can you make those changes? Thanks.

hsoh-u commented 2 months ago

The same changes at GridStatConfig files, too. Rright?

JohnHalleyGotway commented 2 months ago

@hsoh-u, yes, in general, I think we should minimize the number of output files written down to only the ones that are really needed. So generally, better to write to just STAT rather than BOTH... unless there's a good reason for both.

For this PR, I only saw the updated PointStat config file for the unit tests, but if similar GridStat config files are present and their output could be trimmed down to the useful subset, then yes, that'd be good.