Closed hsoh-u closed 2 months ago
This PR has been assigned to the beta5 development cycle. Converting it to draft until work can resume on it.
@hsoh-u two questions:
1) Is the current UGRID implementation for forecast files only, or could a UGRID file be used for the observation file also?
2) Can you describe what attributes the auto-detection of UGRID looks for in the lfric
and mpas
case so I can document it?
@hsoh-u two questions:
1. Is the current UGRID implementation for forecast files only, or could a UGRID file be used for the observation file also? 2. Can you describe what attributes the auto-detection of UGRID looks for in the `lfric` and `mpas` case so I can document it?
It's designed to be a fcst. But it's not blocked as an obs at grid_stat.
Auto detection: https://github.com/dtcenter/MET/blob/develop/src/libcode/vx_data2d_factory/is_netcdf_file.cc#L209
@hsoh-u are there any special requirements for the ugrid_coordinates_file
? Does it have to be CF-compliant per MET's existing NETCDF support documentation here: https://met.readthedocs.io/en/latest/Users_Guide/data_io.html#requirements-for-cf-compliant-netcdf, or are these files different and need their own documentation describing what fields and information needs to be inside for MET UGRID to work properly? I think the ugrid_metadata_map
relates to this coordinates_file
, so maybe it's just simply enough to say in the documentation, that "each value in the ugrid_metadata_map
must exist as a variable inside ugrid_coordinates_file
"?
@hsoh-u I am still confused why ugrid_max_distance_km
exists in these two files:
https://github.com/dtcenter/MET/blob/feature_2842_ugrid_config/data/config/UGridConfig_mpas
https://github.com/dtcenter/MET/blob/feature_2842_ugrid_config/data/config/UGridConfig_lfric
Since we are hard-coding the values, can we remove this? I don't think people are going to be editing these files, as they exist in a shared area. They will likely copy them to their own area if they need to modify them, and then use -ugrid_config
, but I think they should instead override this item in their PointStat/GridStat config files instead, is that right? Recommended best practice for setting ugrid_max_distance_km
is to do it in the user PointStat/GridStat config files?
No, it does not check if it's a CF compliant NetCDF.
MET gets dimension names and variable names from ugrid_metadata_map
. If ugrid_coordinates_file
is given, MET reads those variables from ugrid_coordinates_file
only if they exist. MET does the same thing with the UGrid data file and overrides them. If coordinate variables (like, lat/lon/time) do not exist at the data file, they came from the ugrid_coordinates_file
. If the time variable exists at the both files, MET uses the time variable from the data file, not from ugrid_coordinates_file
. And MET checks the dimension of lat/lon from ugrid_coordinates_file
and the dimension from the data variable.
No, it does not check if it's a CF compliant NetCDF.
Thanks. But it does have to be netcdf format, correct?
And MET checks the dimension of lat/lon from ugrid_coordinates_file and the dimension from the data variable.
Is this just to ensure the lat/lon dimensions are the same (size/shape) in both files? As a sanity check that the coordinates file matches the data file?
@hsoh-u I am still confused why
ugrid_max_distance_km
exists in these two files: https://github.com/dtcenter/MET/blob/feature_2842_ugrid_config/data/config/UGridConfig_mpas https://github.com/dtcenter/MET/blob/feature_2842_ugrid_config/data/config/UGridConfig_lfricSince we are hard-coding the values, can we remove this? I don't think people are going to be editing these files, as they exist in a shared area. They will likely copy them to their own area if they need to modify them, and then use
-ugrid_config
, but I think they should instead override this item in their PointStat/GridStat config files instead, is that right? Recommended best practice for settingugrid_max_distance_km
is to do it in the user PointStat/GridStat config files?
Yes, the best practice for setting ugrid_max_distance_km
is to do it in the user PointStat/GridStat config files. Without it, two options: 1. give error message and stop or 2. apply the hard-coded max-distance and continue.
The MET user can not change it. But a system admin can change the default max distance instead of re-building the MET.
Thanks. But it does have to be netcdf format, correct?
Yes, only NetCDF format is accepted for
ugrid_coordinates_file
and data file, too. The checking is not required because ofugrid_metadata_map
except auto detection (checks "Conventions" global attribute).Is this just to ensure the lat/lon dimensions are the same (size/shape) in both files? As a sanity check that the coordinates file matches the data file?
Yes, that's correct.
Yes, the best practice for setting
ugrid_max_distance_km
is to do it in the user PointStat/GridStat config files. Without it, two options: 1. give error message and stop or 2. apply the hard-coded max-distance and continue. The MET user can not change it. But a system admin can change the default max distance instead of re-building the MET.
ugrid_max_distance_km
hard-corded?INFO
debug level...5? message for the user saying this? Maybe we report what it is regardless of whether it's using the hard-coded value or not.UgridConfig_mpas
and UgridConfig_lfric
? Is that just if we want to deliver a pre-configured value for those models? @willmayfield what do you think about this? Should there be a global value (hard-coded) that MET always uses no matter what, unless the user sets ugrid_max_distance_km
in their config file? Or should we deliver a pre-configured value for mpas
and lfric
in those config files? I think I prefer to hard-code and document a default value, and only allow over-riding from the User PointStat/GridStat Config file. I don't think it should be set also in UGridConfig_mpas
and UGridConfig_lfric
, but I am not completely opposed. I just think the fewer places things live the better.What do you mean by this @hsoh-u?
The checking is not required because of
ugrid_metadata_map
Do you mean checking if the file type is netcdf? I dont follow how that relates to ugrid_metadata_map
.
What do you mean by this @hsoh-u?
The checking is not required because of
ugrid_metadata_map
Do you mean checking if the file type is netcdf? I dont follow how that relates to
ugrid_metadata_map
.
Only CF compliant NetCDf files are accepted as UGrid input. MET has logic to find the coordinate dimensions and variables by using CF conventions. MET gets them from ugrid_metadata_map
instead of finding them by using CF conventions.
What do you mean by this @hsoh-u?
The checking is not required because of
ugrid_metadata_map
Do you mean checking if the file type is netcdf? I dont follow how that relates to
ugrid_metadata_map
.Only CF compliant NetCDf files are accepted as UGrid input. MET has logic to find the coordinate dimensions and variables by using CF conventions. MET gets them from
ugrid_metadata_map
instead of finding them by using CF conventions.
OK I understand now, I thought you meant "checking for .nc
file type" but you were talking about checking for CF-compliance specifically. Since we use the ugrid_metadata_map
it just looks for those variables and doesn't need to expect any specific variables to be present for compliance. Thanks Howard!
Correction: The CF compliant is required only for the auto detection of FileType, FILETYPE_UGRID. Any NetCDF files can be UGrid input by overriding the FileType to FILETYPE_UGRID with proper ugrid_metadata_map
configuration.
Per Slack message, from Howard, hard-coded value for ugrid_max_distance_km
is here, which is -9999
https://github.com/dtcenter/MET/blob/develop/src/libcode/vx_grid/unstructured_grid.cc#L339
Correction: The CF compliant is required only for the auto detection of FileType, FILETYPE_UGRID. Any NetCDF files can be UGrid input by overriding the FileType to FILETYPE_UGRID with proper
ugrid_metadata_map
configuration.
CF compliance is required for BOTH data file and coordinates file for auto-detection, correct? If one or both are NOT CF-compliant, the user must set FILETYPE_UGRID?
OK I think that ugrid_max_distance_km
needs to be removed from UGridConfig_lfric
and UGridConfig_mpas
. Those files should only contain ugrid_metadata_map
.
Here is an example:
DEBUG 6: UGridFile::read_config() configuration from /d1/personal/hsoh/git/pull_request/MET_feature_2842_ugrid_config/share/met/config/UGridConfig_lfric (/d1/personal/hsoh/git/pull_request/MET_feature_2842_ugrid_config/share/met/config/UGridConfig_lfric)
DEBUG 9: process_command_line() -> ugrid_coordinate_nc: /home/dadriaan/projects/ugrid/data/fcst/lfric/lfric_ver_20210519_0000.nc ugrid_max_distance_km: -9999
DEBUG 9: UnstructuredGrid::latlon_to_xy() input=(-100.3499985, -0.8799999952) ==> (-1, 0) == (-100.546875, -0.6912475824) distance= 30.32702235km, 0.2727413423 degree, rejected
In this case, I did not set ugrid_max_distance_km
in my User config file, it reported it was loading from UGridConfig_lfric
, then it told me it was set to -9999
, but then stations >= 30km away were filtered out.
This also tells me that currently the value of the max_distance in UGridConfig_lfric
is not being stored before the DEBUG statement is printed, and the hard-coded value of -9999
is being printed instead of 30km
. So if we DO decide that this setting should remain in UGridConfig_lfric
, we must fix the code so it properly reports what it is using.
@hsoh-u and @DanielAdriaansen, this PR caused the Nightly Build to fail when flagging these new output files added by this PR. Here's the list of files followed by a line count in each:
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_cnt.txt 2
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_ctc.txt 2
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_cts.txt 2
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_eclv.txt 2
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_fho.txt 2
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_mpr.txt 990
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V.stat 995
point_stat_UGRID_MPAS_CFG_OUT_TEMP_000000L_20120409_120000V_vcnt.txt 1
I'd say what we're really doing here is demonstrating the ability to generate matched pairs. So all we really care about is the MPR and perhaps SL1L2 line types. There probably isn't anything in these changes that impacts the categorical output in any way - so no need to write those output line types, esp since the threshold value being used is NOT MEANINGFUL (e.g. cat_thresh = NA in the output!). To keep the number of new output files to a minimum, I'd recommend that we revise
internal/test_unit/config/PointStatConfig_ugrid_no_dataset
as follows:
output_flag = {
fho = NONE;
ctc = NONE;
cts = NONE;
mctc = NONE;
mcts = NONE;
cnt = NONE;
sl1l2 = STAT;
sal1l2 = NONE;
vl1l2 = NONE;
val1l2 = NONE;
vcnt = NONE;
pct = NONE;
pstd = NONE;
pjc = NONE;
prc = NONE;
ecnt = NONE;
orank = NONE;
rps = NONE;
eclv = NONE;
mpr = STAT;
seeps = NONE;
seeps_mpr = NONE;
}
To just write MPR and SL1L2 line types to a single .stat output file.
Does that make sense?
@JohnHalleyGotway I concur we probably only need to write MPR/SL1L2 for the test. @hsoh-u can you make those changes? Thanks.
The same changes at GridStatConfig files, too. Rright?
@hsoh-u, yes, in general, I think we should minimize the number of output files written down to only the ones that are really needed. So generally, better to write to just STAT rather than BOTH... unless there's a good reason for both.
For this PR, I only saw the updated PointStat config file for the unit tests, but if similar GridStat config files are present and their output could be trimmed down to the useful subset, then yes, that'd be good.
Expected Differences
Changed
-config
option to-ugrid_config
Changed the order of the configuration when
-ugrid_config
option is givenmax_distance_in_km (30km) at UGridConfig_lfric and UGridConfig_mpas was deleted. Without configuration of max_distance_in_km, the distance checking for UGrid is disabled.
[x] Do these changes introduce new tools, command line arguments, or configuration file options? [Yes] If yes, please describe:
Changed
-config
option to-ugrid_config
to specify that the configurations are for UGrid, not PointStat nor GridStatPull Request Testing
Added man unit test
It will done by #2748
An unittest is added
Difference 1. The distance checking is disabled because the max_distance_in_km was removed from the configuration files. So, no UGrid data are filtered by the distance. The output will be different.
8 More output files at $MET_TEST_OUTPUT/point_stat_ugrid directory.
[x] Will this PR result in changes to existing METplus Use Cases? [No] If yes, create a new Update Truth METplus issue to describe them.
[x] Do these changes introduce new SonarQube findings? [No] If yes, please describe:
Maybe.
Pull Request Checklist
See the METplus Workflow for details.