dtcenter / MET

Model Evaluation Tools
https://dtcenter.org/community-code/model-evaluation-tools-met
Apache License 2.0

Add support for Multi-Variate MODE #1184

Closed dwfncar closed 2 years ago

dwfncar commented 5 years ago

A new MODE tool is to be created to perform arbitrary set operations (union, intersection, difference, etc.) on objects generated by MODE. The objects in general will be associated with different variables. Object attributes (centroid, axis, etc.) will be computed for the resultant objects. Matching and merging may be a later option. This tool is to be tested on fields relevant for dryline identification (gradient of temperature, gradient of specific humidity, etc.). The output format of the new tool will be identical to that of MODE.

Account Key: 2700042 WPC

Related METplus issue: dtcenter/METplus#1516

JohnHalleyGotway commented 3 years ago

Randy, here's some feedback for multi-var MODE.

Summary of recommended changes:

And here are the details:

(1) unit_mode_multivar.xml eventually runs on dakota but only writes output to the current working directory. On dakota:

cd /d3/projects/MET/MET_development/MET-feature_1184_dryline/test
export MET_BASE=/d3/projects/MET/MET_development/MET-feature_1184_dryline/met/share/met
export MET_TEST_BASE=/d3/projects/MET/MET_development/MET-feature_1184_dryline/test
export MET_TEST_INPUT=/d3/projects/MET/MET_test_data/unit_test
export MET_TEST_OUTPUT=/d3/projects/MET/MET_development/MET-feature_1184_dryline/test_output
perl/unit.pl xml/unit_mode_multivar.xml

Here's the error:

ERROR  : write_obj_stats() -> unable to open stats output file "00/mode_r__300000L_20120410_180000V_060000A_obj.txt"

Try again using:

mkdir 00 01 02
perl/unit.pl xml/unit_mode_multivar.xml

That runs to completion, but the test still fails because the output is written to the current working directory instead of an output directory. Running it this way works on dakota, but DOES NOT work on my Mac laptop. Here's the error message there:

command = "cp -u /Volumes/d1/projects/MET/MET_unit_test/MET_test_input/model_data/mode_multivar/alpha_fcst.nc f_super.nc"
cp: illegal option -- u

Looks like that command isn't very portable.

(3) Writing intermediate mode output to directories named "00", "01", and "02" in the current working directory won't work in many cases. It'd be better to write those to the "-outdir" directory, being sure to create the 00 subdirectory before writing to it. The default outdir should be the current working directory. That means parsing -outdir in multivar mode and then resetting it, as needed, when calling mode separately for each field.
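A rough sketch of the "create the subdirectory before writing to it" step, assuming C++17 is available. The helper name `make_field_outdir` and its signature are hypothetical and not taken from the MET source.

```cpp
#include <cstdio>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Build "<outdir>/NN" for the field at position `index` (00, 01, 02, ...)
// and create it, including any missing parents, before anything writes there.
// An empty outdir falls back to the current working directory.
// (make_field_outdir is a hypothetical helper name, not a MET function.)
std::string make_field_outdir(const std::string& outdir, int index) {
    char name[16];
    std::snprintf(name, sizeof(name), "%02d", index);
    fs::path dir = (outdir.empty() ? fs::path(".") : fs::path(outdir)) / name;
    fs::create_directories(dir);  // not an error if it already exists
    return dir.string();
}
```

The returned path is what would be passed as the per-field output directory when mode is called separately for each field.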

(4) Lines 233 and 237 should be changed:

233 command << cs_erase << "cp -u " << fcst_filenames[0] << ' ' << fcst_super_nc_filename;
237 command << cs_erase << "cp -u " <<  obs_filenames[0] << ' ' <<  obs_super_nc_filename;

Using the input file for the structure of the output file would only work when the input files are MET NetCDF files, and nothing else (not GRIB1, GRIB2, other flavors of NetCDF, or python embedding). You could take this approach using one of the intermediate NetCDF output files from single var MODE, but it'd probably be easier to correctly configure the metadata contents by creating the NetCDF output file from scratch.

(5) Test and resolve any logging issues. What happens when you use the "-log" command line argument? It'd be passed to each of the individual calls to MODE. I would guess that those would clobber each other, yielding truncated output. If you can figure out how to get all log output into the same file, please replace the couts with calls to mlog instead.

In mode.cc:

   126  cout << "if singlevar mode:\n"
   129  cout << "if multivar mode:\n"
   138  cout << "\n\n"
   172  cout << "\n\n"

In multivar_frontend.cc:

   161  cout << "command = \"" << command << "\"\n" << flush;
   186  e->dump(cout);
   188  cout << "\n\n    var_name = \"" << var_name << "\"\n\n" << flush;
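
To illustrate the clobbering concern in (5): if each per-field run reopens the same log file in truncate mode, only the last run's output survives. The sketch below shows the append-mode alternative with plain standard-library streams. It is not MET's mlog, and `append_log` is a made-up name.

```cpp
#include <fstream>
#include <string>

// Append one message to a shared log file. Opening with std::ios::app
// means each writer adds to the end of the file instead of truncating
// what an earlier call (for example, an earlier per-field run) wrote.
// (append_log is a hypothetical helper, not part of MET's logger.)
bool append_log(const std::string& log_path, const std::string& message) {
    std::ofstream out(log_path, std::ios::app);
    if (!out) return false;
    out << message << '\n';
    return static_cast<bool>(out);
}
```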

JohnHalleyGotway commented 3 years ago

Randy, I tested the code for this 1184 pull request but see that not all of the prior issues were fixed. It would appear that Multivariate MODE is still writing out to the current working directory instead of the output directory:

00 01 02 f_super.nc o_super.nc

Did you perhaps make these changes on a local version and not push the changes up to GitHub?

JohnHalleyGotway commented 3 years ago

Randy, thanks for the recent commits for MODE multivar. I tried testing to confirm that everything's been fixed, but some issues do remain.

  1. Running "mode" with no arguments should just print the usage statement, but results in a segfault instead.
  2. While the "-outdir" argument does cause the 00, 01, and 02 subdirs to be written there, the "f_super.nc" and "o_super.nc" files remain in the current working directory.

I'll try to fix these problems on feature_1184 now.

JohnHalleyGotway commented 3 years ago

I modified the log messages slightly and updated mode multivar to write its output to the -outdir directory. However, the mode usage statement and mode --help continue to fail, and I wasn't able to fix them.

musial6:mode johnhg$ ./mode --help
ERROR  : 
ERROR  : recursive_envs() -> unable to open input file "LDFLAGS=-Wl,-rpath,/lib:/Volumes/d1/projects/MET/MET_external_libs/external_libs/lib:/Volumes/d1/projects/MET/MET_external_libs/external_libs/lib:/zlib-1.2.11/lib:/szip-2.1.1/lib -Wl,-rpath,:/Volumes/d1/projects/MET/MET_external_libs/external_libs/lib:/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib"
ERROR  : 
musial6:mode johnhg$ ./mode

*** Model Evaluation Tools (METV10.0.0) ***

if singlevar mode:
==================

Usage: mode
    fcst_file
    obs_file
    config_file
    [-config_merge merge_config_file]
    [-outdir path]
    [-log file]
    [-v level]
    [-compress level]

    where
        "fcst_file" is a gridded forecast file containing the field to be verified (required).
        "obs_file" is a gridded observation file containing the verifying field (required).
        "config_file" is a MODEConfig file containing the desired configuration settings (required).
        "-config_merge merge_config_file" overrides the default fuzzy engine settings for merging within the fcst/obs fields (optional).
Segmentation fault: 11

JohnHalleyGotway commented 3 years ago

On 12/18/20, Lindsay held a telecon to gather input from NOAA colleagues: Geoff M, Adam C, Burkley T, Mike E, and Jennifer T.

Their general feedback was a desire to define the dryline objects much thinner, with an emphasis on capturing the leading (eastern) edge.

Jennifer: Consider including a specific humidity or dewpoint threshold in the super object definition.

Burkley: When holes appear in the dryline, check which of the 3 input parameters are not met that cause that hole. Is it consistently the same field? Adam Clark guesses that it's the temperature.

Geoff: Suspect that the 100km length attribute is too low. Focus on large, synoptic-scale drylines instead of smaller outflow boundaries. Also try verifying against RTMA analysis.

Jennifer: It is surprising how subjective the definition of surface analysis features actually is. Encourage Lindsay to tweak the object definition criteria in whatever way necessary to capture the features of interest.

Adam: Emphasized how difficult identifying drylines in an automated way really is.

John HG: Would be nice to derive a gradient on the fly using a function very similar to the existing derive(x) functionality. Consider supporting: convert(x) = xgrad(dx); and convert(x) = ygrad(dy); where dx and dy define the size of the 1-d gradient to be computed.
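
For illustration only, here is one possible reading of xgrad(dx) as a forward difference over dx grid columns. The notes above do not define the exact stencil, so the difference scheme, the row-major layout, and the -9999 missing-value placeholder are all assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the proposed xgrad(dx): for each grid point,
// the value dx columns further along the row minus the value at the point
// itself. Points with no neighbor dx columns away are set to `bad`.
// `field` is row-major with nx columns and ny rows, and the caller is
// expected to pass field.size() == nx * ny.
std::vector<double> xgrad(const std::vector<double>& field,
                          std::size_t nx, std::size_t ny,
                          std::size_t dx, double bad = -9999.0) {
    std::vector<double> out(field.size(), bad);
    for (std::size_t y = 0; y < ny; ++y) {
        for (std::size_t x = 0; x + dx < nx; ++x) {
            out[y * nx + x] = field[y * nx + x + dx] - field[y * nx + x];
        }
    }
    return out;
}
```

A ygrad(dy) counterpart would use the same loop with the offset applied to rows instead of columns.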

John HG: We should think more about double-threshold merging. Recommend supporting a merge threshold for each input field along with merging super object field logic. So consider adding:

multivar_merge_logic =  "#1 && #2 && #3";

Where 1, 2, and 3 correspond to the merge objects from each field.
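
As a sketch of how a logic string like the one above could be evaluated, the code below applies it to one set of per-field true/false flags, for example whether each field's merge object covers a given grid point. This is not MODE's parser. The class and function names are invented, and the supported syntax (`#N`, `&&`, `||`, parentheses) is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Tiny recursive-descent evaluator for expressions such as "#1 && #2 || #3".
// "#N" refers to the N-th input (1-based); && binds tighter than ||.
// (LogicEval and eval_logic are hypothetical names, not MET code.)
class LogicEval {
public:
    LogicEval(const std::string& expr, const std::vector<bool>& inputs)
        : s_(expr), in_(inputs), pos_(0) {}

    bool run() {
        bool v = parse_or();
        skip_ws();
        if (pos_ != s_.size()) throw std::runtime_error("trailing characters");
        return v;
    }

private:
    void skip_ws() {
        while (pos_ < s_.size() &&
               std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    // Consume `tok` if it is next in the input; report whether it was there.
    bool eat(const std::string& tok) {
        skip_ws();
        if (s_.compare(pos_, tok.size(), tok) == 0) {
            pos_ += tok.size();
            return true;
        }
        return false;
    }
    bool parse_or() {
        bool v = parse_and();
        while (eat("||")) { bool r = parse_and(); v = v || r; }
        return v;
    }
    bool parse_and() {
        bool v = parse_atom();
        while (eat("&&")) { bool r = parse_atom(); v = v && r; }
        return v;
    }
    bool parse_atom() {
        if (eat("(")) {
            bool v = parse_or();
            if (!eat(")")) throw std::runtime_error("missing ')'");
            return v;
        }
        if (!eat("#")) throw std::runtime_error("expected '#N' or '('");
        std::size_t n = 0, digits = 0;
        while (pos_ < s_.size() &&
               std::isdigit(static_cast<unsigned char>(s_[pos_]))) {
            n = n * 10 + static_cast<std::size_t>(s_[pos_++] - '0');
            ++digits;
        }
        if (digits == 0 || n == 0 || n > in_.size())
            throw std::runtime_error("bad field index");
        return in_[n - 1];
    }

    std::string s_;
    std::vector<bool> in_;
    std::size_t pos_;
};

bool eval_logic(const std::string& expr, const std::vector<bool>& inputs) {
    return LogicEval(expr, inputs).run();
}
```

Calling `eval_logic("#1 && #2 && #3", flags)` once per grid point would give the mask of the combined object under these assumptions.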

John HG: We should enhance MODE to support filtering cluster objects AFTER applying double-threshold and engine merging logic. Right now, object filters are only applied to simple objects. Adam confirms that it would be preferable to add the length filter to the cluster of objects. So consider adding:

cluster_attr_name   = [];
cluster_attr_thresh = [];

Also consider renaming "filter_attr_name/filter_attr_thresh" to "simple_attr_name/simple_attr_thresh".

lindsayrblank commented 3 years ago

JohnHalleyGotway commented 2 years ago

Reopening this issue.

Merging the multi-variate MODE PR into develop triggered a testing workflow run for METplus. But 6 of the 44 testing groups failed with runtime errors from MODE. So while the changes didn't break the MET regression tests, they did break the METplus ones.

All of the MODE runtime errors look something like this:

ERROR  : 
ERROR  : replace_env() -> unable to get value for environment variable "7??W?f?ʳ࿑5??ȕZ̈??ȗ&<y2?Ht???!̛*%Z4e?j"
ERROR  : 

Here's why... the logic on this line is insufficient:

const char * const user_config_filename = argv[3];

It assumes that the user-specified config file is always the fourth item on the command line, counting the executable itself.

However, METplus runs MODE like this (at least in all of the failed runs):

bin/mode -v 2 /d1/projects/METplus/METplus_Data/met_test/data/sample_fcst/2005080700/wrfprs_ruc13_12.tm00_G212 /d1/projects/METplus/METplus_Data/met_test/data/sample_fcst/2005080712/wrfprs_ruc13_00.tm00_G212 MODEConfig_wrapped -outdir out

Having -v 2 immediately after the executable shifts the config file down. The weird error message is the result of trying to parse a binary GRIB file as if it were an ASCII config file. Rerunning the failed MODE command line with all optional args following the required ones fixes this problem.

However, the real fix is to use the CommandLine class to properly parse the command line.
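
To illustrate the idea of separating options from required arguments regardless of order, here is a standalone sketch. It does not use MET's CommandLine class, and `ParsedArgs` and `split_args` are invented names. The option names in the usage note come from the mode usage statement earlier in this thread.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// (ParsedArgs and split_args are hypothetical, not MET's CommandLine class.)
struct ParsedArgs {
    std::vector<std::string> positional;         // fcst_file, obs_file, config_file
    std::map<std::string, std::string> options;  // "-v" -> "2", "-outdir" -> "out"
};

// Separate options from positional arguments, wherever the options appear.
// `arity` lists each recognized option and how many values it consumes
// (0 or 1 in this sketch). args[0] is the executable name and is skipped.
ParsedArgs split_args(const std::vector<std::string>& args,
                      const std::map<std::string, int>& arity) {
    ParsedArgs parsed;
    for (std::size_t i = 1; i < args.size(); ++i) {
        auto it = arity.find(args[i]);
        if (it == arity.end()) {
            parsed.positional.push_back(args[i]);
        } else if (it->second == 1 && i + 1 < args.size()) {
            parsed.options[args[i]] = args[i + 1];
            ++i;  // skip the value that was just consumed
        } else {
            parsed.options[args[i]] = "";
        }
    }
    return parsed;
}
```

With `-v`, `-outdir`, `-log`, `-config_merge`, and `-compress` each registered as taking one value, the METplus-style command line above yields the config file as the third positional argument even though `-v 2` comes first.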