Open DanielAdriaansen opened 1 year ago
os.path.abspath can be used to get the full path from a relative path. This could be used to add the directory containing the python embedding script if the user does not specify the full path. See: https://github.com/dtcenter/METplus/blob/aa9d7b3451bad720dd8a9928a735b3bf12318585/ush/run_metplus.py#L24
A potential fix (untested) would be to change this line from:
pyembed_module_name = sys.argv[2]
to:
pyembed_module_name = os.path.abspath(sys.argv[2])
Describe the Enhancement
This is a general issue with no due date to document suggested changes and/or improvements to how Python Embedding works within MET, with the primary goal of improving the user experience and ability to communicate and teach Python Embedding to users.
Time Estimate
TBD
Sub-Issues
Consider breaking the enhancement down into sub-issues.
[ ] Revisit the Python embedding strings
PYTHON_NUMPY
andPYTHON_XARRAY
. I think we can generalize these more to be something likePYTHON_DATAPLANE
orPYTHON_GRID
andPYTHON_POINT
. Then, forPYTHON_DATAPLANE
(or_GRID
), have MET deal with deciphering whether it is an Xarray object or a NumPy N-D array. The point data is easier, and really only comes in one way. The largest confusion in my opinion is thatPYTHON_NUMPY
is used for point data currently, which has nothing to do with NUMPY whatsoever. Alternatively, we could addPYTHON_POINT
and maintain the two dataplane instances the way they are.[x] If --enable_python is used when building, have make_test check if the version of python specified can successfully load all the required packages from a list of required packages that is maintained somewhere. (See Enhance make_test to test Python embedding functionality #2788)
[ ] Add to Appendix F or the contributors guide much more detailed information about running with and without MET_PYTHON_EXE set. Include wire diagrams to illustrate. Also document the data structures that are populated in the 3 different types of python embedding (grid, point, pairs).
[ ] This section in the METplus Wrappers docs: https://metplus.readthedocs.io/en/latest/Users_Guide/overview.html#metplus-components-python-requirements, is confusing because it seems that "cfgrib" is a required Python package for METplus Wrappers, but then later in chapter 4 it does not list "cfgrib", and says "Minimum Requirements To run most of the METplus wrappers, the following packages are required: dateutil (2.8)". I think there could be some documentation work in the METplus Wrappers also.
[ ] When attempting to generalize where Python Embedding should be put for METplus Wrappers configuration, I was able to determine the following. For gridded data, the _INPUT_TEMPLATE conf item is set to just
PYTHON_NUMPY
orPYTHON_XARRAY
, and then the Python script and any arguments is included in the _VAR\<n>_NAME conf item. However, for point data, the _INPUT_TEMPLATE conf item contains both thePYTHON_NUMPY
string, and another "=" followed by the Python script and any script arguments. It would be nice if these worked similarly, where for point data we could set the _INPUT_TEMPLATE to justPYTHON_NUMPY
, and then use the *_VAR\<n>_NAME conf items for the Python script and any arguments to it, which would make configuring METplus wrappers for point and gridded data similar with respect to where elements of Python Embedding should be put in the configuration file. After talking to John HG, it sounds like it might not be feasible to change the way MET works -or- to change the wrappers to include both items in a single conf for gridded data like obs data due to how the MET tools can be called for gridded data. Some of our discussion:[ ] How can we expose the value of
MET_PYTHON_BIN_EXE
to the user? I was able to run this command:seneca:~$ /d1/projects/MET/MET_regression/develop/NB20230123/MET-develop/bin/plot_data_plane PYTHON_NUMPY fcst.ps 'name="test.py";' -v 12
which showed the value. However, it might be nice to make a symbolic link in /usr/local/met-11.0.0/bin to the python3.8 exe that was used, OR have a utility script in /usr/local/met-11.0.0/bin the user can run to echo the value of the Python install for them. This way the user can directly instantiate the same Python MET will use, and also verify whether the Python packages they need are available there or whether they will need to handle installing their ownMET_PYTHON_EXE
to use.[ ] Add more documentation or examples of how to use
point_stat
with Python Embedding. David Fillmore was using Python Embedding for both fcst and obs data, with the fcst being gridded data and the obs being point data. This led to constructing a command for MET and configuring METplus for both "point" data and "gridded" data, which work differently. Maybe more documentation, but maybe this is another data point for changing how it works for gridded/point to be more similar?[ ] There are some Python scripts located on this DTC webpage: https://dtcenter.org/community-code/model-evaluation-tools-met/sample-analysis-scripts. Is this the best place for them? Or should this be included in the Appendix F, and linked to from there? From John HG:
[ ] Should we document what other elements are allowed in the
plot_data_plane
field_string argument? name = "TMP"; level = "P500"; convert(x) = K_to_C(x); censor_thresh = lt0; censor_val = -9999; set_attr_name = "Temperature"; set_attr_level = "500mb"; set_attr_valid = 20210708_12;. See here for the best resource: https://met.readthedocs.io/en/latest/Users_Guide/config_options.html#fcst.[ ] Added 2/10/23: On this line: https://github.com/dtcenter/MET/blob/01234646a66dc92877869f427a4490c106ae80d0/src/libcode/vx_data2d_python/python_dataplane.cc#L323, the system is calling Python but it may not be using the environment that the user has set. For example, PYTHONPATH could be set but when that call is executed it might not know about PYTHONPATH.
[ ] Added while testing #2428 on 2/17/23: We should consider making the Python requirements for METplus the same as the Python requirements for building MET with support for Python embedding. For using MET with Python embedding, it appears based on Appendix F the modules
sys, os, argparse, importlib, numpy, and netCDF4
are required. Some of those are stdlib, but numpy (and maybe netCDF4) are not. (see #2490)[ ] Other open issues: #1470, #2539
[ ] Can we re-order the debugging of the Initializing/using/running for the Python Embedding messages?
[ ] It seems like PYTHON_XARRAY is not needed? When I have an Xarray DataArray object in memory named "met_data", and use PYTHON_NUMPY, it works just fine. Why is this?
[ ] Data ordered intuitively inside a hypercube read in by Python need to be flipped before sending to MET because MET populates the 2D dataplane from upper left to lower left rather than lower left first. This seems dangerous since it is not intuitive.
[ ] Should we even have many examples of using MET with Python Embedding directly? Like on the DTC webpage with sample scripts and data, those are using MET directly but don't we want folks to use METplus wrappers?
[ ] Enhance MET Python embedding to report an error if
MET_PYTHON_EXE
is set but does not exist, and warn that it is using the compile time instance.[ ] Decide how to support missing data values in MET for cases of compile time Python and MET_PYTHON_EXE. Met tools expect a missing data value/FillValue of -9999.0. Any other value will be treated as valid data. See here: #2525 for more info.
[ ] The MPR Python Embedding example assumes users are reading an MPR file generated by another MET tool, and uses Pandas
read_csv()
to read it. It specifies the type of data asdtype='str'
, thus all columns of the MPR file are cast as strings prior to sending to MET. If a user curates their own MPR data in Python (for example, massaging JSON data into the MPR line type format), they may have columns that are not string but rightfully numeric in their type (e.g. QC, FCST, OBS). Should we allow mixed data types in this case? At a minimum, perhaps we return a more informative error message thanbad object type
.[ ] Added 5/8/2024 during #2877 review: add documentation of the MET python module as needed. This is mostly intended for internal use though, so it is debatable how useful documentation would be for users. It might be more for developers.
Completed Items
MET_PYTHON_BIN_EXE
work. @georgemccabe is there an issue for that somewhere? This can be found in the list of environment variables set at MET compile time, and is also printed to the user as a debug statement when using Python embedding. The issue where this was added was #2388, and the debug statements were added later.met_point_obs.py
at runtime when needed rather than expecting the user to figure out if/when it actually needs to be called. Changed in #2509convert_point_obs
insidemet_point_obs
. It seems that if a user is NOT settingMET_PYTHON_EXE
, then MET will only find this after https://github.com/dtcenter/MET/commit/5ad920b3fab2ddf1c5fe5aa3a73e014addab9a74, but if before that commit then they need to setPYTHONPATH
. Similarly, if they are using Python embedding, they will have to setPYTHONPATH
prior to #2428 being merged. Is this right? Changed in #2509. The solution here is just to encourage users to update to a version after this PR, which removes the need forconvert_point_obs
.met_point_obs.py
into some other directory to automatically be included in the user's python path at runtime. But we don't necessarily want to include EVERYTHING that lives in the scripts/python directory because that MIGHT cause unexpected behavior depending on the names of scripts the user chooses. Inventory the existing location of python scripts and data files in the MET repo and revise, as needed. Changed in #2509. Since a user typically won't need this anymore (i.e. to callconvert_point_data
), then this probably won't be an issue but I also think the Python reorg in #2509 also makes it such that all the Python code a user would need/want to use in their Python embedding script will be found by MET at runtime. If running standalone the user may need to add some of these scripts/python/met directories to their PYTHONPATH.Relevant Deadlines
NONE
Funding Source
NONE
Assignee
Labels
Projects and Milestone
Define Related Issue(s)
Consider the impact to the other METplus components.
Enhancement Checklist
See the METplus Workflow for details.
feature_<Issue Number>_<Description>
feature <Issue Number> <Description>