dtcenter / MET

Model Evaluation Tools
https://dtcenter.org/community-code/model-evaluation-tools-met
Apache License 2.0

Restructure the python embedding logic to run the user-specified instance of python to write/read a temporary pickle file. #1205

Closed: JohnHalleyGotway closed this issue 4 years ago.

JohnHalleyGotway commented 5 years ago

Python embedding in MET crashes (core dumps) on Cheyenne when the user's script uses the h5py or pygrib modules.

Not sure if we can actually fix this in met-8.1.2 or not. You could change the milestone to met-9.0 instead.

This issue has been found for two modules, h5py and pygrib, and can be reproduced with the commands below. The problem may be that MET was compiled against the HDF5 and GRIB2 libraries, which these packages also use.

module use /glade/p/ral/jntp/MET/MET_releases/modulefiles
module load met/8.1_python ncar_pylib

This runs fine with pygrib:

cd /glade/p/ral/jntp/MET/MET_Help/mandelbaum_data_20190930/pygrib_problem
python ./read_GFSv3.py ./gfs.t00z.pgrb2.1p00.f048.reduced.grib2

This core dumps:

plot_data_plane PYTHON_NUMPY gfs.ps 'name="./read_GFSv3.py ./gfs.t00z.pgrb2.1p00.f048.reduced.grib2";'

This runs fine with h5py:

cd /glade/p/ral/jntp/MET/MET_Help/mandelbaum_data_20190930/h5py_problem
python read_IMERG_V06_HDF5.py 3B-HHR.MS.MRG.3IMERG.20180102-S200000-E202959.1200.V06B.HDF5 HQprecipitation

This core dumps:

plot_data_plane PYTHON_NUMPY imerg.ps 'name="read_IMERG_V06_HDF5.py 3B-HHR.MS.MRG.3IMERG.20180102-S200000-E202959.1200.V06B.HDF5 HQprecipitation";'

georgemccabe commented 5 years ago

Regarding the pygrib error:

Calling .data() or .values on a grb record via pygrib causes a segfault when the script is run through plot_data_plane, but not when it is run directly with python on Cheyenne. I have found reports of other people hitting this problem when running from python (quoted below). Since the script works in python but not in MET, the problem may be in a library that pygrib depends on (possibly Jasper?) rather than in pygrib itself.

From https://github.com/jswhit/pygrib/issues/86: "We encountered this same problem, albeit, this was not an issue with pygrib itself, but with ECCODES and/or a library on our system. The issue can be recreated with any hrdps file from Environment Canada on an intel-python3 docker image, with pygrib and eccodes installed via conda.

We discovered that while trying to read from the grib file, the stack filled up for the process and resulted in a segfault. Increasing the stack size ulimit solved our problem. We do NOT experience this issue with any other grib data, even data from the gem or rgem, it seems unique. Hopefully this helps!"
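The quoted workaround is to raise the stack-size limit. Below is a minimal sketch, assuming a POSIX system and Python 3.7+, of doing that from Python for a single child process, similar in effect to raising `ulimit -s` in the shell that launches plot_data_plane. The helper names are hypothetical, and the thread does not confirm that this fixes the Cheyenne crash.

```python
from __future__ import annotations

import resource
import subprocess
import sys


def raise_stack_limit() -> None:
    """Raise the soft stack limit of the current process to its hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft != hard:
        # Raising the soft limit up to (but not past) the hard limit does
        # not require elevated privileges.
        resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))


def run_with_big_stack(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run cmd with a raised stack limit, leaving the parent's limit alone.

    preexec_fn runs in the child after fork() and before exec(), so the new
    limit applies only to the child (POSIX only).
    """
    return subprocess.run(cmd, preexec_fn=raise_stack_limit,
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Print the stack soft limit seen by a child python interpreter.
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_STACK)[0])"
    print(run_with_big_stack([sys.executable, "-c", probe]).stdout.strip())
```

Because the limit is changed in the child only, the calling process keeps its original stack size.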

From https://github.com/jswhit/pygrib/issues/74, the problem may be caused by Jasper: "I have encountered the same (and for reading meteofrance files) and resolve it by using (from sources): jasper 2.0.14 (installing in /usr/local/jasper_new and building as static) eccodes-2.7.3 (don't forget to ask for python interface and to give path to jasper lib and includes). This message is old but hope this helps."

georgemccabe commented 5 years ago

h5py causes a segfault when it is imported on Cheyenne. This is due to a mismatch between the version of HDF5 used to compile MET and the version used to build the h5py python module.
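To check for this kind of mismatch, h5py reports the HDF5 library version it loads through `h5py.version.hdf5_version`. A minimal sketch, assuming it is run in a standalone python (where the thread reports h5py works) and that the caller already knows the HDF5 version MET was compiled with; the function name is hypothetical.

```python
from typing import Optional


def h5py_hdf5_mismatch(met_hdf5_version: str) -> Optional[bool]:
    """Compare the HDF5 version h5py reports with the one MET was built with.

    met_hdf5_version (e.g. "1.10.1") is supplied by the caller; how to obtain
    it for a given MET build is an assumption left to the user.
    Returns None if h5py is not installed, True on a mismatch, else False.
    """
    try:
        import h5py
    except ImportError:
        return None
    # Version string of the HDF5 library h5py loaded, e.g. "1.10.4".
    return h5py.version.hdf5_version != met_hdf5_version
```

A result of True suggests rebuilding h5py against MET's HDF5, as described next.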

I was able to rebuild h5py on my machine using the same version of HDF5 that I used to install MET, and then the script ran through plot_data_plane. You have to build h5py from source by running:

pip uninstall h5py
pip install --no-binary=h5py h5py

I had to do a little trickery to get this to work using the correct version of HDF5. Here are some of the things I did:

To find -lhdf5 and -lhdf5_hl, I created symlinks for libhdf5.so and libhdf5_hl.so in a directory that gcc could find, i.e. /home/mccabe/miniconda3/envs/py2.7/lib (there may be a better way to add a -L option to the call, but this is how I got it to work).

Also, I had to modify the read_IMERG_V06_HDF5.py script to convert the numpy float32 values to python float values: I changed min(lat) to min(lat).item() and did the same for lon.
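The float32-to-float conversion above can be sketched as follows, assuming numpy is available. For a 1-D array, `lat.min()` behaves like `min(lat)` and also returns `numpy.float32`; `.item()` turns that into a built-in Python float. The function name and the `lat_ll`/`lon_ll` keys are hypothetical.

```python
import numpy as np


def lower_left_corner(lat: np.ndarray, lon: np.ndarray) -> dict:
    """Return the minimum lat/lon as built-in Python floats.

    On a float32 array, .min() returns numpy.float32; .item() converts it to
    a plain Python float, the same conversion applied in this thread.
    """
    return {"lat_ll": lat.min().item(), "lon_ll": lon.min().item()}


lat = np.array([10.5, -20.25, 30.0], dtype=np.float32)
lon = np.array([100.0, 5.5], dtype=np.float32)
print(lower_left_corner(lat, lon))
```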

JohnHalleyGotway commented 4 years ago

This logic has been merged into the develop branch. As of 2/4/2020, python embedding for point and gridded data works both with and without the pickle logic. Also, LD_LIBRARY_PATH and PYTHONPATH do NOT need to be set.

On 2/4/2020, added the following refinements:
(1) Update ascii2nc python embedding to write the pickle file to the MET temp directory instead of the current working directory.
(2) Update point and gridded python embedding to DELETE the temporary pickle files.
(3) Rename the temporary pickle files "tmp_met_pickle..." and "tmp_ascii2nc_pickle..." for gridded and point data, respectively, where "..." is the process id suffix that the make_temp_file_name() function adds.
(4) A couple of minor changes to remove stale code and cout statements from the C++ code.
(5) Add consistent Debug(3) log messages to MET about running python scripts and reading pickle files.
(6) Update python scripts to print the name of the script being run and any temp files being written.
(7) Update the read_ascii_point.py python script to format the columns of data correctly.
(8) Add 3 unit tests to unit_python.xml to call plot_data_plane via pickle and to call ascii2nc with and without the pickle logic.
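A minimal sketch of the pickle hand-off described in this issue, assuming Python 3.6+ on the reading side: a separate, user-specified python interpreter writes the data to a temporary pickle file in a temp directory, the caller reads it back, and the file is always deleted. In MET the reader is C++ code driving embedded python; here both sides are plain python, the writer's hard-coded data stands in for the user's script, and the names WRITER_SOURCE and read_via_pickle are hypothetical. The random suffix from tempfile.mkstemp replaces the process-id suffix that make_temp_file_name() adds.

```python
import os
import pickle
import subprocess
import sys
import tempfile

# Hypothetical writer run by the user's python. It pickles the data to the
# path given as argv[1]; in MET the data would come from the user's script.
WRITER_SOURCE = r"""
import pickle, sys
met_data = [[1.0, 2.0], [3.0, 4.0]]
attrs = {"name": "APCP", "units": "kg/m^2"}
with open(sys.argv[1], "wb") as f:
    pickle.dump({"met_data": met_data, "attrs": attrs}, f, protocol=2)
"""


def read_via_pickle(user_python: str, tmp_dir: str) -> dict:
    """Run user_python to write a temporary pickle, read it back, delete it."""
    fd, path = tempfile.mkstemp(prefix="tmp_met_pickle_", dir=tmp_dir)
    os.close(fd)
    try:
        proc = subprocess.run([user_python, "-c", WRITER_SOURCE, path])
        if proc.returncode < 0:
            # A crash (e.g. SIGSEGV in a compiled module) stays in the child.
            raise RuntimeError(f"writer killed by signal {-proc.returncode}")
        if proc.returncode != 0:
            raise RuntimeError(f"writer exited with status {proc.returncode}")
        with open(path, "rb") as f:
            return pickle.load(f)
    finally:
        os.remove(path)  # always clean up the temporary pickle file


if __name__ == "__main__":
    print(read_via_pickle(sys.executable, tempfile.gettempdir()))
```

Running the writer in its own process means a crash in a compiled module such as h5py or pygrib shows up as a negative return code instead of killing the caller. Pinning protocol=2 keeps the file readable by both Python 2.7 and Python 3, which matters when the user's interpreter and the reading interpreter differ.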

Still would like to reimplement split_path() as ConcatString::dirname() and ConcatString::basename() to get rid of PATH_MAX in several spots.

I also realized that the obs_gc() variable in the ascii2nc output contains bad data values. Need to do more debugging!

JohnHalleyGotway commented 4 years ago

On 2/5/2020, updated ascii2nc point python embedding logic to...