Closed georgemccabe closed 5 months ago
I was able to test this on seneca. It takes 20s to read in this file:
DEBUG 1: Reading point observation file: PYTHON_NUMPY=pyembed_pandas_testing.py
typ sid vld lat lon elv var lvl hgt qc obs
0 ADPUPA 89571 20200824_113000 -68.580002 77.970001 18.0 HGT 1000.0 -151.19162 2 -151.000000
1 ADPUPA 89571 20200824_113000 -68.580002 77.970001 18.0 SPFH 977.0 18.02284 2 0.000329
2 ADPUPA 89571 20200824_113000 -68.580002 77.970001 18.0 TMP 977.0 18.02284 2 249.449997
3 ADPUPA 89571 20200824_113000 -68.580002 77.970001 18.0 HGT 977.0 18.02284 2 18.000000
4 ADPUPA 89571 20200824_113000 -68.580002 77.970001 18.0 SPFH 976.0 -9999.00000 2 0.000342
... ... ... ... ... ... ... ... ... ... .. ...
934843 SYNDAT MA030044 20200824_120000 26.250000 127.500000 -- VGRD 500.0 -9999.00000 0 6.900000
934844 SYNDAT MA030044 20200824_120000 26.250000 127.500000 -- UGRD 400.0 -9999.00000 0 7.900000
934845 SYNDAT MA030044 20200824_120000 26.250000 127.500000 -- VGRD 400.0 -9999.00000 0 6.100000
934846 SYNDAT MA030044 20200824_120000 26.250000 127.500000 -- UGRD 300.0 -9999.00000 0 4.700000
934847 SYNDAT MA030044 20200824_120000 26.250000 127.500000 -- VGRD 300.0 -9999.00000 0 4.500000
[934848 rows x 11 columns]
which is close to 1M observations. Better running of PB2NC
via config options could help speed this up.
I worked with the DataFrame a bit in Python and didn't observe any trouble.
I wonder if we need any documentation of this? Maybe in Appendix F? @JohnHalleyGotway thoughts?
I guess we have this section, which is empty: https://met.readthedocs.io/en/develop/Users_Guide/appendixF.html#met-python-package
I thought maybe convert_point_data()
was documented there, but it is not. So maybe for now it's OK to leave this undocumented.
Maybe I will add a "to-do" item here: on #2414 to document the "MET Python Module".
@hsoh-u , I talked with @DanielAdriaansen about these changes. Based on his feedback, I added an init function to nc_point_obs to take an input file path so it can be initialized without calling read_data(). I also changed the read_data() function to raise an exception instead of return a boolean for success.
Expected Differences
[X] Do these changes introduce new tools, command line arguments, or configuration file options? [No]
[X] Do these changes modify the structure of existing or add new output data types (e.g. statistic line types or NetCDF variables)? [No]
Pull Request Testing
On seneca, I created a test script that runs plot_point_obs both passing an input file directly into MET and passing that same file through a python embedding script that converts the data to a Pandas DataFrame, then passes it to MET.
Test directory:
/d1/projects/METplus/METplus_Data/development/met_2781
To run the script:
or just compare the output files
output/raw_subset.png
andoutput/pyembed_subset.png
which should contain the same plot.I also added a unit test to demonstrate the new Python Embedding example and confirmed that it runs successfully both on seneca and in GHA.
@DanielAdriaansen : confirm that the new logic works with your test data @hsoh-u : confirm that the new python logic is in the correct location and matches the format/standards of the rest of the python logic
Could consider adding an example of using the new script somewhere, but I am not sure where that would live.
[X] Do these changes include sufficient testing updates? [Yes]
[X] Will this PR result in changes to the MET test suite? [Yes]
There will be 1 additional output file generated:
[X] Will this PR result in changes to existing METplus Use Cases? [No] If yes, create a new Update Truth METplus issue to describe them.
[X] Do these changes introduce new SonarQube findings? [No]
[X] Please complete this pull request review by 5/7/2024.
Pull Request Checklist
See the METplus Workflow for details.