JiaweiZhuang / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License
269 stars, 49 forks

Add some CF support for lon and lat #73

Closed: stefraynaud closed this 4 years ago

stefraynaud commented 4 years ago

This PR adds some support for the CF conventions: it searches Datasets for the longitude and latitude variables and renames them to 'lon' and 'lat'. Bounds data arrays are also renamed if found.

A module named 'cf' was created for that, and unit tests were added.
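A minimal sketch of the bounds renaming, assuming the CF 'bounds' attribute is what links each coordinate to its bounds variable. The helper name bounds_renames and its dict-of-attributes input are hypothetical, not the API of the PR's cf module. With xarray, the input could be built as {name: dict(ds[name].attrs) for name in ds.variables}. Only the names are handled here, not the bounds shapes.

```python
# Hypothetical sketch: bounds_renames is an illustrative helper, not the
# API of the PR's xesmf/cf.py module.


def bounds_renames(variables, coord_renames):
    """Map CF bounds variables to xESMF-style names such as 'lon_b'.

    variables: mapping of variable name -> attribute dict
    coord_renames: mapping of original coordinate name -> 'lon' or 'lat'
    """
    renames = {}
    for orig_name, new_name in coord_renames.items():
        # In CF, a coordinate points at its cell boundaries via 'bounds'.
        bounds_name = variables.get(orig_name, {}).get("bounds")
        if bounds_name is not None and bounds_name in variables:
            renames[bounds_name] = new_name + "_b"
    return renames


variables = {
    "nav_lon": {"standard_name": "longitude", "bounds": "nav_lon_bnds"},
    "nav_lat": {"standard_name": "latitude", "bounds": "nav_lat_bnds"},
    "nav_lon_bnds": {},
    "nav_lat_bnds": {},
}
print(bounds_renames(variables, {"nav_lon": "lon", "nav_lat": "lat"}))
# {'nav_lon_bnds': 'lon_b', 'nav_lat_bnds': 'lat_b'}
```

A coordinate without a 'bounds' attribute, or one pointing at a missing variable, simply produces no entry, so the caller can fall back to other lookups.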

codecov-io commented 4 years ago

Codecov Report

Merging #73 into master will increase coverage by 0.44%. The diff coverage is 97.61%.


@@            Coverage Diff            @@
##           master     #73      +/-   ##
=========================================
+ Coverage   95.76%   96.2%   +0.44%     
=========================================
  Files           6       7       +1     
  Lines         260     343      +83     
=========================================
+ Hits          249     330      +81     
- Misses         11      13       +2

Impacted Files       Coverage Δ
xesmf/frontend.py    93.75% <91.66%> (-0.24%)
xesmf/cf.py          98.61% <98.61%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 7479110...9094213.

stefraynaud commented 4 years ago

With the first commit, coordinates and bounds are automatically identified and renamed if they follow the CF conventions, i.e. based on their name and on their units and standard_name attributes.
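The identification step can be sketched as a lookup that tries standard_name first, then units, then the variable name. This is a minimal illustration, not the PR's code: find_coord and CF_CRITERIA are hypothetical names, the unit spellings are the ones the CF conventions accept for latitude and longitude, and the fallback name list (including 'nav_lon') is only an assumption. Variables referenced by another variable's 'bounds' attribute are skipped, because bounds may carry the same units as their coordinate.

```python
# Hypothetical sketch of CF-based detection; the unit spellings follow the
# CF conventions, while the fallback name list is only an assumption.
CF_CRITERIA = {
    "lon": {
        "standard_name": {"longitude"},
        "units": {"degrees_east", "degree_east", "degree_E", "degrees_E",
                  "degreeE", "degreesE"},
        "names": {"lon", "longitude", "nav_lon"},
    },
    "lat": {
        "standard_name": {"latitude"},
        "units": {"degrees_north", "degree_north", "degree_N", "degrees_N",
                  "degreeN", "degreesN"},
        "names": {"lat", "latitude", "nav_lat"},
    },
}


def find_coord(variables, axis):
    """Return the variable name that looks like `axis` ('lon' or 'lat').

    Tries standard_name first, then units, then the name itself; bounds
    variables (targets of a 'bounds' attribute) are skipped. Returns None
    if nothing matches.
    """
    criteria = CF_CRITERIA[axis]
    bounds = {a["bounds"] for a in variables.values()
              if isinstance(a.get("bounds"), str)}
    candidates = {n: a for n, a in variables.items() if n not in bounds}
    for attr in ("standard_name", "units"):
        for name, attrs in candidates.items():
            value = attrs.get(attr)
            if isinstance(value, str) and value in criteria[attr]:
                return name
    for name in candidates:
        if name.lower() in criteria["names"]:
            return name
    return None


variables = {
    "nav_lon_bnds": {"units": "degrees_east"},
    "nav_lon": {"units": "degrees_east", "bounds": "nav_lon_bnds"},
    "y": {"standard_name": "latitude"},
}
print(find_coord(variables, "lon"), find_coord(variables, "lat"))  # nav_lon y
```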

With the second commit, the original coordinate names are automatically restored in the output.
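Restoring the names amounts to inverting the forward renaming recorded for a grid and applying it to the regridded result. A minimal sketch, assuming that forward mapping was kept; restore_renames is a hypothetical helper, not xESMF API. Canonical names absent from the output (bounds, for example) are skipped, and a mapping that is not one-to-one is rejected because it cannot be inverted.

```python
# Hypothetical sketch: restore_renames is illustrative, not xESMF API.


def restore_renames(forward, output_names):
    """Build the rename map that undoes `forward` on a regridded output.

    forward: original name -> canonical name, e.g. {'nav_lon': 'lon'}
    output_names: names present in the output; canonical names missing
    from it (bounds such as 'lon_b', for instance) are skipped.
    """
    inverse = {new: old for old, new in forward.items()}
    if len(inverse) != len(forward):
        raise ValueError("forward renaming is not one-to-one")
    return {new: old for new, old in inverse.items() if new in output_names}


forward = {"nav_lon": "lon", "nav_lat": "lat", "nav_lon_bnds": "lon_b"}
print(restore_renames(forward, {"lon", "lat", "sst"}))
# {'lon': 'nav_lon', 'lat': 'nav_lat'}
```

With xarray, the result could be applied as ds_out.rename(restore_renames(forward, ds_out.variables)).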

JiaweiZhuang commented 4 years ago

@stefraynaud Thanks for the PR! I had a long backlog and didn't get time to look at this -- sorry for the delay.

I agree that some degree of decoding support is useful. On the other hand, I am also worried about putting complex logic into xesmf, especially tricky edge cases such as: which variable actually gets used if the dataset contains all three of 'lat_b', 'lat_bnds' and 'latitude_bnds'?
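One way to make that edge case predictable is a fixed priority order plus a warning whenever several candidates are present. This is a sketch of that idea only: pick_variable and the order in LAT_BOUNDS_CANDIDATES are assumptions, not what the PR implements.

```python
import warnings

# Hypothetical priority order: earlier entries win.
LAT_BOUNDS_CANDIDATES = ("lat_b", "lat_bnds", "latitude_bnds")


def pick_variable(names, candidates):
    """Return the highest-priority candidate present in `names`, or None.

    Emits a warning when several candidates are present, so the choice
    is never silent.
    """
    found = [c for c in candidates if c in names]
    if len(found) > 1:
        warnings.warn(
            f"found several candidates {found}; using {found[0]!r}",
            stacklevel=2,
        )
    return found[0] if found else None


names = {"lat", "lat_b", "lat_bnds", "latitude_bnds"}
print(pick_variable(names, LAT_BOUNDS_CANDIDATES))  # lat_b (plus a UserWarning)
```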

To that end I summarized a meta-issue at #74. Would you be able to build the CF-convention support on top of the proposed xesmf.config.set(grid_name_dict=...)? That would be much easier to maintain and extend.
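The proposed option could look roughly like the sketch below: a module-level config holding a grid_name_dict that maps xESMF's canonical names to the dataset's variable names, overridable in a context manager. Everything here (_config, config_set, grid_name, and the assumed shape of grid_name_dict) is a hypothetical stand-in for the idea in #74, not existing xESMF API.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the xesmf.config.set(grid_name_dict=...) idea
# from #74; none of these names are real xESMF API.
_config = {
    "grid_name_dict": {"lon": "lon", "lat": "lat",
                       "lon_b": "lon_b", "lat_b": "lat_b"},
}


@contextmanager
def config_set(**options):
    """Temporarily override config options; restore them on exit."""
    unknown = set(options) - set(_config)
    if unknown:
        raise KeyError(f"unknown config options: {sorted(unknown)}")
    old = {key: _config[key] for key in options}
    _config.update(options)
    try:
        yield
    finally:
        _config.update(old)


def grid_name(canonical):
    """Return the dataset variable name used for a canonical grid name."""
    return _config["grid_name_dict"][canonical]


custom_names = {"lon": "nav_lon", "lat": "nav_lat",
                "lon_b": "bounds_lon", "lat_b": "bounds_lat"}
with config_set(grid_name_dict=custom_names):
    print(grid_name("lon"))  # nav_lon
print(grid_name("lon"))  # lon
```

Under this design, CF support would reduce to producing such a grid_name_dict (for example from standard_name and units) instead of renaming variables inside the regridding code.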

Also, would xarray.decode_cf be useful here?

JiaweiZhuang commented 4 years ago

Also note that renaming the coordinates alone does not give full CF-convention support -- CF uses a boundary shape of (n_lat, n_lon, 4) while ESMPy expects (n_lat+1, n_lon+1), see https://github.com/JiaweiZhuang/xESMF/issues/14#issuecomment-369686779.
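For contiguous cells, the (n_lat, n_lon, 4) form can be collapsed into the (n_lat+1, n_lon+1) corner grid by taking each cell's lower-left vertex and filling the last row and column from the edge cells. A minimal sketch with NumPy: cf_bounds_to_corners is a hypothetical name, and the vertex order is an assumption. As far as I know, CF fixes the anticlockwise direction but not the starting vertex, so here vertex 0 is taken to be the lower-left corner. The function does not check that neighbouring cells actually share corners.

```python
import numpy as np


def cf_bounds_to_corners(bounds):
    """Convert CF cell bounds (n_lat, n_lon, 4) to corners (n_lat+1, n_lon+1).

    Assumes vertex order 0=(j, i), 1=(j, i+1), 2=(j+1, i+1), 3=(j+1, i),
    i.e. anticlockwise from the lower-left corner, and contiguous cells.
    """
    bounds = np.asarray(bounds)
    if bounds.ndim != 3 or bounds.shape[2] != 4:
        raise ValueError(f"expected shape (n_lat, n_lon, 4), got {bounds.shape}")
    n_lat, n_lon, _ = bounds.shape
    corners = np.empty((n_lat + 1, n_lon + 1), dtype=bounds.dtype)
    corners[:-1, :-1] = bounds[:, :, 0]  # lower-left corner of every cell
    corners[:-1, -1] = bounds[:, -1, 1]  # lower-right corners, last column
    corners[-1, -1] = bounds[-1, -1, 2]  # upper-right corner, top-right cell
    corners[-1, :-1] = bounds[-1, :, 3]  # upper-left corners, top row
    return corners


# Tiny check: build CF bounds from a known 3 x 4 corner grid, convert back.
corners = np.arange(12.0).reshape(3, 4)
bounds = np.stack([corners[:-1, :-1], corners[:-1, 1:],
                   corners[1:, 1:], corners[1:, :-1]], axis=-1)
print(bounds.shape, cf_bounds_to_corners(bounds).shape)  # (2, 3, 4) (3, 4)
```

The same function would be applied separately to the latitude and longitude bounds arrays.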

stefraynaud commented 4 years ago


You are right, but this is beyond the scope of this PR, which focuses on finding the longitude and latitude variables in datasets.

By the way, converting the (n_lat, n_lon, 4) form to the (n_lat+1, n_lon+1) form is straightforward, but it must raise a warning (my two cents).
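The warning matters because the corner grid can only represent cells that share their corners with their neighbours; any gap or overlap in the CF bounds is lost in the conversion. A minimal sketch of such a check, assuming the same anticlockwise vertex order starting at the lower-left corner (0=(j, i), 1=(j, i+1), 2=(j+1, i+1), 3=(j+1, i)); warn_if_not_contiguous is a hypothetical helper.

```python
import warnings

import numpy as np


def warn_if_not_contiguous(bounds, rtol=1e-5, atol=1e-8):
    """Warn if neighbouring CF cells (n_lat, n_lon, 4) do not share corners.

    Assumed vertex order: 0=(j, i), 1=(j, i+1), 2=(j+1, i+1), 3=(j+1, i).
    Returns True when all shared corners agree.
    """
    b = np.asarray(bounds)
    shared = [
        (b[:, :-1, 1], b[:, 1:, 0]),  # east neighbour, lower shared corner
        (b[:, :-1, 2], b[:, 1:, 3]),  # east neighbour, upper shared corner
        (b[:-1, :, 3], b[1:, :, 0]),  # north neighbour, left shared corner
        (b[:-1, :, 2], b[1:, :, 1]),  # north neighbour, right shared corner
    ]
    ok = all(np.allclose(x, y, rtol=rtol, atol=atol) for x, y in shared)
    if not ok:
        warnings.warn(
            "CF cell bounds are not contiguous; converting them to an "
            "(n_lat+1, n_lon+1) corner grid will lose information",
            stacklevel=2,
        )
    return ok


corners = np.arange(12.0).reshape(3, 4)
bounds = np.stack([corners[:-1, :-1], corners[:-1, 1:],
                   corners[1:, 1:], corners[1:, :-1]], axis=-1)
print(warn_if_not_contiguous(bounds))  # True
```

One limitation of this plain comparison: longitudes that agree only modulo 360 (for example 359.5 and -0.5) would be flagged as non-contiguous.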