JiaweiZhuang / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License
269 stars 49 forks source link

Add support for custom coordinate names #94

Closed huard closed 3 years ago

huard commented 4 years ago

As proposed in #74, this PR uses a dictionary of variable names to indicate which variables hold latitude and longitude coordinates and bounds. If this dictionary is not given, the code uses very basic heuristics based on the CF-Convention to find the coordinates and their bounds. The bounds are then converted into the format required by ESMF.

Note that the conversion from vertices (nx, ny, 4) is probably wrong and needs additional work and testing.

jthielen commented 4 years ago

Glad to see this moving forward before a full cf-xarray package is ready to be integrated!

I have one early suggestion: instead of having the basic CF-based check being against the long_name attribute (which is not standardized in the current conventions), have it check for the required units attribute

codecov-io commented 4 years ago

Codecov Report

Merging #94 into master will decrease coverage by 0.71%. The diff coverage is 93.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #94      +/-   ##
==========================================
- Coverage   96.55%   95.83%   -0.72%     
==========================================
  Files           6        6              
  Lines         319      384      +65     
==========================================
+ Hits          308      368      +60     
- Misses         11       16       +5     
Impacted Files Coverage Δ
xesmf/util.py 94.18% <92.53%> (-5.82%) :arrow_down:
xesmf/frontend.py 95.55% <100.00%> (-0.03%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 2d2e365...d9b1062. Read the comment docs.

huard commented 4 years ago

Done. Just wondering whether this is a one-to-one relationship. Lats and lons have to have those units, but is any variable with those units guaranteed to be --the-- grid coordinate ?

huard commented 4 years ago

Another question: there is a potential for confusion between two types of "bounds":

Suggestions about the terminology to use to make this distinction clear ? The code currently uses bounds and boundaries interchangeably.

jthielen commented 4 years ago

Just wondering whether this is a one-to-one relationship. Lats and lons have to have those units, but is any variable with those units guaranteed to be --the-- grid coordinate ?

From my reading of the conventions document, while it is not explicitly stated to be one-to-one/uniquely identifying, it is implied since this is how the conventions instruct applications to interpret those coordinate types (and such an instruction wouldn't be very useful if it wasn't uniquely identifying). For example:

Coordinates of latitude with respect to a rotated pole should be given units of degrees, not degrees_north or equivalents, because applications which use the units to identify axes would have no means of distinguishing such an axis from real latitude, and might draw incorrect coastlines, for instance.

codecov-commenter commented 4 years ago

Codecov Report

Merging #94 into master will increase coverage by 0.59%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #94      +/-   ##
==========================================
+ Coverage   96.55%   97.15%   +0.59%     
==========================================
  Files           6        6              
  Lines         319      386      +67     
==========================================
+ Hits          308      375      +67     
  Misses         11       11              
Impacted Files Coverage Δ
xesmf/frontend.py 95.55% <100.00%> (-0.03%) :arrow_down:
xesmf/util.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 2d2e365...6b67e6c. Read the comment docs.

huard commented 4 years ago

Thanks @stefraynaud for the review. The one thing that still bothers me with the PR is the naming convention. There are various terms used that I think we should use more consistently:

I don't have any actual suggestion to make to clear this up.

stefraynaud commented 4 years ago

As for the terms, maybe it can be clarified in two generic steps:

  1. Make sure that the docstrings mentions the shape of arrays.
  2. Clarify the user guide about the possible terms.
huard commented 4 years ago

The code to convert nxmx4 bounds to 2D grid corners should make respect this convention. I don't think it does at the moment: https://docs.google.com/document/d/1ueYsYrPob8Gd3HxQVHC0vKfpj_Jjh0iO4Pe70_QLpDk/edit

rabernat commented 4 years ago

If this PR is ready to merge, perhaps it can be moved to https://github.com/pangeo-data/xESMF? Unfortunately this has to be done manually.

jthielen commented 4 years ago

Also, given that cf-xarray has been released, would it be worth just switching over to depending on that at this point?

rabernat commented 4 years ago

We discussed that at the last dev meeting (which you're welcome to join any time; see #98). @huard concluded it might not be worth it in this specific case.

jthielen commented 4 years ago

We discussed that at the last dev meeting (which you're welcome to join any time; see #98). @huard concluded it might not be worth it in this specific case.

Sounds good! That's definitely reasonable for it to not be worth it in this case, and I'm glad to hear it was discussed. I'll be using xESMF and cf-xarray in combination for an upcoming project or two, so if there are any sticking points, I'll be sure to put in a PR to help.

dcherian commented 4 years ago

it might not be worth it in this specific case.

So this PR was a main reason I wrote it up, so that logic to figure lat/lon didn't have to be written 5-10x over.

The logic is pretty self contained so cf-xarray's utility functions should be easily copy/pastable.

rabernat commented 4 years ago

FWIW, I also favor outsourcing this to cf-xarray. However, I am not the author of the PR and so must defer to his judgement. Perhaps we could merge this as is, with good test coverage, and then refactor to use cf-xarray in a future PR without touching the tests?

stefraynaud commented 4 years ago

Historically, this PR came after PR #73. This latter was proposed to make the job of cf-xarray, and it has been chosen to keep only #94 because it simpler implementation. Now that cf-xarray is here, a new PR must be proposed to merge to two capabilities:

This means: if the names are not specified, then search for them according to the default names, and if not found, use cf-xarray,

huard commented 4 years ago

One concern was weighing the maintenance costs of adding a dependency versus maintaining code. I thought the code was so basic, it was not worth an additional dependency. As xesmf and cf-xarray both evolve, I suspect the tradeoff will lean in favor of relying on cf-xarray.

I think this PR still has an issue with the conversion from 2D vertices to grid corners however, and I'm short on time to investigate this. Is this in-scope for cf-xarray ?

dcherian commented 4 years ago

One concern was weighing the maintenance costs of adding a dependency versus maintaining code. I thought the code was so basic, it was not worth an additional dependency. As xesmf and cf-xarray both evolve, I suspect the tradeoff will lean in favor of relying on cf-xarray.

another approach is to make the code in cf-xarray easily copy-pastable which it may be already.

I think this PR still has an issue with the conversion from 2D vertices to grid corners however, and I'm short on time to investigate this. Is this in-scope for cf-xarray ?

Sure. I added a function to guess 1D bounds (edges) already: https://cf-xarray.readthedocs.io/en/latest/generated/xarray.Dataset.cf.add_bounds.html#xarray.Dataset.cf.add_bounds. We can add a 2D version too.

huard commented 4 years ago

Great ! If xesmf can rely on cf-xarray for the variable identification and corner grid calculations, I think it would make sense to ditch this PR in favor of a new one using cf-xarray as a dependency.