NCAS-CMS / cf-python

A CF-compliant Earth Science data analysis library
http://ncas-cms.github.io/cf-python
MIT License
120 stars 19 forks source link

Conservative regridding from rotated pole to OSGB grid #196

Open clairbarnes opened 3 years ago

clairbarnes commented 3 years ago

Hi,

I'm trying to carry out conservative regridding of a netcdf from the EuroCordex rotated-pole grid onto a regular OSGB grid (although at this point, I'd settle for a regular lat-lon grid). Linear interpolation works fine, but when I try to carry out conservative regridding, I get an error message: Source <CF AuxiliaryCoordinate: longitude(412, 424) degrees_east> coordinates must have contiguous, non-overlapping bounds for conservative regridding.

The auxiliary coordinates are 2d arrays of latitude & longitude, while the dimension coordinates are 1d vectors of rotated latitude and rotated longitude. The auxiliary coordinates do have bounds in the original .nc file - I've checked using R and each bounded quadrilateral shares two vertices with each neighbour, suggesting that they define contiguous, non-overlapping regions. I've tried setting and resetting the bounds of the main dimension coordinates in both the source and grid files, but haven't been able to reset the bounds of the auxiliary coordinates that seem to be causing the problem. Can you advise the best way to resolve this issue to allow conservative regridding of rotated-pole data onto a regular OSGB grid?

(details of source data here, details of target grid here)

Thanks, Clair

sadielbartholomew commented 3 years ago

Hi Clair, thanks for the question and for providing enough detail for us to go about investigating so we can provide an answer without needing to ask any follow-up questions (at least in theory!).

I'll start investigating and we will get back to you early next week at the latest.

clairbarnes commented 3 years ago

I've been playing around a bit more with this and I have a couple of thoughts that will hopefully save you a bit of investigation time. When I inspect the file in R (my native language), there are two fields vertices_longitude and vertices_latitude defining contiguous bounds in the source file, but these aren't being picked up as the auxiliary coordinates when I read the file using cf.read.

My first thought was that, if the correct variables were being used as auxiliary coordinates, the problem would be solved. However, I suspect that it won't be that simple, because the source file's coordinate reference system is given as rotated lat-lon, while those non-overlapping contiguous vertices are in regular lat-lon. So, even if the contiguous vertices were loaded, I think the regridding would probably do something rather odd (if mathematically correct!)

All this makes me think that it should be quite straightforward to map the contiguous vertices onto the rotated lat-lon grid - and if the resulting array of rotated vertices can easily be used to set the auxiliary coordinates, then the problem might be solved. However, I don't know if it's really that simple, and it's not clear to me how to set the auxiliary coordinate (probably because I only started using Python about two weeks ago), so any suggestions on the best way to proceed would be much appreciated!

sadielbartholomew commented 3 years ago

Thanks for the further information @ClairBee. I suspect I know what will work, but ideally I'd like to test it myself before getting back to you in case it is not the right approach. You've provided a details of the field in question in a cf-python like representation, but I can't use that to recreate the field without manually constructing each component of the field (as far as I am aware).

Would you mind also sharing the ncdump -h output of the original source and target netCDF files (i.e. some files after ncdump -h filename.nc > filename.cdl)? That way I can read the fields into cf-python as CDL with one line of code and test out my theory? Alternatively if you run and share the output of print(f.creation_commands()) on each read-in field f, I can copy and paste the corresponding lines and run them with cf-python to recreate those fields quickly.

However it sounds like you are well-versed on what might work already, and if you are happy to continue to investigate yourself in the meantime or otherwise, you could try the following out which is what I suspect will work:

My first thought was that, if the correct variables were being used as auxiliary coordinates, the problem would be solved. However, I suspect that it won't be that simple, because the source file's coordinate reference system is given as rotated lat-lon, while those non-overlapping contiguous vertices are in regular lat-lon. So, even if the contiguous vertices were loaded, I think the regridding would probably do something rather odd (if mathematically correct!)

Your source field has various Coordinate conversion components which can convert between the different systems (see our dedicated docs page here for more info. if useful). So cf-python is aware there are different systems in use and should be able to convert between them appropriately. So that part should not be an issue.

and it's not clear to me how to set the auxiliary coordinate (probably because I only started using Python about two weeks ago), so any suggestions on the best way to proceed would be much appreciated!

For now, this might help, though: you can use get_construct and set_construct to get and set coordinates, for example:

  1. run f.constructs() or f.coordinates() to find the coordinates of relevance;
  2. take one out via a = f.get_construct('<key name>');
  3. re-set it as an auxiliary coordinate via f.set_construct(cf.AuxiliaryCoordinate(source=a, copy=True|False)).

You can also set the bounds on the auxiliary coordinates separately (I'm not sure if you are implying you might want to try that) though I won't detail it all here.

Hope that might help for now, but please do send over the ncdump -h or creation_command outputs so we can work out the best approach for you if that is preferred!

clairbarnes commented 3 years ago

Hi @sadielbartholomew,

Thanks for the above. I've had a look at the constructs and coordinates and none of them seem to be picking up the variables vertices_latitude and vertices_longitude that I can see when I read the file in R. Moreover, it looks like the auxiliary coordinates do contain bounds of the correct dimension - but those bounds aren't the same as the arrays stored in vertices_latitude and vertices_longitude, and they don't seem to define contiguous non-overlapping regions.

At the moment I can see three possible solutions:

  1. somehow read the vertices_latitude and vertices_longitude and use those as the auxiliary coordinates
  2. create the vertices based on the existing auxiliary coordinates and use those as the auxiliary coordinates
  3. store the vertices in a separate object and import for use as the auxiliary coordinates

Given my unfamiliarity with cf-python and general lack of python experience, I think my best bet is the third option. However, it would be great to get a solution that uses the information that's actually in the source file and doesn't rely on any external input! I've attached the CDL output you requested - hopefully you'll be able to use it figure out how to import those missing bounds.

sadielbartholomew commented 3 years ago

Thanks again Clair for your thoughts and for providing the CDL output. We'll get back to you shortly, hopefully later today, regarding potential solution(s) and details.

sadielbartholomew commented 3 years ago

Hi Clair, just to say I'm sorry we've not got back to you yet, we're been really busy preparing to do a set of releases before Easter. I'll do my best to reply tomorrow. Thanks for your patience.

sadielbartholomew commented 3 years ago

Hi again @ClairBee and sorry for the delay in replying, which was due to various priorities to complete before Easter. This evening I have read through the thread here in detail and investigated using the CDL input you have kindly provided.

First, to address a point you have made:

there are two fields vertices_longitude and vertices_latitude defining contiguous bounds in the source file, but these aren't being picked up as the auxiliary coordinates when I read the file using cf.read.

At least with the latest version of cf-python as I am using, 3.8.0, these seem to be picked up and registered as bounds on the auxiliary coordinates as they should be. As a minimal example, if I read-in data_in.cdl and then inspect the netCDF variable containing the bounds of the auxiliary coordinates, I see that lat_vertices and lon_vertices are the origins:

>>> import cf
>>> f = cf.read('data_in.cdl', fmt='CDL')[0]
>>> lat = f.auxiliary_coordinate('auxiliarycoordinate0')
>>> lon = f.auxiliary_coordinate('auxiliarycoordinate1')
>>> print("LAT IS", lat, "LON IS", lon)
LAT IS latitude(412, 424) degrees_north LON IS longitude(412, 424) degrees_east
>>> lat_bounds = lat.bounds
>>> lon_bounds = lon.bounds
>>> print("LAT BOUNDS ARE", lat_bounds, "LON BOUNDS ARE", lon_bounds)
LAT BOUNDS ARE latitude(412, 424, 4) degrees_north LON BOUNDS ARE longitude(412, 424, 4) degrees_east
>>> lat_bounds.nc_get_variable()
'lat_vertices'
>>> lon_bounds.nc_get_variable()
'lon_vertices'

I should have asked previously (my apologies for omitting it) if you could confirm the versions of cf and its dependencies that you are running with by sharing the output after running cf.environment() in the environment which you tried the regridding case in question. Would you mind sharing that? Just to help us understand the context of the error you are seeing.

Next, unfortunately I can't recreate the immediate error you are getting, even on earlier versions of cf-python than the latest, 3.8.0. What I do observe across cf versions is errors regarding bounds not being found in the destination field, which after inspecting the CDL in question seem perfectly legitimate. I don't run into an error about the source having bounds not recognised, though, and any issues with the source would be raised as errors first, before any issues with the destination grid.

To demonstrate this, I tried a pair of trivial test regridding operations from the desired input data to the input data grid (in the example code below, from f onto f i.e. f.regrids(f)) and then from the desired output grid's data to the output data grid (below from g to g i.e. g.regrids(g)). Input to input works fine, but with output to output I get the error "ValueError: Source <CF AuxiliaryCoordinate: grid_longitude(112, 82) degrees_east> coordinates must have bounds for conservative regridding." reflecting that the bounds are missing in your desired output, seemingly consistent with the CDL provided where there are bounds on the dimension coordinates (in 'm' units which can't trivially be mapped onto the 'degrees north' and 'degrees east' of the input f bounds) but none on the auxiliary coordinates.

This script copied below gives the behaviour described above in my environment. Could you please share the output you see for this script, also, so I can check we are on the same page here about the end goal and approach?

import cf

# 1. Read in fields
f = cf.read('data_in.cdl', fmt='CDL')[0]
g = cf.read('grid_out.cdl', fmt='CDL')[0]

# 2. Print all detail about each field
print("*Field f, rotated-pole data*")
f.dump()
print("*Field g, regular OSGB grid*")
g.dump()
y
f_onto_itself = f.regrids(f, method='conservative')  # this seems to work
g_onto_itself = g.regrids(g, method='conservative')  # but this raises the error above

So overall I am not sure how the error you are reporting is emerging (except possibly a cf version disrepancy), but even if the source was manipulated to be okay for the conservative regridding, the bounds on the output grid are not in the right form (they are on the 'projection_{x, }_coordinate' dimension coordinates in units of metres, rather than the auxiliary coordinates in consistent units with the input data bounds) for it to be possible to regrid using the conservative method to that output grid. Both source and destination grids requiring contiguous, non-overlapping bounds for conservative regridding.

I hope that makes sense and that it helps us to move forward at least in the discussion. Please do let us know what outputs you get for the environment calls and the output of the script as asked above, also.

clairbarnes commented 3 years ago

Thanks for the response @sadielbartholomew. I tried running the regridding over just the CDLs, but I still get an error:

>>> import cf
>>> f = cf.read('data_in.cdl', fmt='CDL')[0]
>>> g = cf.read('grid_out.cdl', fmt='CDL')[0]
>>> f_onto_itself = f.regrids(f, method='conservative')
/apps/jasmin/jaspy/miniconda_envs/jaspy3.7/m3-4.6.14/envs/jaspy3.7-m3-4.6.14-r20200606/lib/python3.7/site-packages/cf/regrid.py:209: UserWarning: Warning: converting a masked element to nan.
  tmp_x[n, m] = x_bounds[-1, -1, 2]
/apps/jasmin/jaspy/miniconda_envs/jaspy3.7/m3-4.6.14/envs/jaspy3.7-m3-4.6.14-r20200606/lib/python3.7/site-packages/cf/regrid.py:215: UserWarning: Warning: converting a masked element to nan.
  tmp_y[n, m] = y_bounds[-1, -1, 2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/apps/jasmin/jaspy/miniconda_envs/jaspy3.7/m3-4.6.14/envs/jaspy3.7-m3-4.6.14-r20200606/lib/python3.7/site-packages/cf/decorators.py", line 59, in precede_with_kwarg_deprecation_check
    operation_method_result = operation_method(self, *args, **kwargs)
  File "/apps/jasmin/jaspy/miniconda_envs/jaspy3.7/m3-4.6.14/envs/jaspy3.7-m3-4.6.14-r20200606/lib/python3.7/site-packages/cfdm/decorators.py", line 47, in inplace_wrapper
    processed_copy = operation_method(self, *args, **kwargs)
  File "/apps/jasmin/jaspy/miniconda_envs/jaspy3.7/m3-4.6.14/envs/jaspy3.7-m3-4.6.14-r20200606/lib/python3.7/site-packages/cf/field.py", line 19327, in regrids
    new_data = Data.reconstruct_sectioned_data(sections)
  File "/apps/jasmin/jaspy/miniconda_envs/jaspy3.7/m3-4.6.14/envs/jaspy3.7-m3-4.6.14-r20200606/lib/python3.7/site-packages/cf/data/data.py", line 8919, in reconstruct_sectioned_data
    ndims = len(list(sections.keys())[0])
IndexError: list index out of range

I've attached the cf.environment() output in case that sheds any light on the issue. However, I think that my problem may go beyond a version discrepancy. A closer look at the actual values in the auxiliary coordinate bounds of the source file shows that although the data are produced on a regular grid in rotated lat-lon coordinates, the bounds of which should define the vertices of contiguous non-overlapping cells when transformed into normal lat-lon coordinates, that isn't always the case. Some cells don't share two vertices with their neighbours (I think probably due to rounding errors in the conversion to regular lat-lon) and the edge cells are a completely different shape. This makes me think that cf's requirement for the whole field to be contiguous is probably the root of my problem.

I suspect that I might be able to fix this by recalculating the vertices and overwriting the auxiliary coordinate bounds - but since I need to regrid output from several different RCMs with slightly different grids, this probably isn't a very practical solution. As a workaround I've now successfully used CDO to conservatively remap the same input data onto a test grid (although not onto my target grid - that's phase two). Presumably this is because it has a less strict requirement on contiguousness of the input data, so I think my best bet at this point is probably to use CDO to do the remapping.

Thanks again for all your help trying to get to the bottom of this issue!

Clair

sadielbartholomew commented 3 years ago

Hi Clair, no problem and sorry it is taking a while to get to the bottom of this.

I tried running the regridding over just the CDLs, but I still get an error:

Sorry this has made me realise I should have asked you to run those two snippets of code I put using the fields as you read them in from your original netCDF rather than those CDL files. You are getting the error you are getting with the CDL because they were from ncdump -h and so only had header info and not any data, as is obviously required for regridding (I'll register the not very helpful error message in that case as really it should error with a message that says as much.)

If you try running those snippets again but this time instead of calling f = cf.read('data_in.cdl', fmt='CDL')[0] call f = cf.read('<name of original source netCDF file>.nc')[0], and the equivalent for your desired destination g, in other words read in your original datasets in rather than the CDL which was all I had available to test on, it should reveal what I was hoping to check.

Thanks for sharing the environment, it actually looks fine and there is nothing there that would suggest why you might be seeing the error you are, so my original suspicion that it was to do with versions is wrong.

I also now realise the specific error you run into probably requires the data coordinates to be present, so the header-only CDL would not show it up. I asked for ncdump -h before because I thought with the data there in the output of a regular ncdump the CDL might end up being very large, but in asking for that I've probably missed the issue at hand, hence why I can't recreate it, especially with the further info you state:

Some cells don't share two vertices with their neighbours (I think probably due to rounding errors in the conversion to regular lat-lon) and the edge cells are a completely different shape. This makes me think that cf's requirement for the whole field to be contiguous is probably the root of my problem.

Yes, in light of the original error you reported (which I was trying to recreate), I think you are right about that, especially since you stated that linear regridding worked for the same case, where there isn't such a requirement. After Easter I will seek a second opinion from David (the cf-python expert) to confirm, though.

I'm glad you've found somewhat of a workaround, though it's a shame you had to resort to using CDO and we don't provide in cf-python some straightforward way to conservatively regrid via "less strict requirement on contiguousness of the input data".

After Easter I'll also look into, and chat to David about, the possibility, of providing in future versions ways to do conservative regridding with relaxed bounds requirements and/or to indicate some assumption is applicable such that the regridding can be achieved yb making some approximation on the bounds. Under the hood, cf-python uses ESMF via ESMPy as a Python wrapper for regridding so it may depend on the support provided by either of those as to whether we can do something like that. If we do think there's scope for something along those lines I'll let you know via a new comment on there.

Thanks, Sadie