NOAA-ORR-ERD / gridded

A single API for accessing / working with gridded model results on multiple grid types
https://noaa-orr-erd.github.io/gridded/index.html
The Unlicense
64 stars 14 forks source link

How to access file with custom coordinate variable names? #24

Closed mattijn closed 6 years ago

mattijn commented 6 years ago

I've a netcdf that probably contains coordinates names that do not follow the netcdf-cdf convention. I see the following issue is closed: https://github.com/NOAA-ORR-ERD/gridded/issues/18

But I was wondering how I've to set up this dictionary to link the right variables? Currently when I do: ds = gridded.Dataset(file_nc)

It gives: ValueError: Node coordinates standard_name is neither "longitude" nor "latitude"

If I open file_nc with xarray it shows me the following names:

import xarray as xr
ds = xr.open_dataset(file_nc)
ds

<xarray.Dataset>
Dimensions:                      (Two: 2, max_nmesh2d_face_nodes: 5, nmesh2d_edge: 29441, nmesh2d_face: 16597, nmesh2d_node: 12845, time: 2)
Coordinates:
    mesh2d_node_x                (nmesh2d_node) float64 ...
    mesh2d_node_y                (nmesh2d_node) float64 ...
    mesh2d_edge_x                (nmesh2d_edge) float64 ...
    mesh2d_edge_y                (nmesh2d_edge) float64 ...
    mesh2d_face_x                (nmesh2d_face) float64 ...
    mesh2d_face_y                (nmesh2d_face) float64 ...
  * time                         (time) datetime64[ns] 2008-01-19T12:00:00 ...
Dimensions without coordinates: Two, max_nmesh2d_face_nodes, nmesh2d_edge, nmesh2d_face, nmesh2d_node

In gridded.Dataset() mentions I can include variables as follow:

:param variables: a dict of dataset.Variable objects -- or anything that
                  presents the same API.

When I try

gridded.Dataset(file_nc, variables={"longitude": "mesh2d_node_x",
                                    "latitude":"mesh2_nodex_y"
                                   })

It gives me the following error ValueError: You can create a Dataset from a file, or from raw databut not both.

Any help is much appreciated

jay-hennen commented 6 years ago

As noted in #18, the Dataset construction functionality isn't complete, so it can't handle non-compliant files yet. What exactly are you trying to get out of the file? If you only want the grid, then it's best to only create a Grid object from gridded.grids import Grid and use the Grid.from_netCDF function to specify how to read the file.

A grid_topology dict is generally along the format of: {'node_lon': 'mesh2d_node_x', 'node_lat': 'mesh2d_node_y', etc}

If you want an object that associates data with the grid, you want a from gridded.variable import Variable. Use the Variable.from_netCDF in a similar fashion. But the requirements are also higher to create a Variable; you must have some data that fits right on the grid. Best to just create a Grid if that's all you need

mattijn commented 6 years ago

Thanks for the swift response and the package in general!

I’ve got various variables in the nc file (didn’t showed it in the xarray info statement). So I will try both approaches to become little more familiar with the functionalities of the package and will report back what worked.

mattijn commented 6 years ago

Hm, not yet succeeding. The problem is that the standard_name of the coordinate variable is not following the convention:

Using xarray:

ds.mesh2d_node_x.attrs
OrderedDict([('units', 'm'),
             ('standard_name', 'projection_x_coordinate'),
             ('long_name', 'x-coordinate of mesh nodes'),
             ('mesh', 'mesh2d'),
             ('location', 'node')])

It should be longitude and not projection_x_coordinate. So trying the following with gridded:

from gridded.grids import Grid
Grid.from_netCDF(filename=file_nc, grid_type='UGrid', grid_topology={
                 'longitude': 'projection_x_coordinate', 'latitude': 'projection_y_coordinate'})

Also gives:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-777afd75ab6e> in <module>()
----> 1 Grid.from_netCDF(filename=file_nc, grid_type='UGrid',grid_topology={'longitude': 'projection_x_coordinate', 'latitude': 'projection_y_coordinate'})

c:\python35\lib\site-packages\gridded\grids.py in from_netCDF(filename, dataset, grid_type, grid_topology, _default_types, *args, **kwargs)
    419         compliant = Grid._find_topology_var(None, gf)
    420         if compliant is not None:
--> 421             c = Grid._load_grid(filename, cls, dataset)
    422             c.grid_topology = compliant.__dict__
    423         else:

c:\python35\lib\site-packages\gridded\grids.py in _load_grid(filename, grid_type, dataset)
    381         '''
    382         if issubclass(grid_type, UGrid):
--> 383             return grid_type.from_ncfile(filename)
    384         elif issubclass(grid_type, SGrid):
    385             ds = get_dataset(filename, dataset)

c:\python35\lib\site-packages\gridded\pyugrid\ugrid.py in from_ncfile(klass, nc_url, mesh_name, load_data)
    159         grid = klass()
    160         read_netcdf.load_grid_from_ncfilename(nc_url, grid,
--> 161                                               mesh_name, load_data)
    162         return grid
    163 

c:\python35\lib\site-packages\gridded\pyugrid\read_netcdf.py in load_grid_from_ncfilename(filename, grid, mesh_name, load_data)
    308 
    309     with netCDF4.Dataset(filename, 'r') as nc:
--> 310         load_grid_from_nc_dataset(nc, grid, mesh_name, load_data)

c:\python35\lib\site-packages\gridded\pyugrid\read_netcdf.py in load_grid_from_nc_dataset(nc, grid, mesh_name, load_data)
    224                 nodes[:, 0] = var[:]
    225             else:
--> 226                 raise ValueError('Node coordinates standard_name is neither '
    227                                  '"longitude" nor "latitude" ')
    228         setattr(grid, defs['grid_attr'], nodes)

ValueError: Node coordinates standard_name is neither "longitude" nor "latitude" 

Attached is the file that I work with (zipped version is 4.3Mb) --> 001_map.zip

jay-hennen commented 6 years ago

The grid_topology kwarg is used to directly specify what netCDF variables get associated with the various attributes of a Grid. If you will later access the data using say, grid.node_lon then node_lon is the key you should use in the grid_topology. So in your case, the grid topology would be {'node_lon': 'mesh2d_node_x', 'node_lat': 'mesh2d_node_y'}

If you have a netCDF file that describes a grid and uses CF compliant names, the idea is that Gridded can use the standard names to search for and use the so-named variables in the file. If your file does not use CF, ugrid, or sgrid compliant names, then you need to use the grid_topology kwarg to specify it directly. And since you're already specifying it directly, then you may as well point directly to the name of the data object in the file and not a secondary 'standard_name' or such.

mattijn commented 6 years ago

Sorry for not saying, but I tried that first. You'll get the same ValueError. Also in the error statement it mentions it maps on the standard_name.

c:\python35\lib\site-packages\gridded\pyugrid\read_netcdf.py in load_grid_from_nc_dataset(nc, grid, mesh_name, load_data)
    224                 nodes[:, 0] = var[:]
    225             else:
--> 226                 raise ValueError('Node coordinates standard_name is neither '
    227                                  '"longitude" nor "latitude" ')

Since I've got more CF compliant netCDF files I checked these coordinate variables and attributes as well since these works seamless with gridded using ds = gridded.Dataset(file_nc).

And these dimensions variables looks as follow:

<xarray.Dataset>
Dimensions:             (Two: 2, max_nmesh2d_face_nodes: 4, nmesh2d_edge: 108598, nmesh2d_face: 53597, nmesh2d_node: 54925, time: 33)
Coordinates:
    mesh2d_node_x       (nmesh2d_node) float64 ...
    mesh2d_node_y       (nmesh2d_node) float64 ...
    mesh2d_edge_x       (nmesh2d_edge) float64 ...
    mesh2d_edge_y       (nmesh2d_edge) float64 ...
    mesh2d_face_x       (nmesh2d_face) float64 ...
    mesh2d_face_y       (nmesh2d_face) float64 ...
  * time                (time) datetime64[ns] 2017-11-01 2017-11-01T06:00:00 ...
Dimensions without coordinates: Two, max_nmesh2d_face_nodes, nmesh2d_edge, nmesh2d_face, nmesh2d_node

Apart from the dimensions it is identical to my non-working netCDF file. But if I check the attribute information I noticed that the standard_name and units are different with my non-working netCDF file:

OrderedDict([('units', 'degrees_east'),
             ('standard_name', 'longitude'),
             ('long_name', 'x-coordinate of mesh nodes'),
             ('mesh', 'mesh2d'),
             ('location', 'node')])

Now looking to the source code, https://github.com/NOAA-ORR-ERD/gridded/blob/master/gridded/pyugrid/read_netcdf.py#L198:L227

        for var in coord_vars:
            try:
                standard_name = var.standard_name
            except AttributeError:
                # CF does not require a standard name, look in units, instead.
                try:
                    units = var.units
                except AttributeError:
                    msg = ("The {} variable doesn't contain units "
                           "attribute: required by CF").format
                    raise ValueError(msg(var))
                # CF accepted units attributes for longitude.
                if units in ('degrees_east', 'degree_east', 'degree_E',
                             'degrees_E', 'degreeE', 'degreesE'):
                        standard_name = 'longitude'
                # CF accepted units attributes for longitude.
                elif units in ('degrees_north', 'degree_north', 'degree_N',
                               'degrees_N', 'degreeN', 'degreesN'):
                        standard_name = 'latitude'
                else:
                    msg = ("{} variable's units value ({}) doesn't look "
                           "like latitude or longitude").format
                    raise ValueError(msg(var, units))
            if standard_name == 'latitude':
                nodes[:, 1] = var[:]
            elif standard_name == 'longitude':
                nodes[:, 0] = var[:]
            else:
                raise ValueError('Node coordinates standard_name is neither '
                                 '"longitude" nor "latitude" ')

It mentions it doesn't need standard_name, but if the file contains a standard_name and it is not longitude or latitude it will raise the ValueError. If my netCDF file did not had a standard_name variable gridded would still map the file wrongly as the units of my non-CF compliant file is m, where my CF-compliant units are degrees_east..

So using grid_topology I can map the coordinate variable, but then the attributes are checked and there it is going wrong. Should be nice if there is a parameter in Grid.from_netCDF() where these longitude and latitude can be mapped similar to the grid_topology parameter.

jay-hennen commented 6 years ago

You can pass grid_topology into Grid.from_netCDF, if I'm not mistaken. Have you tried that? If that doesn't work, post the grid_topology dict you used and the file and I'll take a look.

Alternatively, you could try to construct the Grid_U object directly with the constructor. It takes kwargs the same as all it's attributes.

mattijn commented 6 years ago

I tried:

Grid.from_netCDF(filename=file_nc, grid_type='UGrid', grid_topology={
    'node_lon': 'mesh2d_node_x', 'node_lat': 'mesh2d_node_y'})

And:

Grid.from_netCDF(filename=file_nc, grid_type='UGrid', grid_topology={
    'longitude': 'projection_x_coordinate', 'latitude': 'projection_y_coordinate'})

The file I tried is here: https://github.com/NOAA-ORR-ERD/gridded/files/1903690/001_map.zip (nc-file is in the zip)

jay-hennen commented 6 years ago

Oh, that's interesting. The file does have a valid topology variable, and it seems like it should work aside from the standard name being incorrect. Looks like that name check should be disabled or is being used in the wrong situation.

In the meantime, you should still be able to proceed if you construct the grid object directly:

from gridded.grids import Grid_U
grid = Grid_U(node_lon=dataset['mesh2d_node_x'], node_lat=['mesh2d_node_y'], faces=...)
mattijn commented 6 years ago

While checking pyugrid, I noticed that since Nov 21, 2016 projected coordinates are supported:

if standard_name in {'latitude', 'projection_y_coordinate'}:
    nodes[:, 1] = var[:]
elif standard_name in {'longitude', 'projection_x_coordinate'}:
    nodes[:, 0] = var[:]
else:
    raise ValueError('Node coordinates standard_name is neither'
                     ' "longitude" nor "latitude" nor '
                     '"projection_x_coordinate" nor '
                     ' "projection_y_coordinate"')

Seems somehow this commit missed integration into the gridded package?

jay-hennen commented 6 years ago

Oh, good find! You appear to be correct.

Before that commit we basically hard forked pyugrid and pysgrid into the gridded package, with the intent to deprecate both. This was because we needed direct control to change the API of both libraries to bring them closer together. Unfortunately, it means the trickle of development that goes into pyugrid/pysgrid does get missed until we manually port it in.

ChrisBarker-NOAA commented 6 years ago

Thanks for following up on this Jay -- I"m taking a look now, and indeed, this is a missing feature.

I had tried to pull in the latest pyugrid to gridded -- but must have missed that.

Stay tuned....

ChrisBarker-NOAA commented 6 years ago

I just pushed changes that merge in the latest ugrid:

https://github.com/NOAA-ORR-ERD/gridded/commit/a685f721382a5c6dc417074a9baa03c5adfe71bf

However, this data file has data on the edges, which apparently we never wrote the code to support.

I've got xfail tests in there, and will update it some day -- but PR's welcome :-)

mattijn commented 6 years ago

Thanks Chris! We tried the updates and we can access the coordinates and variables. Currently we just focus on accessing the faces and that works fine. We are still on the 'usage' side, but if we are able to do a PR we will do!