NOAA-ORR-ERD / gridded

A single API for accessing / working with gridded model results on multiple grid types
https://noaa-orr-erd.github.io/gridded/index.html
The Unlicense

Preserve names in grid roundtrip. #66

Open groutr opened 3 years ago

groutr commented 3 years ago

When a grid is loaded and then saved with gridded, all of the existing names of variables and dimensions get overwritten with generated names.

This PR introduces a strategy to preserve those names by recording them in a dictionary structure. This is still a WIP, and comes out of the discussion in #65.

A few simplified, but very useful, ways to use this mapping:

# The mesh_1d object was generated from a 1D network topology: http://ugrid-conventions.github.io/ugrid-conventions/#1d-network-topology
# If we want to know what an input name maps to in terms of the spec:
mesh_1d.get_attribute("Mesh1_edge_y")   # Returns 'edge_coordinates'
# If we want to look up what an element of the spec maps to in the input netcdf
mesh_1d['edge_coordinates']  # Returns ['Mesh1_edge_x', 'Mesh1_edge_y']

# If we want to get the values of the edge coordinates of the mesh
mesh_1d.get_values("edge_coordinates", grid_1d)  # Returns grid_1d.edge_coordinates
mesh_1d.get_values("Mesh1_edge_y", grid_1d)  # Returns grid_1d.edge_coordinates[:,1]
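
To make the two lookup directions above concrete, here is a minimal sketch of a two-way name map. The class name `NameMap` and the `add` method are hypothetical and not taken from the PR; only the `get_attribute` and `[...]` lookups mirror the usage shown above, and `get_values` is left out because it depends on the grid object.

```python
class NameMap:
    """Hypothetical two-way map between UGRID spec roles and file variable names."""

    def __init__(self):
        self._by_role = {}  # spec role -> list of variable names in the file
        self._by_name = {}  # variable name in the file -> spec role

    def add(self, role, names):
        # Record every input name under its role, and the reverse lookup too.
        for name in names:
            self._by_role.setdefault(role, []).append(name)
            self._by_name[name] = role

    def get_attribute(self, name):
        # Input name -> element of the spec.
        return self._by_name[name]

    def __getitem__(self, role):
        # Element of the spec -> input names (a copy, so callers cannot mutate the map).
        return list(self._by_role[role])


mesh_1d = NameMap()
mesh_1d.add("edge_coordinates", ["Mesh1_edge_x", "Mesh1_edge_y"])

print(mesh_1d.get_attribute("Mesh1_edge_y"))
print(mesh_1d["edge_coordinates"])
```

Keeping both dictionaries in sync inside `add` is what lets the original names survive a load and be written back unchanged on save.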

ChrisBarker-NOAA commented 3 years ago

As this is a WIP, I may be missing something, but my thoughts as to where you are going:

The UGrid object is all about capturing the data model inherent in the UGRID spec. However, it is intended to be independent of netCDF itself -- able to be created, used, and saved with no files, or with other file formats, or .... But the code as it stands is a bit entangled with netCDF, and I've been meaning to refactor the netCDF IO code.

I was thinking that this approach was overdoing it a bit -- re-implementing what is in the UGrid object already (or including stuff that is inherent in the data model). But if we think of it as taking everything that is specific to netCDF (dimensions, for instance) and putting that in a separate class (or set of classes) then this does start to make more sense.

So I want to see where this is going -- how do you use this to load or save a UGrid object?

A few goals to keep in mind:

1) As the PR name says, we want a netCDF file to round-trip through UGrid with little (or no) change -- i.e. preserving the variable names. So that's one goal.

2) You should be able to create a UGrid object from "scratch", and then save it out to netCDF, without having to specify anything extra (i.e. all variable and dimension names should be optionally auto-generated).

3) Remember that there could be more than one mesh in a single netCDF file -- at least in theory. This is not the least bit well tested, but good to keep in mind.

4) My idea for refactoring the netCDF loading code was to make it a two-step process: a) examine the netCDF file and figure out what all the variables mean; b) actually load the UGrid from the file.

The idea here is that if you have a non-compliant file, you can do step (a) by hand (or some other way). This would require an intermediate representation of the mapping between variable names and UGrid "parts" -- so your approach here might work really well. The trick, however, is that there might need to be some processing in there somehow (if a part of the grid is represented in another way -- i.e. more needs to be done than specifying the variable names).
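
The two-step process in goal 4 might look roughly like the sketch below. It is only an illustration under stated assumptions: the file is stood in for by plain dictionaries (`attrs` for variable attributes, `data` for variable values), and `find_meshes` and `load_mesh` are hypothetical names, not gridded functions. The only part taken from the UGRID conventions is that a mesh topology variable carries `cf_role = "mesh_topology"` and lists its parts as space-separated variable names.

```python
# Roles looked up on the mesh topology variable (a small subset, for illustration).
ROLES = ("node_coordinates", "edge_node_connectivity", "face_node_connectivity")


def find_meshes(attrs):
    """Step (a): return {mesh name: {role: [variable names]}} for every mesh found."""
    meshes = {}
    for var_name, var_attrs in attrs.items():
        if var_attrs.get("cf_role") != "mesh_topology":
            continue
        meshes[var_name] = {
            role: var_attrs[role].split() for role in ROLES if role in var_attrs
        }
    return meshes


def load_mesh(data, mapping):
    """Step (b): pull the values for each role out of the file using the mapping."""
    return {role: [data[name] for name in names] for role, names in mapping.items()}


# Stand-in for a compliant file holding one mesh.
attrs = {
    "Mesh2": {
        "cf_role": "mesh_topology",
        "node_coordinates": "Mesh2_node_x Mesh2_node_y",
        "face_node_connectivity": "Mesh2_face_nodes",
    },
    "Mesh2_node_x": {"standard_name": "longitude"},
    "Mesh2_node_y": {"standard_name": "latitude"},
    "Mesh2_face_nodes": {"cf_role": "face_node_connectivity"},
}
data = {
    "Mesh2_node_x": [0.0, 1.0, 0.0],
    "Mesh2_node_y": [0.0, 0.0, 1.0],
    "Mesh2_face_nodes": [[0, 1, 2]],
}

meshes = find_meshes(attrs)                    # step (a), automatic
parts = load_mesh(data, meshes["Mesh2"])       # step (b)

# For a non-compliant file, step (a) is done by hand instead:
manual = {"node_coordinates": ["Mesh2_node_x", "Mesh2_node_y"]}
manual_parts = load_mesh(data, manual)
```

Because `find_meshes` returns one mapping per mesh, a file with several meshes falls out naturally. Any extra processing for parts stored in another form would have to sit between the two steps and is not covered here.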

Overall design philosophy: I agree with the Zen of Python's axiom "flat is better than nested" -- so keep that in mind. For instance, there may not be a need for a Dimension class -- it really doesn't hold much -- just a thought.
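
As a rough illustration of the flat approach, goals 1 and 2 can both be served by a single dictionary with no Dimension class: names recorded at load time win, and anything missing is auto-generated. The function names and the default naming pattern below are assumptions made for this sketch, not what gridded currently writes.

```python
def default_names(mesh_name):
    """Hypothetical auto-generated names for a mesh created from scratch."""
    return {
        "node_coordinates": [f"{mesh_name}_node_x", f"{mesh_name}_node_y"],
        "face_node_connectivity": [f"{mesh_name}_face_nodes"],
        "node_dimension": [f"n{mesh_name}_node"],
        "face_dimension": [f"n{mesh_name}_face"],
    }


def names_for_save(mesh_name, recorded=None):
    """Names to write out: recorded names override the generated defaults."""
    names = default_names(mesh_name)
    names.update(recorded or {})
    return names


# Grid built from scratch: every name is generated.
fresh = names_for_save("mesh")

# Grid loaded from a file: the original coordinate names are preserved,
# and anything not recorded still gets a default.
roundtrip = names_for_save("mesh", {"node_coordinates": ["lon", "lat"]})
```

Dimensions are just two more entries in the same dictionary here, which is the sense in which a separate class may not be needed.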

Side note: We may want to, sooner rather than later, use xarray as the interface to netCDF and other file formats. xarray matches the netCDF data model, but there may be some differences to keep in mind. If you want, you could go to xarray first. (that might actually make it less disruptive -- it would be all in the "xarray" loader/saver, leaving netCDF untouched :-)

Final point -- I think we can go all Python 3 at this point; >= 3.8 seems reasonable.