Open groutr opened 3 years ago
As this is a WIP, I may be missing something, but my thoughts as to where you are going:
The UGrid object is all about capturing the data model inherent in the UGRID spec. However, it is intended to be independent of netcdf itself -- able to be created, used, saved with no files, or other file formats, or .... But the code as it stands is a bit entangled with netCDF, and I've really been meaning to refactor the netcdf IO code.
I was thinking that this approach was overdoing it a bit -- re-implementing what is in the UGrid object already (or including stuff that is inherent in the data model). But if we think of it as taking everything that is specific to netCDF (dimensions, for instance) and putting that in a separate class (or set of classes) then this does start to make more sense.
So I want to see where this is going -- how do you use this to load or save a UGrid object?
A few goals to keep in mind:
1) As the PR name says, we want a netCDF file to round-trip through UGrid with little (or no) change -- i.e. preserving the variable names.
2) You should be able to create a UGrid object from "scratch", and then save it out to netCDF, without having to specify anything extra (i.e. all variable and dimension names should be optionally auto-generated).
3) Remember that there could be more than one mesh in a single netCDF file -- at least in theory. This is not the least bit well tested, but good to keep in mind.
4) My idea for refactoring of the loading from netcdf code was to make it a two-step process: a) examine the netCDF file, and figure out what all the variables mean b) actually load the UGrid from the file
The idea here is that if you have a non-compliant file, you can do step (a) by hand (or some other way). This would require an intermediate representation of the mapping between variable names and UGrid "parts" -- so your approach here might work really well. The trick, however, is that there might need to be some processing in there somehow, if a part of the grid is represented in another way -- i.e. more needs to be done than to specify the variable names.
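To make goal 4 concrete, here is a minimal sketch of the two-step load. It is not gridded's actual API: `FAKE_FILE`, `examine`, `load`, and the role keys (`node_x`, `node_y`, `faces`) are all hypothetical names, and a plain dict stands in for an opened netCDF file. It also assumes `node_coordinates` lists x before y, which a real loader would need to verify.

```python
# Stand-in for an opened netCDF file: variable name -> (attributes, data).
# A real loader would read these from netCDF4 or xarray instead.
FAKE_FILE = {
    "mesh": ({"cf_role": "mesh_topology",
              "node_coordinates": "lon lat",
              "face_node_connectivity": "nv"}, None),
    "lon": ({}, [0.0, 1.0, 0.0]),
    "lat": ({}, [0.0, 0.0, 1.0]),
    "nv": ({}, [[0, 1, 2]]),
}


def examine(variables):
    """Step (a): map UGRID roles to variable names, one entry per mesh."""
    mappings = {}
    for name, (attrs, _data) in variables.items():
        if attrs.get("cf_role") != "mesh_topology":
            continue
        # Assumption: x is listed before y in node_coordinates.
        node_x, node_y = attrs["node_coordinates"].split()
        mappings[name] = {
            "node_x": node_x,
            "node_y": node_y,
            "faces": attrs.get("face_node_connectivity"),
        }
    return mappings


def load(variables, mapping):
    """Step (b): pull out the arrays that the mapping points at."""
    grid = {}
    for role, var_name in mapping.items():
        grid[role] = None if var_name is None else variables[var_name][1]
    return grid


# Compliant file: let step (a) work out the mapping.
mappings = examine(FAKE_FILE)
grid = load(FAKE_FILE, mappings["mesh"])

# Non-compliant file: skip examine() and write the mapping by hand.
manual = {"node_x": "lon", "node_y": "lat", "faces": "nv"}
grid_manual = load(FAKE_FILE, manual)
```

Because `examine` returns one mapping per mesh topology variable, the same shape would cover goal 3 (several meshes in one file). Any extra processing for grids stored in another form would have to happen between the two steps, which this sketch does not attempt.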
Overall design philosophy: I agree with the Zen of Python's axiom "flat is better than nested" -- so keep that in mind. For instance, there may not be a need for a Dimension class -- it really doesn't hold much -- just a thought to keep in mind.
Side note: We may want to, sooner rather than later, use xarray as the interface to netCDF and other file formats. xarray matches the netCDF data model, but there may be some differences to keep in mind. If you want, you could go to xarray first. (That might actually make it less disruptive -- it would all be in the "xarray" loader/saver, leaving netCDF untouched :-)
Final point -- I think we can go all Python 3 at this point; >= 3.8 seems reasonable.
When a grid is loaded and then saved with gridded, all of the existing names of variables and dimensions get overwritten with generated names.
This PR introduces a strategy to preserve those names by recording them in a dictionary structure. This is still a WIP, and comes out of the discussion in #65.
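As a rough illustration of the idea (not the PR's actual data structure), the recorded names could be overlaid on a set of generated defaults at save time. `DEFAULT_NAMES`, `resolve_names`, and every key and name below are hypothetical.

```python
# Hypothetical generated names, used when nothing was recorded at load time.
DEFAULT_NAMES = {
    "mesh": "mesh",
    "node_x": "node_lon",
    "node_y": "node_lat",
    "faces": "faces",
    "node_dim": "node",
    "face_dim": "face",
}


def resolve_names(recorded=None):
    """Return the names to write: recorded names win, defaults fill the gaps."""
    names = dict(DEFAULT_NAMES)
    names.update(recorded or {})
    return names


# Names recorded when a file was loaded (partial on purpose).
recorded = {"node_x": "lon", "node_y": "lat", "faces": "nv", "node_dim": "nnodes"}
names = resolve_names(recorded)

# A grid built from scratch has nothing recorded, so every name is generated.
scratch = resolve_names()
```

With this shape, a loaded file keeps its original names on save, while a grid created from scratch still saves without the caller specifying anything.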
Some of the simplified, but incredibly useful ways to use this mapping: