NOAA-ORR-ERD / gridded

A single API for accessing / working with gridded model results on multiple grid types
https://noaa-orr-erd.github.io/gridded/index.html
The Unlicense
64 stars 14 forks source link

"filename" and "data_file" inconsistent #33

Open ChrisBarker-NOAA opened 5 years ago

ChrisBarker-NOAA commented 5 years ago

In the Time object, it's called "filename", in the Variable object, it's called "data_file".

it should be consitent -- I like data_file better, but I don't really care.

jay-hennen commented 5 years ago

It is kinda consistent already. 'filename' is the 'primary' keyword for giving a file or list of files to any object in Gridded (ALL objects CAN take this keyword and it'll work like you expect). Where it gets a bit trickier is that 'compound' objects like Variable (which have a Grid and Time and data) may be built from discrete files for grid and data. In this case, 'filename' is the go-to for any unspecified '*_file' keyword

Variable.fromnetCDF(filename=X) (Grid uses X, Time uses X, data uses X)_ Variable.from_netCDF(filename=X, grid/datafile=Y) (Time+data use the same file. Grid uses the other)_ Variable.from_netCDF(filename=X, grid_file=X, datafile=Y) (I think this actually errors on you)_ Variable.from_netCDF(grid_file=X, datafile=X) (Grid uses X, Time uses X, data uses X)_ Variable.from_netCDF(grid_file=X, datafile=Y) (Grid uses X, Time uses Y, data uses Y)_

ChrisBarker-NOAA commented 5 years ago

hmm -- why does a variable need to (directly) know what file its gird came from? Wouldn't that be a Grid attribute? So you may want to specify a grid_file when construct it, but it doesn't need to keep it around.

In the Variable case, I think that "filename" should be the file that that data comes from, always.

I'm also not sure we need to be able to construct a Variable, iwth a grid in one pass by itself anyway.

For the most part, folks are going to use gridded.Dataset to manage the relationship.

jay-hennen commented 5 years ago

Variable doesn't actually retain 'grid_file' or 'data_file' as attributes. It has a .filename attribute, which is where it's data comes from. 'grid/data_file' are purely keywords for the case when you want to generate the Variable, Grid, and Time all in one go.

ChrisBarker-NOAA commented 5 years ago

OK, then that's what I'd expect -- I'll go look again at why I thought this was an issue.

ChrisBarker-NOAA commented 5 years ago

Now I know why -- the from_netcdf consructor has a filename atribute, but the class __init__ does not.

So filename is not getting set.

I just added:

self.filename = data_file

to the init -- we'll see if that breaks anything.

It seems that that init should have a "filename" and "grid_file", and then:

grid_file = filename if grid_file is None else grid_file

But I don't want to make that change, as it could break all sorts of things.

OK -- adding that filename code didn't break anything.

PS: we really need more tests!