Closed njmattes closed 8 years ago
NetCDF_Meta table, it's a little bit of work (not too much though), because of #7, to have this ID also in NetCDF_Data (since we need to have it there in order to do joins)
raster2pgsql it already stores the spatial resolution for me in an extra PostGIS-managed table called raster_columns, a table that knows about all the columns of type raster in an entire database. this brings us to the problem that, in our case, different rows in NetCDF_Data come from different datasets and thus have different resolutions. so, you are right, we need to store this resolution somewhere else; let us add it as an additional column in NetCDF_Meta
raster2pgsql to ingest a netcdf's data it just has a notion of bands, i.e. band 1, 2, 3, ... (which are the time frames, which raster2pgsql found out automatically, so it has some cleverness), but it has no idea that a stepping of 1 actually means a day, week, month, or year, etc. yes, so i will add another column to NetCDF_Meta for the temporal resolution
NetCDF_Meta first and later, after having seen some dirtier netcdfs, abstract out stuff into NetCDF_CleanMeta
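To illustrate why the temporal resolution has to be stored next to the band numbers, here is a minimal Python sketch of how a band index could be turned back into a date once that column exists. The function name band_to_date, the start date argument, and the resolution labels 'day', 'week', 'month', 'year' are assumptions made for this example; nothing in the thread fixes how the resolution would actually be encoded.

```python
from datetime import date, timedelta


def band_to_date(start, band, resolution):
    """Map a 1-based raster band number to the date it represents.

    start      -- date of band 1 (hypothetical: would come from the metadata)
    band       -- band number as numbered by raster2pgsql (1, 2, 3, ...)
    resolution -- hypothetical label for what a stepping of 1 means
    """
    steps = band - 1
    if resolution == "day":
        return start + timedelta(days=steps)
    if resolution == "week":
        return start + timedelta(weeks=steps)
    if resolution == "month":
        # Assumes the day of `start` exists in every target month
        # (e.g. the first of the month); otherwise replace() raises.
        months = start.month - 1 + steps
        return start.replace(year=start.year + months // 12,
                             month=months % 12 + 1)
    if resolution == "year":
        # Assumes `start` is not Feb 29.
        return start.replace(year=start.year + steps)
    raise ValueError("unknown temporal resolution: %r" % (resolution,))


first = date(2000, 1, 1)
print(band_to_date(first, 32, "day"))
print(band_to_date(first, 13, "month"))
```

Without the stored resolution, band 32 is just "band 32"; with it, the same number can be resolved to a concrete date.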
let me add here:
netcdf_meta
TODO for myself:
NetCDF_Meta as new columns, of course also add it to the corresponding SQLAlchemy object
id column to NetCDF_Meta and also add a column netcdf_meta_id to NetCDF_Data that's a foreign key to netcdf_meta.id
Yes, you're totally right that we should stick to fixing NetCDF_Meta now, and wait for NetCDF_CleanMeta until we actually need it.
Another question: are units stored in NetCDF_Meta already? Are those in vars_attrs perhaps?
yes, vars_attrs contains the variables' attributes in the order of the variables, with each key followed by its value, i.e.
var_1_key_1, var_1_val_1, var_1_key_2, var_1_val_2, var_1_key_3, var_1_val_3, ..., var_2_key_1, var_2_val_1, var_2_key_2, var_2_val_2, var_2_key_3, var_2_val_3, ..., ...
and the key-value pair(s) for units would be among them.
however, i have to point out that the metadata stored in netcdf_meta as it is right now does not allow us to locate the value for a units key very efficiently.
to get the value of a units key for a particular variable we would have to do the following with the current metadata:
1. show vars_names to the user, who can decide on a specific var among them
2. search vars_names to find the index of that var, let's call it var_index
3. sum vars_attrs_nums up to and excluding index var_index to get, say, var_attr_index, which is the index in vars_attrs where the key-value attribute pairs of var start; also save vars_attrs_nums[var_index] as, say, var_num_attrs
4. scan vars_attrs starting at var_attr_index and going to at most var_attr_index + var_num_attrs - 1, search for an entry that equals units, and finally return the immediate next entry (which is going to be the value of the units key-value pair for the variable var the user was interested in)
this might look cumbersome; however, we can of course implement steps 2-4 as a stored procedure in order to avoid sending multiple requests to Postgres. because of that, and also because we usually don't have that many variables (and, less importantly, that many key-value attributes for a fixed variable), the above procedure shouldn't take too long.
in any case we can of course always optimize later, and we would likely be using similar stored procedures to implement other convenient metadata searches.
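The lookup described above (steps 2-4) can be sketched in Python before it is turned into a stored procedure. This is only an illustration under stated assumptions: the function name find_attr and the sample values are made up, and vars_attrs_nums[i] is assumed to count the flat entries (keys plus values) that variable i occupies in vars_attrs, which is how the offset arithmetic above reads. If it counts key-value pairs instead, both the offset and the length would need to be doubled.

```python
def find_attr(vars_names, vars_attrs_nums, vars_attrs, var, key="units"):
    """Return the value stored for `key` on variable `var`, or None."""
    # Step 2: position of the variable (raises ValueError if unknown).
    var_index = vars_names.index(var)
    # Step 3: offset of this variable's first key in the flat list.
    var_attr_index = sum(vars_attrs_nums[:var_index])
    var_num_attrs = vars_attrs_nums[var_index]
    # Step 4: keys sit at every second position, each followed by its
    # value. Stepping by 2 avoids matching a value that equals `key`.
    end = var_attr_index + var_num_attrs
    for i in range(var_attr_index, end - 1, 2):
        if vars_attrs[i] == key:
            return vars_attrs[i + 1]
    return None


# Made-up sample metadata in the layout described above.
vars_names = ["lat", "lon", "tas"]
vars_attrs_nums = [4, 4, 6]  # assumed: flat entries per variable
vars_attrs = [
    "units", "degrees_north", "long_name", "latitude",
    "units", "degrees_east", "long_name", "longitude",
    "long_name", "air temperature", "units", "K", "missing_value", "1e20",
]

print(find_attr(vars_names, vars_attrs_nums, vars_attrs, "tas"))
```

The work is one linear scan over vars_names and one over a slice of vars_attrs, which matches the expectation that this stays cheap for a handful of variables.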
These are merely questions or observations—I'm not sure the best way to handle most of this.
dataset_name: Would it be better to use an automatically incremented integer for the primary_key? E.g., what if two files have the same name? Perhaps a deviant case, but very possible.
@properties of the NetCDF_Meta class.
NetCDF_CleanMeta, in which variable names are standardized and stored. In this table we might also store things such as start and end date of the dataset (if we assume that users may want to search for datasets based on that info).
dataset_name in NetCDF_Data should (I think) be a foreign key to the primary key of NetCDF_Meta. So, if NetCDF_Meta had a primary key id that was auto incremented, NetCDF_Data might have netcdf_meta_id = Column(Integer, ForeignKey('netcdf_meta.id')).
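As a rough illustration of the auto-incremented key and the foreign key suggested here, the sketch below builds the two tables in an in-memory SQLite database using only Python's standard library. SQLite is a stand-in chosen so the example runs anywhere; the actual project uses Postgres with SQLAlchemy models, and the value column and the file name tas.nc are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE netcdf_meta (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    dataset_name TEXT NOT NULL        -- no longer has to be unique
);
CREATE TABLE netcdf_data (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    netcdf_meta_id INTEGER NOT NULL REFERENCES netcdf_meta(id),
    value REAL                        -- hypothetical payload column
);
""")

# Two files that happen to share a name still get distinct ids.
insert_meta = "INSERT INTO netcdf_meta (dataset_name) VALUES (?)"
a = conn.execute(insert_meta, ("tas.nc",)).lastrowid
b = conn.execute(insert_meta, ("tas.nc",)).lastrowid

insert_data = "INSERT INTO netcdf_data (netcdf_meta_id, value) VALUES (?, ?)"
conn.execute(insert_data, (a, 1.5))
conn.execute(insert_data, (b, 2.5))

# The join goes through the integer key rather than the file name.
rows = conn.execute("""
    SELECT m.id, m.dataset_name, d.value
    FROM netcdf_data AS d
    JOIN netcdf_meta AS m ON m.id = d.netcdf_meta_id
    ORDER BY m.id
""").fetchall()
print(rows)
```

With the name as primary key the second insert would have collided; with the integer key both datasets coexist and each data row still points at exactly one metadata row.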