RDCEP / EDE

MIT License

NetCDF Metadata in models.py #6

Closed: njmattes closed this issue 8 years ago

njmattes commented 8 years ago

These are merely questions or observations; I'm not sure of the best way to handle most of this.

ghost commented 8 years ago

let me add here:

ghost commented 8 years ago

TODO for myself:

njmattes commented 8 years ago

Yes, you're totally right that we should stick to fixing NetCDF_Meta for now and hold off on NetCDF_CleanMeta until we actually need it.

Another question: are units stored in NetCDF_Meta already? Are those in vars_attrs perhaps?

ghost commented 8 years ago

yes, vars_attrs contains the variables' attributes, ordered by variable, with each key immediately followed by its value, i.e.

var_1_key_1, var_1_val_1, var_1_key_2, var_1_val_2, var_1_key_3, var_1_val_3, .... , var_2_key_1, var_2_val_1, var_2_key_2, var_2_val_2, var_2_key_3, var_2_val_3, .... , ....

and the key-value pair(s) for units would be among them. however, I have to point out that the metadata as currently stored in netcdf_meta does not let us locate the value of a units key very efficiently.
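To make the flat layout concrete, here is a minimal Python sketch that regroups vars_attrs into one dictionary per variable. It assumes vars_attrs_nums[i] holds the number of key-value pairs (attributes) of variable i, so variable i occupies 2 * vars_attrs_nums[i] consecutive entries; the function name group_attrs and the sample variables tas/pr are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def group_attrs(
    vars_names: List[str],
    vars_attrs: List[str],
    vars_attrs_nums: List[int],
) -> Dict[str, Dict[str, str]]:
    """Rebuild {variable: {key: value}} from the flat vars_attrs list.

    Assumption: vars_attrs_nums[i] counts key-value PAIRS of variable i,
    so that variable's entries span 2 * vars_attrs_nums[i] positions.
    """
    grouped: Dict[str, Dict[str, str]] = {}
    pos = 0  # start of the current variable's entries in vars_attrs
    for name, num_pairs in zip(vars_names, vars_attrs_nums):
        chunk = vars_attrs[pos : pos + 2 * num_pairs]
        # Even offsets hold keys, odd offsets hold the matching values.
        grouped[name] = dict(zip(chunk[0::2], chunk[1::2]))
        pos += 2 * num_pairs
    return grouped


# Hypothetical example data in the flat key, value, key, value, ... layout.
names = ["tas", "pr"]
attrs = ["units", "K", "long_name", "air temperature", "units", "kg m-2 s-1"]
nums = [2, 1]
print(group_attrs(names, attrs, nums))
# {'tas': {'units': 'K', 'long_name': 'air temperature'}, 'pr': {'units': 'kg m-2 s-1'}}
```

Building such a dictionary once per file turns each later units lookup into a plain key access, at the cost of decoding all attributes up front.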

to get the value of a units key for a particular variable with the current metadata, we would have to do the following:

  1. display vars_names to the user so they can pick a specific variable, var, from among them
  2. loop over vars_names to find the index of var; call it var_index
  3. sum the entries of vars_attrs_nums before position var_index (i.e. up to but excluding var_index) to get var_attr_index, the index in vars_attrs where the key-value attribute pairs of var start; also save vars_attrs_nums[var_index] as var_num_attrs
  4. loop over vars_attrs from var_attr_index to at most var_attr_index + var_num_attrs - 1, looking for an entry equal to units, and return the entry immediately after it (the value of the units key-value pair for the variable var that the user chose)
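Steps 2-4 above can be sketched client-side in Python as follows; this only mirrors the logic and is not the proposed Postgres stored procedure. One point the steps leave open is whether vars_attrs_nums counts key-value pairs or flat entries (keys plus values), since that decides whether the offsets must be doubled. The hypothetical entries_per_count parameter makes that choice explicit: 2 (the default here, an assumption) for pair counts, 1 for entry counts. The scan also checks only key positions, so a value that happens to equal "units" is never mistaken for the key. The name find_units and the sample data are illustrative.

```python
from typing import List, Optional


def find_units(
    var: str,
    vars_names: List[str],
    vars_attrs: List[str],
    vars_attrs_nums: List[int],
    entries_per_count: int = 2,
) -> Optional[str]:
    """Return the value of var's units attribute, or None if it has none.

    entries_per_count=2 assumes vars_attrs_nums counts key-value pairs;
    pass 1 if it already counts flat entries (keys plus values).
    """
    # Step 2: index of the chosen variable (raises ValueError if absent).
    var_index = vars_names.index(var)
    # Step 3: where this variable's entries start, and how many there are.
    var_attr_index = entries_per_count * sum(vars_attrs_nums[:var_index])
    var_num_entries = entries_per_count * vars_attrs_nums[var_index]
    # Step 4: scan only the key slots; each value is the entry right after.
    for i in range(var_attr_index, var_attr_index + var_num_entries, 2):
        if vars_attrs[i] == "units":
            return vars_attrs[i + 1]
    return None


# Hypothetical example data; step 1 (letting the user pick) is a UI concern.
names = ["tas", "pr"]
attrs = ["units", "K", "long_name", "air temperature", "units", "kg m-2 s-1"]
print(find_units("pr", names, attrs, [2, 1]))  # kg m-2 s-1
print(find_units("pr", names, attrs, [4, 2], entries_per_count=1))  # kg m-2 s-1
```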

this might look cumbersome; however, we can implement steps 2-4 as a stored procedure to avoid sending multiple requests to Postgres. because of that, and because we usually don't have many variables (and, less importantly, not many key-value attributes per variable), the procedure above shouldn't take long.

in any case, we can always optimize later. at the very least, a stored procedure would let us avoid sending multiple requests, and we would likely use similar stored procedures to implement other convenient metadata searches.