COSIMA / cosima-cookbook

Framework for indexing and querying ocean-sea ice model output.
https://cosima-recipes.readthedocs.io/en/latest/
Apache License 2.0

Degeneracy in variable name #330

Open aidanheerdegen opened 9 months ago

aidanheerdegen commented 9 months ago

While looking for a mapping from variable name to long_name, standard_name and units, I found some troubling inconsistencies:

https://github.com/ACCESS-NRI/experiment_metadb/issues/3#issuecomment-1728884698

The variables table in the database has the following schema:

CREATE TABLE variables (
        id INTEGER NOT NULL, 
        name VARCHAR NOT NULL, 
        long_name VARCHAR, 
        standard_name VARCHAR, 
        units VARCHAR, 
        PRIMARY KEY (id)
);
CREATE INDEX ix_variables_name ON variables (name);
CREATE UNIQUE INDEX ix_variables_name_long_name_units ON variables (name, long_name, units);

Arguably this should also have indexed columns for model and realm, in case of variable name clashes between sub-models and models. In the original conception of the database it was only storing COSIMA data, so everything came from the same model, and AFAIK there were no variable name overlaps between CICE and MOM5.

However, if any other experiment types are stored in the DB, variable name clashes become more likely.
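As a rough illustration of the suggestion above, here is a minimal sketch of a variables table that also keys on model and realm. The `model` and `realm` columns, the index name `ix_variables_name_model_realm_long_name_units`, and the helpers `make_db` and `add_variable` are all assumptions made for this example; none of them exist in the cookbook schema shown above.

```python
import sqlite3

# Hypothetical variant of the variables table: the `model` and `realm`
# columns and the new unique index are assumptions, not the cookbook's schema.
SCHEMA = """
CREATE TABLE variables (
        id INTEGER NOT NULL,
        name VARCHAR NOT NULL,
        long_name VARCHAR,
        standard_name VARCHAR,
        units VARCHAR,
        model VARCHAR,
        realm VARCHAR,
        PRIMARY KEY (id)
);
CREATE INDEX ix_variables_name ON variables (name);
CREATE UNIQUE INDEX ix_variables_name_model_realm_long_name_units
        ON variables (name, model, realm, long_name, units);
"""


def make_db():
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_variable(conn, name, long_name, units, model, realm):
    """Insert one definition; return False if the unique index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO variables (name, long_name, units, model, realm) "
                "VALUES (?, ?, ?, ?, ?)",
                (name, long_name, units, model, realm),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this layout the same name, long name and units can be stored once per model and realm (for example once for MOM5 in the ocean realm and once for CICE in the ice realm), while an exact repeat is still rejected.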

If you look for variable names that have more than one definition, there are some troubling examples:

sqlite> select * from variables where name not like "%time%" and name in (select name from variables group by name having count(*) > 1);
...
802|vh|Meridional Thickness Flux||m3 s-1
161|vh|Meridional thickness flux||m3 s-1
...
932|zoo|||
515|zoo|zoo||mmol/m^3
698|zoo|zoo||none
897|zoo|zooplankton||mmol/m^3

So vh is defined with slightly different long names!? How does that happen?

There are four distinct versions of the zoo (zooplankton) variable? How does this happen?
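On the database side, one part of the answer is that the unique index on (name, long_name, units) compares strings exactly, so long names that differ only in capitalisation count as different definitions. The sketch below loads the six rows quoted above into an in-memory copy of the schema and counts definitions per name, before and after folding case and whitespace. Storing the empty fields as empty strings is an assumption (they may be NULL in the real database), and this only shows why the database accepts the rows, not why the model output differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variables (
        id INTEGER NOT NULL,
        name VARCHAR NOT NULL,
        long_name VARCHAR,
        standard_name VARCHAR,
        units VARCHAR,
        PRIMARY KEY (id)
);
CREATE UNIQUE INDEX ix_variables_name_long_name_units ON variables (name, long_name, units);
""")

# Rows copied from the query output above; empty fields are stored as ''
# here, which is an assumption (they may be NULL in the real database).
rows = [
    (802, "vh", "Meridional Thickness Flux", "", "m3 s-1"),
    (161, "vh", "Meridional thickness flux", "", "m3 s-1"),
    (932, "zoo", "", "", ""),
    (515, "zoo", "zoo", "", "mmol/m^3"),
    (698, "zoo", "zoo", "", "none"),
    (897, "zoo", "zooplankton", "", "mmol/m^3"),
]
# All six inserts succeed: no two rows share an identical
# (name, long_name, units) triple under exact string comparison.
conn.executemany("INSERT INTO variables VALUES (?, ?, ?, ?, ?)", rows)

# Count definitions per name, then again after folding case and whitespace.
summary = conn.execute("""
    SELECT name,
           count(*) AS n_rows,
           count(DISTINCT lower(trim(long_name)) || '|' || lower(trim(units))) AS n_folded
    FROM variables
    GROUP BY name
    ORDER BY name
""").fetchall()

for name, n_rows, n_folded in summary:
    print(name, n_rows, n_folded)
```

In this sketch the two vh rows collapse to a single definition once case is folded, while the four zoo rows remain genuinely different.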

aidanheerdegen commented 9 months ago

Here are some examples of the four different zoo variables:

id  path
897 /g/data/ik11/outputs/access-om2/1deg_iamip2_his/output056/ocean/oceanbgc-3d-zoo-1-yearly-mean-y_2014.nc
515 /g/data/ik11/outputs/access-om2/1deg_jra55_iaf_omip2_cycle5/output288/ocean/ocean_bgc_ann.nc
698 /g/data/ik11/outputs/access-om2-025/025deg_jra55_ryf9091_bgc/restart050/ocean/csiro_bgc.res.nc
932 /g/data/ik11/outputs/access-om2/1deg_iamip2_CMCC-ESM2ssp126/restart070/ice/csiro_bgc.res.nc

The latter two are restart files, though it's a bit odd one is in the ice subdirectory, and the other is in ocean.

The first two are a bit of a mystery. Was there a code update for the 1deg_iamip2_his experiment? Looks like it was done with a bespoke build by @hakaseh:

https://github.com/hakaseh/1deg_jra55_iaf/blob/iamip2-his/manifests/exe.yaml#L15

The query for this:

select variables.id, variables.name, experiment, root_dir, ncfile 
from experiments  
        join ncfiles on experiments.id = ncfiles.experiment_id 
        join ncvars on ncvars.ncfile_id = ncfiles.id 
        join variables on  ncvars.variable_id = variables.id 
where variables.name = 'zoo';

aidanheerdegen commented 9 months ago

@aekiss should potential temperature and conservative temperature have different variable names? Or are they the same at the surface?

792|surface_temp|Conservative temperature|sea_surface_conservative_temperature|K
1453|surface_temp|Conservative temperature||deg_C
1618|surface_temp|Potential temperature|sea_surface_temperature|degrees K

aekiss commented 9 months ago

Potential and conservative temperature are different at the surface, so yes they should have distinct names.

aidanheerdegen commented 9 months ago

Just talked to Andrew, and apparently with MOM you can choose to have potential or conservative temperature as the prognostic variable, but the actual variable name does not change, though the long name will differ.

This is unfortunate for people who want to create databases mapping variable names to long names, standard names and units.

This means such lookup tables have to be experiment-specific AFAICT. Doh.
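A minimal sketch of what an experiment-specific lookup could look like, reusing the join from the zoo query earlier in this thread. The table definitions are cut down to just the columns that query references, the experiment names, root directories and file names are made up for illustration, and `variable_lookup` is a hypothetical helper rather than part of the cookbook API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real schema: only the columns used in the join.
conn.executescript("""
CREATE TABLE experiments (id INTEGER PRIMARY KEY, experiment VARCHAR, root_dir VARCHAR);
CREATE TABLE ncfiles (id INTEGER PRIMARY KEY, experiment_id INTEGER, ncfile VARCHAR);
CREATE TABLE ncvars (id INTEGER PRIMARY KEY, ncfile_id INTEGER, variable_id INTEGER);
CREATE TABLE variables (id INTEGER PRIMARY KEY, name VARCHAR NOT NULL,
        long_name VARCHAR, standard_name VARCHAR, units VARCHAR);
""")

# Variable rows are taken from the surface_temp output above; the experiment
# names, root directories and file names below are hypothetical.
conn.executemany("INSERT INTO variables VALUES (?, ?, ?, ?, ?)", [
    (792, "surface_temp", "Conservative temperature",
     "sea_surface_conservative_temperature", "K"),
    (1618, "surface_temp", "Potential temperature",
     "sea_surface_temperature", "degrees K"),
])
conn.executemany("INSERT INTO experiments VALUES (?, ?, ?)", [
    (1, "expt_conservative", "/hypothetical/a"),
    (2, "expt_potential", "/hypothetical/b"),
])
conn.executemany("INSERT INTO ncfiles VALUES (?, ?, ?)", [
    (1, 1, "output000/ocean/ocean.nc"),
    (2, 2, "output000/ocean/ocean.nc"),
])
conn.executemany("INSERT INTO ncvars VALUES (?, ?, ?)", [(1, 1, 792), (2, 2, 1618)])


def variable_lookup(conn, experiment):
    """Return {name: [(long_name, standard_name, units), ...]} for one experiment."""
    rows = conn.execute(
        """
        SELECT DISTINCT variables.name, variables.long_name,
               variables.standard_name, variables.units
        FROM experiments
                JOIN ncfiles ON experiments.id = ncfiles.experiment_id
                JOIN ncvars ON ncvars.ncfile_id = ncfiles.id
                JOIN variables ON ncvars.variable_id = variables.id
        WHERE experiments.experiment = ?
        """,
        (experiment,),
    ).fetchall()
    lookup = {}
    for name, long_name, standard_name, units in rows:
        # A name may still be ambiguous within one experiment, so keep a list.
        lookup.setdefault(name, []).append((long_name, standard_name, units))
    return lookup
```

Returning a list per name is a deliberate choice here: restart and output files within a single experiment can still disagree, as the zoo rows show, so the caller gets every definition rather than an arbitrary one.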

hakaseh commented 9 months ago

Here are some examples of the four different zoo variables:

id  path
897 /g/data/ik11/outputs/access-om2/1deg_iamip2_his/output056/ocean/oceanbgc-3d-zoo-1-yearly-mean-y_2014.nc
515 /g/data/ik11/outputs/access-om2/1deg_jra55_iaf_omip2_cycle5/output288/ocean/ocean_bgc_ann.nc
698 /g/data/ik11/outputs/access-om2-025/025deg_jra55_ryf9091_bgc/restart050/ocean/csiro_bgc.res.nc
932 /g/data/ik11/outputs/access-om2/1deg_iamip2_CMCC-ESM2ssp126/restart070/ice/csiro_bgc.res.nc

The latter two are restart files, though it's a bit odd one is in the ice subdirectory, and the other is in ocean.

I agree that it is odd that csiro_bgc.res.nc is saved in both ice and ocean subdirectories. Only one is needed.

The first two are a bit of a mystery. Was there a code update for the 1deg_iamip2_his experiment? Looks like it was done with a bespoke build by @hakaseh:

https://github.com/hakaseh/1deg_jra55_iaf/blob/iamip2-his/manifests/exe.yaml#L15

I don't remember changing the long names, but looking at the commit history, it looks like they were added by @aekiss:

https://github.com/hakaseh/1deg_jra55_iaf/commit/7deb65a28d8db15fac57548b242b67ad46ab48dd
