esm-tools / pymorize

A Python based Tool to CMORize NetCDF Data
https://pymorize.readthedocs.io
MIT License
2 stars 1 forks source link

Dimensionless Units #55

Closed pgierz closed 17 hours ago

pgierz commented 2 weeks ago

Work for dimensionless units. Will close #54

pgierz commented 2 weeks ago

For developers: the unitless mapping table is read when the CMORizer object gets initialised (either directly, this is uncommon) or via the classmethod from_dict (this is the common case), and attached to all rules that the CMORizer knows about. You can access them on the rule object as attribute dimensionless_unit_mappings.

siligam commented 2 weeks ago

@pgierz , nice to see the dimensionless_unit_mappings set on rule. I guess unit conversion function should looked out for this attribute and act accordingly. If so, I will make necessary changes to that function. One follow up question: Should the units in the final output point to units in the CMIP table or the one used in this unit_mapping? For instance, "0.001" is the unit in the table. Its unit_mapping says "g/kg". The later is used by the unit conversion function. I guess the unit representation in the final output should still be "0.001" as suggested in the table. Right?

pgierz commented 2 weeks ago

I guess the unit representation in the final output should still be "0.001" as suggested in the table. Right?

Correct. At the end of the CMOR process, we need to have metadata attributes as defined in the CMOR Tables.

mandresm commented 2 weeks ago

Hi guys, just so I make sure I am following, this are the steps and tooling needed correct?

the unitless mapping table is read when the CMORizer object gets initialised (either directly, this is uncommon) or via the classmethod from_dict (this is the common case), and attached to all rules that the CMORizer knows about

Any step missing?

I'd suggest we start writing the dimensionless_unit_mappings yaml with just one variable to test, for example one of the salinities that @siligam have shown from his tables.

pgierz commented 2 weeks ago
  1. read the dimensionless_unit_mappings and assign them to their corresponding CMOR variables. @pgierz this is also what you are saying here as well right?

    the unitless mapping table is read when the CMORizer object gets initialised (either directly, this is uncommon) or via the classmethod from_dict (this is the common case), and attached to all rules that the CMORizer knows about

Partially. The mappings are read and all are assigned to each Rule: no filtering is done. If you want each Rule to only know about one specific mapping, that is easy enough to do, I can add that.

pgierz commented 2 weeks ago

I'd suggest we start writing the dimensionless_unit_mappings yaml with just one variable to test, for example one of the salinities that @siligam have shown from his tables.

Probably it would be a good idea to turn that into a unit test so it is run automatically.

siligam commented 2 weeks ago

I will implement the tastcase for salinity using the dimensionless_unit_mappings feature.

Just for reference, model-table unit listing https://notes.desy.de/CXe0axlgSbStWgOLjaXnlQ?view

mandresm commented 2 weeks ago
  1. read the dimensionless_unit_mappings and assign them to their corresponding CMOR variables. @pgierz this is also what you are saying here as well right?

the unitless mapping table is read when the CMORizer object gets initialised (either directly, this is uncommon) or via the classmethod from_dict (this is the common case), and attached to all rules that the CMORizer knows about

Partially. The mappings are read and all are assigned to each Rule: no filtering is done. If you want each Rule to only know about one specific mapping, that is easy enough to do, I can add that.

I'm not understanding something: don't the rules belong to the model side of things?

Why would we need to append it to the rule and not to the CMOR object? The mapping is for the CMOR unit, not the model unit. Please guys, let me know if I need to reread the documentation or the code, I might be saying bullshit here

pgierz commented 2 weeks ago

Please guys, let me know if I need to reread the documentation or the code, I might be saying bullshit here

This is badly documented, so read the code ;-) I am talking about the CMORizer object here (the main controller) which has information about everything It's the part that controls the parallelisation and knows which rules exist, what the user configuration looks like, how to match up model files to cmor table entries etc etc.

At the moment, it looks like this:

>>> user_config_file = pathlib.Path("/some/file.yaml")
>>> f = yaml.safe_load(user_config_file)
>>> cmorizer = CMORizer.from_dict(f)
>>> type(cmorizer.rules)
list
>>> first_rule = cmorizer.rules[0]
>>> first_rule.model_variable
"sst"
>>> first_rule.cmor_variable
"sos"
>>> first_rule.dimensionless_unit_mappings
{"sst": {"0.001": "g/kg"}, "other_key": {"1": "µmol mol^-1"}, ...}

Of course this example is nonsense, sst isn't a concentration, but just for illustrative purposes of what sorts of things are where in the objects.

christian-stepanek commented 2 weeks ago

I guess the unit representation in the final output should still be "0.001" as suggested in the table. Right?

Correct. At the end of the CMOR process, we need to have metadata attributes as defined in the CMOR Tables.

Yes, the units in the tables are the target, they are the convention.

siligam commented 2 weeks ago

So, I have integrated dimensionless_mappings logic into unit conversion function and it appears work as expected. I have run examples/sample.yaml which produced the following:

The slurm log shows the expected unit conversion details:

> grep Converting slurm-13789839.out
Converting units: (thetao -> thetao) degC -> degC
Converting units: (CO2f -> fgco2) mmolC/m2/d -> kg m-2 s-1
Converting units: (so -> so) psu -> g/kg (0.001)

The units in processed netcdf file shows the correct unit as per the Table.

> ncdump -h so_Omon_AWI-ocean_piControl_r1i1p1f1_gn_278401-278412.nc | grep so:units
        so:units = "0.001" ;
pgierz commented 1 week ago

What I am still missing here is a unit test for the dimensionless unit conversion.

pgierz commented 1 week ago

I'm going to have to do a force push on this branch at some point, there are a few features mixed together in this one:

  1. ~disentangling of the parallel initialisation~
  2. ~validate vs. post_init~
  3. Actual unitless conversion.

~@mandresm: is it worth it, just to keep things "clean"? Or more confusing?~

Update: No force push needed, changes in this PR now exclusively deal with the relevant conversion logic.

christian-stepanek commented 6 days ago

Regarding getting unit information like g/kg in addition to the fractional scalar: I had a look. The physical units of a quantity appear to be hidden deep down in the details of physical parameters (https://bit.ly/CMIP-DR-physical-parameters). The Variables defined in the CMOR tables then refer to the different physical parameters and inherit their unit information.

Commenting on individual physical parameters is possible via this form: https://airtable.com/appqRFkdpwAitEZNY/pagBlOWjK6JfvHtzU/form

Doing so is possible, but with significant work per parameter. Are we still confident that this additional information is necessary for a proper workflow? If so, I will write to CMIP-IPO to see what they can and are willing to do from a more fundamental level. But at the moment it does not appear to me to boil down to just adding another comment or field to the data base at hand.

This is just the way I understand the information on airtable - let me know if you have a different understanding.

mandresm commented 6 days ago

Thanks @christian-stepanek for looking into this!

The Variables defined in the CMOR tables then refer to the different physical parameters and inherit their unit information.

Meaning that for example albsrfc will inherit 1 as unit right?

Doing so is possible, but with significant work per parameter. Are we still confident that this additional information is necessary for a proper workflow? If so, I will write to CMIP-IPO to see what they can and are willing to do from a more fundamental level. But at the moment it does not appear to me to boil down to just adding another comment or field to the data base at hand.

This is just the way I understand the information on airtable - let me know if you have a different understanding.

In my view this is not additional information, is just that 1 or 0.001 is not a unit, IMO it's fundamentally wrong, and programatically imprecise. If anything 1 speaks of a range (from 0 to 1) and 0.001 speaks of a ratio. This is not what I expect from a standard for doing international model comparisons. I am aware I am extremely picky though and I might not be getting the full picture here...

Is it possible to ask their opinion to other members of the community before starting to formally discuss in airtable? Or is airtable the correct place to open the discussion?

In practical terms for pymorize, it is very little work for us to provide the mapping functionality, so I don't see a big technical problem for us.

mandresm commented 6 days ago

@pgierz, does this commit make sense to you or is there a better way of doing this keeping the data directory in the root directory? bc5586a

mandresm commented 6 days ago

Blocked by #66

mandresm commented 3 days ago

Tests are failing, I'll look into that on Monday