ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0
42 stars 38 forks source link

Missing frequency using CMIP6 and obs4mips datasets #38

Closed tomaslovato closed 5 years ago

tomaslovato commented 5 years ago

Executing a recipe with variables from CMIP6 Omon table (e.g., chl) generates an error in fix_metadata of model dataset ..

Here below the error message

2019-03-18 17:14:48,358 UTC [17891] ERROR   Failed to run fix_metadata([<iris 'Cube' of mass_concentration_of_phytoplankton_expressed_as_chlorophyll_in_sea_water / (kg m-3) (time: 180; Vertical T levels: 75; -- : 294; -- : 362)>], {'project': 'CMIP6', 'dataset': 'CNRM-ESM2-1', 'short_name': 'chl', 'cmor_table': 'CMIP6', 'mip': 'Omon', 'frequency': ''})
2019-03-18 17:14:48,991 UTC [17891] ERROR   Program terminated abnormally, see stack trace below for more information
Traceback (most recent call last):
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/_main.py", line 228, in run
    conf = main(args)
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/_main.py", line 156, in main
    process_recipe(recipe_file=recipe, config_user=cfg)
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/_main.py", line 206, in process_recipe
    recipe.run()
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/_recipe.py", line 1050, in run
    self.tasks, max_parallel_tasks=self._cfg['max_parallel_tasks'])
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/_task.py", line 581, in run_tasks
    _run_tasks_sequential(tasks)
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/_task.py", line 592, in _run_tasks_sequential
    task.run()
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/_task.py", line 223, in run
    input_files.extend(task.run())
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/_task.py", line 226, in run
    self.output_files = self._run(input_files)
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/preprocessor/__init__.py", line 392, in _run
    product.apply(step, self.debug)
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/preprocessor/__init__.py", line 259, in apply
    self.cubes = preprocess(self.cubes, step, **self.settings[step])
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/preprocessor/__init__.py", line 201, in preprocess
    result.append(_run_preproc_function(function, items, settings))
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/preprocessor/__init__.py", line 187, in _run_preproc_function
    return function(items, **kwargs)
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/cmor/fix.py", line 116, in fix_metadata
    checker(cube).check_metadata()
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/cmor/check.py", line 104, in check_metadata
    self.report_errors()
  File "/users/home/ans033/GIT/ESMValTool/esmvaltool/cmor/check.py", line 121, in report_errors
    raise CMORCheckError(msg)
esmvaltool.cmor.check.CMORCheckError: There were errors in variable chl:
time: Frequency  not supported by checker
in cube:

Note that in fix_metadata the frequency field is empty.

For some reason, the code is not picking up the frequency field from the variable of the CMIP6 table, but if I hack the CMIP6_Omon.json table by adding the frequency in the file header (as it was before the last CMIP6 tables update) the code goes through the step.

As a first guess, something maybe wrong in getting frequency from variables for CMIP6 table here around https://github.com/ESMValGroup/ESMValTool/blob/f5eb5a3657325cc5d7c153a13856114c1ff29be0/esmvaltool/cmor/table.py#L104-L116

tomaslovato commented 5 years ago

The problem here is the assumption that table.frequency is defined also in the header section of CMIP6 tables, but it was changed in favor of having frequency defined for each variable.

The solution is to modify ESMValTool/esmvaltool/cmor/table.py as follow

@@ -102,15 +102,16 @@ class CMIP6Info(object):
             self.tables[table.name] = table

             generic_levels = header['generic_levels'].split()
-            table.frequency = header.get('frequency', '')
             table.realm = header.get('realm', '')

             for var_name, var_data in raw_data['variable_entry'].items():
                 var = VariableInfo('CMIP6', var_name)
                 if 'frequency' in var_data:
                     var.frequency = var_data['frequency']
+                    table.frequency = var.frequency
                 else:
-                    var.frequency = table.frequency
+                    raise ValueError('missing frequency for var ' +
+                        var_name + ' in ' + header['table_id'])
                 var.read_json(var_data)
                 self._assign_dimensions(var, generic_levels)
                 table[var_name] = var

I tested it and it only requires the removal of ESMValTool/esmvaltool/cmor/tables/cmip6/Tables/CMIP6_CFsubhrOff.json that is obsolete and no more included in CMOR tables (I missed that in the previous update PR)

I think it should be ok coded in this way :

@valeriupredoi @mattiarighi Any suggestion on how to implement the bugfix?

mattiarighi commented 5 years ago

This seems to affect obs4mips data as well. I get exactly the same error with the GPCP-SG and CERES-EBAF datasets.

valeriupredoi commented 5 years ago

for chl in CMIP6_Omon.json

        "chl": {
            "frequency": "mon",

we can't hack CMOR tables, if they are obsolete then we update those via git rather than hacking them manually. Can you provide a minimal recipe that I can use to reproduce this issue pls? :beer:

valeriupredoi commented 5 years ago

ah no need to worry about a recipe example, it fails straight away for tas as well - question is - has the current implementation worked atall__ for CMIP6 data?

valeriupredoi commented 5 years ago

it works if you add it in the recipe: frequency: mon, so worst case scenario we add it in recipe than we change the CMOR tables, but am looking at a more respectable change at the mo :grin:

tomaslovato commented 5 years ago

for chl in CMIP6_Omon.json

        "chl": {
            "frequency": "mon",

we can't hack CMOR tables, if they are obsolete then we update those via git rather than hacking them manually. Can you provide a minimal recipe that I can use to reproduce this issue pls? 🍺

@valeriupredoi My bad!! I forgot to say that with the correction proposed there is no need to hack cmor tables.

schlunma commented 5 years ago

ah no need to worry about a recipe example, it fails straight away for tas as well - question is - has the current implementation worked at__all for CMIP6 data?

Yes, ~ one week ago I successfully ran this:

https://github.com/ESMValGroup/ESMValTool/blob/864260c8e27f15d8a34b08be61372e8a3db32432/esmvaltool/recipes/recipe_ecs.yml#L77-L95

valeriupredoi commented 5 years ago

OK here's the bugger: in table.py: table.frequency = header.get('frequency', '') -- problem is in the latest CMIP6 tables the Header member does not contain the frequency anymore but it is specified at variable levels; also the same variable shows up in different modeling realms and the realm is treated as a single concatenated string; I changed it a bit like this:

    def _load_table(self, json_file):
        with open(json_file) as inf:
            raw_data = json.loads(inf.read())
            if not self._is_table(raw_data):
                return
            table = TableInfo()
            header = raw_data['Header']
            table.name = header['table_id'][6:].split('_')[-1]
            self.tables[table.name] = table
            generic_levels = header['generic_levels'].split()
            table.realm = header.get('realm', '').split()

            for var_name, var_data in raw_data['variable_entry'].items():
                for realm in table.realm:
                    if var_data['modeling_realm'] == realm:
                        var = VariableInfo('CMIP6', var_name)
                        if 'frequency' in var_data:
                            frequency = var_data['frequency']
                        else:
                            frequency = header.get('frequency', '')
                        var.read_json(var_data)
                        var.frequency = frequency
                        var.modeling_realm = realm
                        self._assign_dimensions(var, generic_levels)
                        table[var_name] = var

and I checked that the table instance returned does indeed contain a valid frequency, but I still get the checker fail, the problem should be spread to the checker as well....I'll look more into it tomorrow

tomaslovato commented 5 years ago

@valeriupredoi Why not to try to simply force assignment of frequency to the table as I did in the solution proposed above. Here below a simple addition to your code ...

                    if var_data['modeling_realm'] == realm:
                        var = VariableInfo('CMIP6', var_name)
                        if 'frequency' in var_data:
                            frequency = var_data['frequency']
                        else:
                            frequency = header.get('frequency', '')
                        var.read_json(var_data)
                        var.frequency = frequency
                        var.modeling_realm = realm
+                       table.frequency = frequency

As part of previous consideration, this assignment should be kept to avoid compatibility issue with CMIP5 tables.

The obsolete table I was referring (ESMValTool/esmvaltool/cmor/tables/cmip6/Tables/CMIP6_CFsubhrOff.json) was removed from the CMOR repository and it is a leftover from PR #937 that need to be removed as well.

valeriupredoi commented 5 years ago

hey @tomaslovato and @mattiarighi have a look at this one https://github.com/ESMValGroup/ESMValTool/pull/985 and test please, when you gots time, I will fix the test as soon as I hear you guys had success testing! @tomaslovato cheers for the suggestion - have a look at the code changes in the PR pls, it is pretty much just as you suggest, only that the table gets forced a frequency attribute depending on var and mip right before the table is passed over to the checks

tomaslovato commented 5 years ago

hey @tomaslovato and @mattiarighi have a look at this one #985 and test please, when you gots time, I will fix the test as soon as I hear you guys had success testing! @tomaslovato cheers for the suggestion - have a look at the code changes in the PR pls, it is pretty much just as you suggest, only that the table gets forced a frequency attribute depending on var and mip right before the table is passed over to the checks

tomaslovato commented 5 years ago

:top: I'll do it !!

mattiarighi commented 5 years ago

But why is this affecting obs4mips as well? We didn't change those tables, did we?

valeriupredoi commented 5 years ago

Good question - it's the checker that is calling CMIPXInfo (there is no function dedicated to obs4mips info from CMOR Table) and the problem was there - in fact, I am not even 100% sure why we even had the problem after all, for any model, since the syntax in CMIP6Info() was correct and accounting for the two cases that frequency can be in the var dictionary or the header of the CMOR Table. Forcing the frequency is a kludge, we'll have to figure it out why was missing in the first place (well, there were all sorts of problems with table.py, but even after fixing them that bloody freq was on and off)

Dr Valeriu Predoi. Computational scientist NCAS-CMS University of Reading Department of Meteorology Reading RG6 6BB United Kingdom

On Wed, 20 Mar 2019, 07:38 Mattia Righi, notifications@github.com wrote:

But why is this affecting obs4mips as well? We didn't change those tables, did we?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ESMValGroup/ESMValTool/issues/981#issuecomment-474719899, or mute the thread https://github.com/notifications/unsubscribe-auth/AbpCo6bMFhiLE2ax-25fmTeIwRq3xO3Wks5vYeWKgaJpZM4b6V3x .

valeriupredoi commented 5 years ago

dereferencing ESMValTool closed PR #985 and referencing #158

bouweandela commented 5 years ago

@tomaslovato It looks like this issue may have been solved. Can you confirm this?

tomaslovato commented 5 years ago

@bouweandela I tested again a recipe with chl data from CMIP6 models and it worked fine.