ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0
218 stars 128 forks source link

seasonal_mean fails (returns None) #664

Closed evertrol closed 5 years ago

evertrol commented 6 years ago

When I use esmvaltool.preprocessor.seasonal_mean, None is returned, where I expected a seasonally-averaged cube.

I've provided a Python script and a recipe below as test case. The dataset used is for this test case pr_Amon_ACCESS1-3_historical_r1i1p1_185001-200512.nc Software versions involved (Conda installation, macOS 10.13.6, but also reproduced from a Docker container running Ubuntu 18.04.1 LTS):


The Python script below shows a workaround, by re-implementing seasonal_mean and comparing the time bounds to within a range of days; the built-in variant compares with a fixed amount of hours (2160 == 90 days), while my time bounds are in days (and range between 90 and 92). Note that in both variants, units are completely missing (the types of time.bound[0] and time.bound[1] are numpy.float64).

import esmvaltool.preprocessor as pp
import esmvaltool
from esmvaltool.cmor.table import read_cmor_tables
esmvaltool._config.CFG = esmvaltool._config.read_config_developer_file()
read_cmor_tables(esmvaltool._config.CFG)

fname = "pr_Amon_ACCESS1-3_historical_r1i1p1_185001-200512.nc"
cubes = pp.load_cubes([fname], fname, None)
cube = cubes[0]
cube = pp.fix_metadata(cube, "pr", "CMIP5", "ACCESS1-3", mip="Amon", cmor_table="CMIP5")
pp.cmor_check_metadata(cube, "CMIP5", "Amon", "pr")
cube = pp.fix_data(cube, "pr", "CMIP5", "ACCESS1-3", cmor_table="CMIP5", mip="Amon")

assert cube.coord('time').shape == (1872,)

def seasonal_mean(cube):
    import iris
    iris.coord_categorisation.add_season(cube, 'time', name='clim_season')
    iris.coord_categorisation.add_season_year(
        cube, 'time', name='season_year')
    annual_seasonal_mean = cube.aggregated_by(['clim_season', 'season_year'],
                                              iris.analysis.MEAN)

    def spans_three_months(time):
        """Check for three months"""
        # 3-month periods are 90, 91 or 92 days
        return 90 <= (time.bound[1] - time.bound[0]) <= 92
    three_months_bound = iris.Constraint(time=spans_three_months)
    return annual_seasonal_mean.extract(three_months_bound)

# Need a copy, because pp.seasonal_mean will add new coordinates (clim_season & season_year)
cube_copy = cube.copy()
cube = pp.seasonal_mean(cube)
assert cube is None

# Run our own seasonal_mean
cube = seasonal_mean(cube_copy)
# nearly a factor 3 smaller, since the start (Jan & Feb 1850) and end
# (Dec 2005) are not averaged: (1872-3)//3 == 623
assert cube.coord('time').shape == (623,)

Recipe test.yml:

datasets:
  - {dataset: ACCESS1-3, project: CMIP5, mip: Amon, exp: historical, ensemble: r1i1p1, start_year: 1850, end_year: 2005}

preprocessors:
  pp1:
    seasonal_mean:

diagnostics:
  diagnostic1:
    description: test case
    variables:
      pr:
        preprocessor: pp1
        field: T2Ms
    scripts:
      test:
        script: /path/to/test.py

test.py is only there to provide a required script: it doesn't really do anything:

#! /usr/bin/env python

from esmvaltool.diag_scripts.shared import run_diagnostic

def main(cfg):
    pass

if __name__ == '__main__':
    with run_diagnostic() as config:
        main(config)

The full debug log from running esmvaltool test.yml is provided in this gist, but it ends with the traceback below, which indicates that at some point, the data cube has indeed become None:

018-10-16 08:43:34,784 UTC [50059] ERROR   esmvaltool._main:209 Program terminated abnormally, see stack trace below for more information
Traceback (most recent call last):
  File "/usr/local/miniconda3/envs/esmval2/lib/python3.6/site-packages/esmvaltool/_main.py", line 204, in run
    conf = main(args)
  File "/usr/local/miniconda3/envs/esmval2/lib/python3.6/site-packages/esmvaltool/_main.py", line 133, in main
    process_recipe(recipe_file=recipe, config_user=cfg)
  File "/usr/local/miniconda3/envs/esmval2/lib/python3.6/site-packages/esmvaltool/_main.py", line 183, in process_recipe
    recipe.run()
  File "/usr/local/miniconda3/envs/esmval2/lib/python3.6/site-packages/esmvaltool/_recipe.py", line 988, in run
    self.tasks, max_parallel_tasks=self._cfg['max_parallel_tasks'])
  File "/usr/local/miniconda3/envs/esmval2/lib/python3.6/site-packages/esmvaltool/_task.py", line 473, in run_tasks
    _run_tasks_sequential(tasks)
  File "/usr/local/miniconda3/envs/esmval2/lib/python3.6/site-packages/esmvaltool/_task.py", line 484, in _run_tasks_sequential
    task.run()
  File "/usr/local/miniconda3/envs/esmval2/lib/python3.6/site-packages/esmvaltool/_task.py", line 184, in run
    input_files.extend(task.run())
  File "/usr/local/miniconda3/envs/esmval2/lib/python3.6/site-packages/esmvaltool/_task.py", line 185, in run
    self.output_files = self._run(input_files)
  File "/usr/local/miniconda3/envs/esmval2/lib/python3.6/site-packages/esmvaltool/preprocessor/__init__.py", line 281, in _run
    input_files, self.settings, self.order, debug=self.debug)
  File "/usr/local/miniconda3/envs/esmval2/lib/python3.6/site-packages/esmvaltool/preprocessor/__init__.py", line 206, in preprocess_multi_model
    debug)
  File "/usr/local/miniconda3/envs/esmval2/lib/python3.6/site-packages/esmvaltool/preprocessor/__init__.py", line 243, in preprocess
    result.append(function(item, **args))
  File "/usr/local/miniconda3/envs/esmval2/lib/python3.6/site-packages/esmvaltool/cmor/check.py", line 606, in cmor_check_data
    checker(cube).check_data()
  File "/usr/local/miniconda3/envs/esmval2/lib/python3.6/site-packages/esmvaltool/cmor/check.py", line 166, in check_data
    if str(self._cube.units) != units:
AttributeError: 'NoneType' object has no attribute 'units'
RCHG commented 6 years ago

Maybe the problem is on function spans_three_months(time) inside of seasonal_mean, with:

    dt_3months = datetime.timedelta(hours=24*3*29)
    def spans_three_months(time):
        """Check for three months"""
        return (time.bound[1] - time.bound[0]) > dt_3months

it seems to return a cube.

I have added a test on a recipe here that at least is able to save a netcdf.

Note that I have Iris 2.1.

evertrol commented 6 years ago

Note that I have Iris 2.1.

That certainly makes some difference. The Conda recipe I installed defines 1.13 for compatibility reasons. Ignoring those, I installed Iris 2.1 as well.

That doesn't make things go away, depending on how one defines the constraint function:

    return 90 <= time.bound[1] - time.bound[0] <= 92

now properly errors with

TypeError: '<=' not supported between instances of 'int' and 'datetime.timedelta'

Fixing it:

    return datetime.timedelta(days=90) <= (time.bound[1] - time.bound[0]) <= datetime.timedelta(days=92)

produces the previous result from my test case.

What passes, however, is

    return time.bound[1] - time.bound[0] == 92

albeit that the result is of course not correct. But it doesn't throw an error, and probably just shows that datetime.timdelta is overridden for __eq__(*).

This means that the hardcoded time.bound[1] - time.bound[0] == 2160 is always False, but doesn't error, and the returned value ends up being None.

(*) in fact, the source code for timedelta shows:

def __eq__(self, other):
        if isinstance(other, timedelta):
            return self._cmp(other) == 0
        else:
            return False
RCHG commented 6 years ago

If I understood well, with the constrain I provided it seems to me that the netcdf saved is seasonally saved correctly, let's say, it perform a mean every 3 months.

Basically the work is done by thecoord_categorizationand add_season methods, together with the aggregated_by. The last 'time-extraction' just is designed to ensure that every time interval is larger than a minimum number of hours (in my case 32429), in such way that we eliminate seasons with only 2 months from our final data-set (cube). It seems to me that the version I gave you has a correct approach.

For the version Iris 1.13 the documentation explained that also it is the case.

RCHG commented 6 years ago

I think that this issue is closed as it seems that this solves the issue?

Maybe the problem is on function spans_three_months(time) inside of seasonal_mean, with:

    dt_3months = datetime.timedelta(hours=24*3*29)
    def spans_three_months(time):
        """Check for three months"""
        return (time.bound[1] - time.bound[0]) > dt_3months

it seems to return a cube.

I have added a test on a recipe here that at least is able to save a netcdf.

Note that I have Iris 2.1.

mattiarighi commented 6 years ago

If you have a fix that works with Iris 1.13, please open a pull request.

RCHG commented 5 years ago

According to the Iris documentation, yes, the version here solves the problem for Iris 2 and Iris 1.13

` dt_3months = datetime.timedelta(hours=24329)

def spans_three_months(time):
    """Check for three months"""

return (time.bound[1] - time.bound[0]) > dt_3months`

solves the problem.

mattiarighi commented 5 years ago

@ledm would you be OK with this fix? If so, @RCHG can you open a pull request with it so that we can test?

Thank you. :+1:

ledm commented 5 years ago

Surely the number of hours in three months depends on the calendar used? This may work for a gregorian calendar, but it would be inaccurate for the 360_day calendar.

Similarly, what if a month is missing? you could form a list of three months January, March, April, the difference in the time bounds are greater than hours=24*3*29, but it is not three months.

bouweandela commented 5 years ago

@ledm Have a look at the entire function, I believe this last constraint is only supposed the filter out the winter season that misses one or two months at the start and end of the data, therefore I think a constraint that is larger than 2 months in any calendar should be sufficient.

https://github.com/ESMValGroup/ESMValTool/blob/43598ee48a327fc3ef2a6e53768324d02ad61863/esmvaltool/preprocessor/_time_area.py#L117-L147

RCHG commented 5 years ago

@ledm

If I understood well, with the constrain I provided it seems to me that the netcdf saved is seasonally saved correctly, let's say, it perform a mean every 3 months.

Basically the work is done by the coord_categorizationand add_season methods, together with the aggregated_by. The last 'time-extraction' just is designed to ensure that every time interval is larger than a minimum number of hours (in my case 3_24_29), in such way that we eliminate seasons with only 2 months from our final data-set (cube). It seems to me that the version I gave you has a correct approach.

For the version Iris 1.13 the documentation explained that also it is the case.

So the aggregation by seasons is done by coord_categorizationand add_season methods, they would group the time by season (by adding an additional auxiliary coordinate!). The function I described here just remove those season groups that are not fulfilling the condition hours>=24329 (which could be adjusted to a less restricted case, if we want). If we have a dataset [December, January, March, April, May...] it would group WINTER=[December, January] and SPRING=[March, April, May] . In a second step, we could assess with the function whether the true time interval inside each season group covers or not a minimum number of hours. With the condition hours>=24329 it would remove from the final cube the interval [December, January] as it is not fulfilling the condition. I understood that according to the Iris documentation.

valeriupredoi commented 5 years ago

just raced through this issue: careful since iris 2+ uses time cells (datetime objects) for both time points and bounds - cube.coord('time').units.date2num(timepoint) converts a point of bound element to numeric. Incidentally has this been solved in the meantime? So we can close :grin:

ruthlorenz commented 5 years ago

I still run into this issue, so I guess it has not been fixed yet? I also do not understand if it is supposed to return seasonal means per year or averaged over all the years? if its the latter is there any change we could implement the former? I would need seasonal means for every year but ideally all seasons (basically like extract_season and then annual_average but for multiple seasons)

RCHG commented 5 years ago

As far as I know it is solved in this thread, althought I don't know it if is already implemented on the master version. On my working repository it works with Iris 1.13 and Iris 2 as well with the remarks done before.

bouweandela commented 5 years ago

This is not solved, a solution was suggested in this issue, but no-one has made a pull request.

valeriupredoi commented 5 years ago

I can have a look at this tomorrow, since I've just gone through a whole bunch of time tools anyway; just to confirm, @ruthlorenz you have your environment up-to-date and using iris=2.2.0 and cf-units=2.0.2 right? Can you pls tell me for what dataset/var it is failing and what is the actual Trace message? Cheers :beer:

ruthlorenz commented 5 years ago

actually no, my environment is not up to date, I did not dare to update it after some of my colleagues had trouble, but will do as soon as I am finished with the other stuff I started. I let you know!

valeriupredoi commented 5 years ago

env creation is now safe as of yesterday with the env files on version2_development, not a worry

ruthlorenz commented 5 years ago

well with the latest environment I get:

Traceback (most recent call last): File "/net/tropo/climphys/rlorenz/conda/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/init.py", line 581, in _build_master ws.require(requires) File "/net/tropo/climphys/rlorenz/conda/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/init.py", line 898, in require needed = self.resolve(parse_requirements(requirements)) File "/net/tropo/climphys/rlorenz/conda/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/init.py", line 789, in resolve raise VersionConflict(dist, req).with_context(dependent_req) pkg_resources.ContextualVersionConflict: (matplotlib 3.0.2 (/net/tropo/climphys1/rlorenz/conda/envs/esmvaltool/lib/python3.6/site-packages), Requirement.parse('matplotlib<3'), {'ESMValTool'})

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/net/tropo/climphys/rlorenz/conda/envs/esmvaltool/bin/esmvaltool", line 6, in from pkg_resources import load_entry_point File "/net/tropo/climphys/rlorenz/conda/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/init.py", line 3126, in @_call_aside File "/net/tropo/climphys/rlorenz/conda/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/init.py", line 3110, in _call_aside f(*args, **kwargs) File "/net/tropo/climphys/rlorenz/conda/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/init.py", line 3139, in _initialize_master_working_set working_set = WorkingSet._build_master() File "/net/tropo/climphys/rlorenz/conda/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/init.py", line 583, in _build_master return cls._build_from_requirements(requires) File "/net/tropo/climphys/rlorenz/conda/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/init.py", line 596, in _build_from_requirements dists = ws.resolve(reqs, Environment()) File "/net/tropo/climphys/rlorenz/conda/envs/esmvaltool/lib/python3.6/site-packages/pkg_resources/init.py", line 784, in resolve raise DistributionNotFound(req, requirers) pkg_resources.DistributionNotFound: The 'cycler>=0.10' distribution was not found and is required by matplotlib

??? Did I mess up something?

valeriupredoi commented 5 years ago

it's best if you created the environment from scratch ie conda env create -n esmvaltool_stable -f environment.yml - usually conda env update should work as well but better be safe than sorry

ruthlorenz commented 5 years ago

never mind, forgot to reinstall... conda env update seems to have worked.

But:

2019-03-15 15:13:36,127 UTC [42238] ERROR Program terminated abnormally, see stack trace below for more information Traceback (most recent call last): File "/home/rlorenz/ESMValTool/ESMValTool-private/esmvaltool/_main.py", line 228, in run conf = main(args) File "/home/rlorenz/ESMValTool/ESMValTool-private/esmvaltool/_main.py", line 156, in main process_recipe(recipe_file=recipe, config_user=cfg) File "/home/rlorenz/ESMValTool/ESMValTool-private/esmvaltool/_main.py", line 206, in process_recipe recipe.run() File "/home/rlorenz/ESMValTool/ESMValTool-private/esmvaltool/_recipe.py", line 1050, in run self.tasks, max_parallel_tasks=self._cfg['max_parallel_tasks']) File "/home/rlorenz/ESMValTool/ESMValTool-private/esmvaltool/_task.py", line 581, in run_tasks _run_tasks_sequential(tasks) File "/home/rlorenz/ESMValTool/ESMValTool-private/esmvaltool/_task.py", line 592, in _run_tasks_sequential task.run() File "/home/rlorenz/ESMValTool/ESMValTool-private/esmvaltool/_task.py", line 223, in run input_files.extend(task.run()) File "/home/rlorenz/ESMValTool/ESMValTool-private/esmvaltool/_task.py", line 226, in run self.output_files = self._run(input_files) File "/home/rlorenz/ESMValTool/ESMValTool-private/esmvaltool/preprocessor/init.py", line 392, in _run product.apply(step, self.debug) File "/home/rlorenz/ESMValTool/ESMValTool-private/esmvaltool/preprocessor/init.py", line 259, in apply self.cubes = preprocess(self.cubes, step, **self.settings[step]) File "/home/rlorenz/ESMValTool/ESMValTool-private/esmvaltool/preprocessor/init.py", line 212, in preprocess items.extend(item) TypeError: 'NoneType' object is not iterable

Preprocessor is: preproc_map_seas: regrid: target_grid: 1x1 scheme: linear seasonal_mean:

diagnostics: IAV_calc_hurs: description: Calculate mean relative humidity interannual variability for stippling significance themes:

valeriupredoi commented 5 years ago

sorry @ruthlorenz I cant run anything at the moment - there is an issue with the config-references and with the cmor/table that need attention, will be back here once those get fixed

valeriupredoi commented 5 years ago

apologies for the tardy fix @ruthlorenz - have a look at https://github.com/ESMValGroup/ESMValTool/pull/980 and see if it does the trick for you (I tested in minimal mode)