NCAR / esmlab

Earth System Model Lab (esmlab). ⚠️⚠️ ESMLab functionality has been moved into <https://github.com/NCAR/geocat-comp>. ⚠️⚠️
https://esmlab.readthedocs.io
Apache License 2.0
24 stars 8 forks source link

esmlab.anomaly generating error, unrealistically large time values #162

Open klindsay28 opened 4 years ago

klindsay28 commented 4 years ago

When I run the commands in the following code snippet, I get a ValueError exception. The Traceback is below the code snippet. The values listed in the cftime.DatetimeNoLeap call in the Traceback are fairly suspicious, as if something were uninitialized.

The call to esmlab.anomaly works fine if I comment out the 2nd fname setting, and use a timeseries from CESM2's atm component. I don't see a smoking gun difference between the file that triggers the exception and the file that doesn't.

Code Snippet

> import xarray as xr import esmlab dir = '/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/tseries' fname = 'SFCO2_OCN_atm_piControl_00_mon.nc' fname = 'FG_CO2_ocn_piControl_00_mon.nc' path = '/'.join([dir, fname]) ds = xr.open_dataset(path) ds_anom = esmlab.anomaly(ds, clim_freq='mon')

Traceback

> /glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/esmlab_dev/esmlab/core.py:291: RuntimeWarning: invalid value encountered in greater calendar=self.time_attrs['calendar'], Traceback (most recent call last): File "foo5.py", line 8, in ds_anom = esmlab.anomaly(ds, clim_freq='mon') File "/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/esmlab_dev/esmlab/core.py", line 717, in anomaly slice_mon_clim_time=slice_mon_clim_time File "/glade/work/klindsay/miniconda3/envs/CESM2_coup_carb_cycle_JAMES_tst/lib/python3.7/contextlib.py", line 74, in inner return func(*args, **kwds) File "/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/esmlab_dev/esmlab/core.py", line 502, in compute_mon_anomaly return self.restore_dataset(computed_dset, attrs=attrs) File "/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/esmlab_dev/esmlab/core.py", line 291, in restore_dataset calendar=self.time_attrs['calendar'], File "cftime/_cftime.pyx", line 320, in cftime._cftime.num2date File "cftime/_cftime.pyx", line 869, in cftime._cftime.utime.num2date File "cftime/_cftime.pyx", line 564, in cftime._cftime.DateFromJulianDay File "cftime/_cftime.pyx", line 1378, in cftime._cftime.DatetimeNoLeap.__init__ File "cftime/_cftime.pyx", line 1637, in cftime._cftime.assert_valid_date ValueError: invalid second provided in cftime.DatetimeNoLeap(-5883517, 2, 27, 0, 0, -2147483648, -2147483648, 5, 58)

Output of esmlab.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-693.21.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 esmlab: 2019.4.27.post43 xarray: 0.14.0 pandas: 0.25.2 numpy: 1.17.3 scipy: 1.3.1 xesmf: 0.2.1 cftime: 1.0.3.4 dask: 2.6.0 distributed: 2.6.0 setuptools: 41.4.0 pip: 19.3.1 conda: None pytest: None IPython: 7.9.0 sphinx: None
mnlevy1981 commented 4 years ago

Looking at the two files, SFCO2_OCN_atm_piControl_00_mon.nc has time units of days since 0001-01-01 and FG_CO2_ocn_piControl_00_mon.nc has time units of days since 0000-01-01. I was under the impression that cftime didn't like having a year 0, but this appears to be a red herring as

import xarray as xr
import esmlab
dir = '/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/tseries'
# fname = 'SFCO2_OCN_atm_piControl_00_mon.nc'
fname = 'FG_CO2_ocn_piControl_00_mon.nc'
path = '/'.join([dir, fname])
ds = xr.open_dataset(path, decode_times=False)
ds.time.values = ds.time.values - 365
ds.time.attrs['units'] = "days since 0001-01-01"
ds = xr.decode_cf(ds)
ds_anom = esmlab.anomaly(ds, clim_freq='mon')

yields the same traceback:

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py:291: RuntimeWarning: invalid value encountered in greater
  calendar=self.time_attrs['calendar'],
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py", line 730, in anomaly
    slice_mon_clim_time=slice_mon_clim_time
  File "/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py", line 515, in compute_mon_anomaly
    return self.restore_dataset(computed_dset, attrs=attrs)
  File "/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py", line 291, in restore_dataset
    calendar=self.time_attrs['calendar'],
  File "cftime/_cftime.pyx", line 320, in cftime._cftime.num2date
  File "cftime/_cftime.pyx", line 869, in cftime._cftime.utime.num2date
  File "cftime/_cftime.pyx", line 564, in cftime._cftime.DateFromJulianDay
  File "cftime/_cftime.pyx", line 1378, in cftime._cftime.DatetimeNoLeap.__init__
  File "cftime/_cftime.pyx", line 1637, in cftime._cftime.assert_valid_date
ValueError: invalid second provided in cftime.DatetimeNoLeap(-5883517, 2, 27, 0, 0, -2147483648, -2147483648, 5, 58)

What's odd is that I can't find the index of ds.time corresponding to cftime.DatetimeNoLeap(-5883517, 2, 27, 0, 0, -2147483648, -2147483648, 5, 58):

for i in range(14400):
    sec = ds.time.values[i].second
    if sec != 0:
            print(f'{i}: {ds.time.values[i]}')

outputs

0: 0001-01-16 12:59:59

So I guess an internal esmlab computation is creating a bad time axis?

klindsay28 commented 4 years ago

If I change fname to fname = 'TOTECOSYSC_lnd_esm-piControl_00_mon.nc', I see the same error, and time:units in this file is days since 0001-01-01.

So, @mnlevy1981 , I also think there is problem in esmlab.

The -2147483648 values lead me to suspect that something isn't getting initialized. I am not able to follow the esmlab code, so I don't know where the problem resides.

mnlevy1981 commented 4 years ago

I think @andersy005 is going to need to look into this one, but I just want to point out that your block of code runs fine if you do not decode time when opening the dataset:

import xarray as xr
import esmlab
dir = '/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/tseries'
fname = 'SFCO2_OCN_atm_piControl_00_mon.nc'
fname = 'FG_CO2_ocn_piControl_00_mon.nc'
path = '/'.join([dir, fname])
ds = xr.open_dataset(path, decode_times=False)
ds_anom = esmlab.anomaly(ds, clim_freq='mon')

That might be a useful workaround until this issue can be resolved.

klindsay28 commented 4 years ago

Thanks for the workaround @mnlevy1981 Another is to replace the problematic time coordinate with a non-problematic time coordinate. This assumes you have a non-problematic time coordinate, which fortunately I appear to have in my post-processed atm files. In particular, the following runs without error:

import xarray as xr
import esmlab

def repl_coord(coordname, ds1, ds2, deep=False):
    """
    Return copy of d2 with coordinate coordname replaced, using coordname from ds1.
    The returned Dataset is otherwise a copy of ds2.
    The copy is deep or not depending on the argument deep.
    """
    if 'bounds' in ds2[coordname].attrs:
        tb_name = ds2[coordname].attrs['bounds']
        ds_out = ds2.drop(tb_name).copy(deep)
    else:
        ds_out = ds2.copy(deep)
    ds_out[coordname] = ds1[coordname]
    if 'bounds' in ds1[coordname].attrs:
        tb_name = ds1[coordname].attrs['bounds']
        ds_out[tb_name] = ds1[tb_name]
    return ds_out

dir = '/glade/work/klindsay/analysis/CESM2_coup_carb_cycle_JAMES/tseries'

fname = 'SFCO2_OCN_atm_piControl_00_mon.nc'
path = '/'.join([dir, fname])
ds1 = xr.open_dataset(path)

fname = 'TOTECOSYSC_lnd_piControl_00_mon.nc'
fname = 'FG_CO2_ocn_piControl_00_mon.nc'
path = '/'.join([dir, fname])
ds2 = xr.open_dataset(path)
ds2 = repl_coord('time', ds1, xr.open_dataset(path))
ds2_anom = esmlab.anomaly(ds2, clim_freq='mon')
klindsay28 commented 4 years ago

I got bit by this again, in a workflow where there wasn't a convenient good time coordinate to replace the bad one with. It's also not convenient to add decode_times=False to the open_dataset/open_mfdataset calls.

I found that modifying the time values by a small amount, less than 1.0e-10, can avoid the problem when it arises. So I put a kludge in my wrapper to esmlab.anomaly to change the time values by small amounts if they are close to known problematic values. It's very kludgy.

Any chance that this will get looked at in the near future?