Ouranosinc / xclim

Library of derived climate variables, ie climate indicators, based on xarray.
https://xclim.readthedocs.io/en/stable/
Apache License 2.0
333 stars 59 forks source link

Unexpected sdba.adjustment.ExtremeValues behaviour: fails when it encounters all nans at a location. #1982

Closed bweeding closed 4 weeks ago

bweeding commented 4 weeks ago

Generic Issue

I'm trying to apply sdba.adjustment.ExtremeValues to some precipitation data with coordinates time, lat, lon. The data is over land only, and areas of water are all nan. However, when I try this the compute fails. If I select locations (combinations of lat and lon) that are land only, the function works. However, as other xclim bias adjusting functions successfully ignore all nan locations, this behaviour seems out of character for the package.

If I slice the data so I'm only including locations with non-nan values, things work fine:

ex_ad = sdba.adjustment.ExtremeValues.train(ref=ref.isel(lat=slice(1,3),lon=0),hist=hist.isel(lat=slice(1,3),lon=0),q_thresh=0.9,cluster_thresh='10 mm/day')

hist_adj_ex = ex_ad.adjust(sim=hist.isel(lat=slice(1,3),lon=0),scen=hist_adj.isel(lat=slice(1,3),lon=0))

hist_adj_ex.compute()

image

However, if I include a location that contains all nans, rather than ignoring it, the code fails as shown below. Any help would be greatly appreciated.

ex_ad = sdba.adjustment.ExtremeValues.train(ref=ref.isel(lat=slice(0,3),lon=0),hist=hist.isel(lat=slice(0,3),lon=0),q_thresh=0.9,cluster_thresh='10 mm/day')

hist_adj_ex = ex_ad.adjust(sim=hist.isel(lat=slice(0,3),lon=0),scen=hist_adj.isel(lat=slice(0,3),lon=0))

hist_adj_ex.compute()
2024-10-31 14:31:05,216 - distributed.worker - ERROR - Compute Failed
Key:       ('block_extremes_train-995e2d42ef1cea530a5492b8df1f477b', 0, 0)
State:     executing
Function:  execute_task
args:      ((<function map_blocks.<locals>._wrapper at 0x7f505bb1b880>, <function map_blocks.<locals>._decorator.<locals>._map_blocks.<locals>._call_and_transpose_on_exit at 0x7f5055e7fe20>, [(<class 'xarray.core.dataset.Dataset'>, (<class 'dict'>, [['ref', (('time', 'lat'), array([[        nan, 15.06333065, 15.85973835],
       [        nan, 10.56827641, 11.57966232],
       [        nan, 15.07799339, 16.558424  ],
       ...,
       [        nan, 16.86088753, 19.47458076],
       [        nan,  9.26323223,  7.97560787],
       [        nan, 11.53819275, 10.99389839]]), {'units': 'mm d-1'})], ['hist', (('lat', 'time'), array([[         nan,          nan,          nan, ...,          nan,
                 nan,          nan],
       [ 97.38849811,   3.01736853,   0.39729742, ...,   2.29144176,
          5.66545282,   8.20756713],
       [131.27721995,   0.51255105,   0.980815  , ...,  12.71072365,
         30.73628722,  24.9307665 ]]), {'units': 'mm d-1'})], ['ref_params', ((), array(nan, dtype=flo
kwargs:    {}
Exception: "ValueError('extremes_train failed on block with coords : Coordinates:\\n  * lat      (lat) float64 24B -1.2 -1.1 -1.0\\n  * time     (time) object 105kB 1970-01-01 00:00:00 ... 2005-12-31 00:00:00.')"
Traceback: '  File "/opt/anaconda3/envs/xclim/lib/python3.12/site-packages/xarray/core/parallel.py", line 351, in _wrapper\n    result = func(*converted_args, **kwargs)\n             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/opt/anaconda3/envs/xclim/lib/python3.12/site-packages/xclim/sdba/base.py", line 691, in _call_and_transpose_on_exit\n    raise ValueError(\n'

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/xclim/sdba/base.py:689, in _call_and_transpose_on_exit()
    688     _decode_cf_coords(dsblock)
--> 689     func_out = func(dsblock, **f_kwargs).transpose(*all_dims)
    690 except Exception as err:

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/xclim/sdba/_adjustment.py:852, in extremes_train()
    830 """Train extremes for a given variable series.
    831 
    832 Parameters
   (...)
    850     The dataset containing the quantiles, the adjustment factors, and the threshold.
    851 """
--> 852 px_hist, af, thresh = xr.apply_ufunc(
    853     _extremes_train_1d,
    854     ds.ref,
    855     ds.hist,
    856     ds.ref_params or np.nan,
    857     input_core_dims=[("time",), ("time",), ()],
    858     output_core_dims=[("quantiles",), ("quantiles",), ()],
    859     vectorize=True,
    860     kwargs={
    861         "q_thresh": q_thresh,
    862         "cluster_thresh": cluster_thresh,
    863         "dist": dist,
    864         "N": len(quantiles),
    865     },
    866 )
    867 # Outputs of map_blocks must have dimensions.

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/xarray/core/computation.py:1278, in apply_ufunc()
   1277 elif any(isinstance(a, DataArray) for a in args):
-> 1278     return apply_dataarray_vfunc(
   1279         variables_vfunc,
   1280         *args,
   1281         signature=signature,
   1282         join=join,
   1283         exclude_dims=exclude_dims,
   1284         keep_attrs=keep_attrs,
   1285     )
   1286 # feed Variables directly through apply_variable_ufunc

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/xarray/core/computation.py:320, in apply_dataarray_vfunc()
    319 data_vars = [getattr(a, "variable", a) for a in args]
--> 320 result_var = func(*data_vars)
    322 out: tuple[DataArray, ...] | DataArray

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/xarray/core/computation.py:831, in apply_variable_ufunc()
    827         func = _vectorize(
    828             func, signature, output_dtypes=output_dtypes, exclude_dims=exclude_dims
    829         )
--> 831 result_data = func(*input_data)
    833 if signature.num_outputs == 1:

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/numpy/lib/function_base.py:2372, in __call__()
   2370     return self
-> 2372 return self._call_as_normal(*args, **kwargs)

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/numpy/lib/function_base.py:2365, in _call_as_normal()
   2363     vargs.extend([kwargs[_n] for _n in names])
-> 2365 return self._vectorize_call(func=func, args=vargs)

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/numpy/lib/function_base.py:2446, in _vectorize_call()
   2445 if self.signature is not None:
-> 2446     res = self._vectorize_call_with_signature(func, args)
   2447 elif not args:

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/numpy/lib/function_base.py:2486, in _vectorize_call_with_signature()
   2485 for index in np.ndindex(*broadcast_shape):
-> 2486     results = func(*(arg[index] for arg in args))
   2488     n_results = len(results) if isinstance(results, tuple) else 1

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/xclim/sdba/_adjustment.py:784, in _extremes_train_1d()
    782 # Find quantile q_thresh
    783 thresh = (
--> 784     np.quantile(ref[ref >= cluster_thresh], q_thresh)
    785     + np.quantile(hist[hist >= cluster_thresh], q_thresh)
    786 ) / 2
    788 # Fit genpareto on cluster maximums on ref (if needed) and hist.

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/numpy/lib/function_base.py:4543, in quantile()
   4542     raise ValueError("Quantiles must be in the range [0, 1]")
-> 4543 return _quantile_unchecked(
   4544     a, q, axis, out, overwrite_input, method, keepdims)

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/numpy/lib/function_base.py:4555, in _quantile_unchecked()
   4554 """Assumes that q is in [0, 1], and is an ndarray"""
-> 4555 return _ureduce(a,
   4556                 func=_quantile_ureduce_func,
   4557                 q=q,
   4558                 keepdims=keepdims,
   4559                 axis=axis,
   4560                 out=out,
   4561                 overwrite_input=overwrite_input,
   4562                 method=method)

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/numpy/lib/function_base.py:3823, in _ureduce()
   3821             kwargs['out'] = out[(Ellipsis, ) + index_out]
-> 3823 r = func(a, **kwargs)
   3825 if out is not None:

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/numpy/lib/function_base.py:4722, in _quantile_ureduce_func()
   4721         arr = a.copy()
-> 4722 result = _quantile(arr,
   4723                    quantiles=q,
   4724                    axis=axis,
   4725                    method=method,
   4726                    out=out)
   4727 return result

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/numpy/lib/function_base.py:4831, in _quantile()
   4830 if supports_nans:
-> 4831     slices_having_nans = np.isnan(arr[-1, ...])
   4832 else:

IndexError: index -1 is out of bounds for axis 0 with size 0

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
Cell In[110], line 3
      1 ex_ad = sdba.adjustment.ExtremeValues.train(ref=ref.isel(lat=slice(0,3),lon=0),hist=hist.isel(lat=slice(0,3),lon=0),q_thresh=0.9,cluster_thresh='10 mm/day')
----> 3 ex_ad.ds.compute()
      5 hist_adj_ex = ex_ad.adjust(sim=hist.isel(lat=slice(0,3),lon=0),scen=hist_adj.isel(lat=slice(0,3),lon=0))
      8 do = hist_adj_ex.compute()

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/xarray/core/dataset.py:1043, in Dataset.compute(self, **kwargs)
   1019 """Manually trigger loading and/or computation of this dataset's data
   1020 from disk or a remote source into memory and return a new dataset.
   1021 Unlike load, the original dataset is left unaltered.
   (...)
   1040 dask.compute
   1041 """
   1042 new = self.copy(deep=False)
-> 1043 return new.load(**kwargs)

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/xarray/core/dataset.py:870, in Dataset.load(self, **kwargs)
    867 chunkmanager = get_chunked_array_type(*lazy_data.values())
    869 # evaluate all the chunked arrays simultaneously
--> 870 evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
    871     *lazy_data.values(), **kwargs
    872 )
    874 for k, data in zip(lazy_data, evaluated_data, strict=False):
    875     self.variables[k].data = data

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/xarray/namedarray/daskmanager.py:86, in DaskManager.compute(self, *data, **kwargs)
     81 def compute(
     82     self, *data: Any, **kwargs: Any
     83 ) -> tuple[np.ndarray[Any, _DType_co], ...]:
     84     from dask.array import compute
---> 86     return compute(*data, **kwargs)

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/dask/base.py:660, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    657     postcomputes.append(x.__dask_postcompute__())
    659 with shorten_traceback():
--> 660     results = schedule(dsk, keys, **kwargs)
    662 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/xarray/core/parallel.py:351, in _wrapper()
    341 """
    342 Wrapper function that receives datasets in args; converts to dataarrays when necessary;
    343 passes these to the user function `func` and checks returned objects for expected shapes/sizes/etc.
    344 """
    346 converted_args = [
    347     dataset_to_dataarray(arg) if is_array else arg
    348     for is_array, arg in zip(arg_is_array, args, strict=True)
    349 ]
--> 351 result = func(*converted_args, **kwargs)
    353 merged_coordinates = merge(
    354     [arg.coords for arg in args if isinstance(arg, Dataset | DataArray)]
    355 ).coords
    357 # check all dims are present

File /opt/anaconda3/envs/xclim/lib/python3.12/site-packages/xclim/sdba/base.py:691, in _call_and_transpose_on_exit()
    689     func_out = func(dsblock, **f_kwargs).transpose(*all_dims)
    690 except Exception as err:
--> 691     raise ValueError(
    692         f"{func.__name__} failed on block with coords : {dsblock.coords}."
    693     ) from err
    694 return func_out

ValueError: extremes_train failed on block with coords : Coordinates:
  * lat      (lat) float64 24B -1.2 -1.1 -1.0
  * time     (time) object 105kB 1970-01-01 00:00:00 ... 2005-12-31 00:00:00.

extremes_issues.zip

Code of Conduct

coxipi commented 4 weeks ago

Hi! Thanks for raising this issue! I agree with you that all-nan slices are not treated appropriately as they as elsewhere in xclim. The culprit is this operation on 1d-array: np.quantile(ref[ref >= cluster_thresh], q_thresh), it should have a been using np.nanquantile. But even then, there are other operations after that don't make much sense with all-nan slices.

I will add a fast track that simply outputs nan's for training parameters whenever this situtation occurs, I'll keep you posted.

coxipi commented 4 weeks ago

I'll close the issue, it was fixed in #1983. The fixes will appear in the next xclim version. Thanks again for raising this issue!

Best, Éric