euroargodev / argopy

A python library for Argo data beginners and experts
https://argopy.readthedocs.io
European Union Public License 1.2

xarray version 0.16.1 breaking argopy #64

Closed gmaze closed 3 years ago

gmaze commented 3 years ago

It seems that since xarray moved to 0.16.1, argopy is broken!

MCVE Code Sample

import xarray  # Loading version 0.16.1
import argopy  # Loading master version
l = argopy.fetchers.ArgoDataFetcher(src='erddap', cache=0).region([-60, -55, 40., 45., 0., 10., '2007-08-01', '2007-09-01'])
# l = argopy.fetchers.ArgoDataFetcher(src='erddap', cache=0).profile(6902746, 34)
l.to_xarray()

The same code works with an xarray version strictly lower than 0.16.1 (e.g. 0.15.1 or 0.16.0).

This throws an error like those seen in unit testing here, before we locked the xarray version to 0.15.1. This is the reason we currently have an "outdated" message for argopy requirements.
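For reference, the temporary lock mentioned above amounts to a pin like the following in the requirements file (a sketch only; the exact file and syntax used in the repo may differ):

```
# hypothetical requirements.txt excerpt: pin xarray until the 0.16.1 breakage is fixed
xarray==0.15.1
```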

Full error stack:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input> in <module>
      1 l = argopy.fetchers.ArgoDataFetcher(src='erddap', cache=0).region([-60, -55, 40., 45., 0., 10., '2007-08-01', '2007-09-01'])
      2 # l = argopy.fetchers.ArgoDataFetcher(src='erddap', cache=0).profile(6902746, 34)
----> 3 l.to_xarray()

~/git/github/euroargodev/argopy/argopy/fetchers.py in to_xarray(self, **kwargs)
--> 273     xds = self.postproccessor(xds)

~/git/github/euroargodev/argopy/argopy/fetchers.py in postprocessing(xds)
--> 253     xds = self.fetcher.filter_data_mode(xds)

~/git/github/euroargodev/argopy/argopy/data_fetchers/erddap_data.py in filter_data_mode(self, ds, **kwargs)
--> 469     ds = ds.argo.filter_data_mode(errors="ignore", **kwargs)

~/git/github/euroargodev/argopy/argopy/xarray.py in filter_data_mode(self, keep_error, errors)
--> 310     argo_r, argo_a, argo_d = ds_split_datamode(ds)

~/git/github/euroargodev/argopy/argopy/xarray.py in ds_split_datamode(xds)
--> 223     argo_r = ds.where(ds['DATA_MODE'] == 'R', drop=True)

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/common.py in where(self, cond, other, drop)
-> 1268     return ops.where_method(self, cond, other)

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/ops.py in where_method(self, cond, other)
--> 193     return apply_ufunc(

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/computation.py in apply_ufunc(func, *args, **kwargs)
-> 1092     return apply_dataset_vfunc(

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/computation.py in apply_dataset_vfunc(func, signature, join, dataset_join, fill_value, exclude_dims, keep_attrs, *args)
--> 407     list_of_coords = build_output_coords(args, signature, exclude_dims)

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/computation.py in build_output_coords(args, signature, exclude_dims)
--> 224     merged_vars, unused_indexes = merge_coordinates_without_align(

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/merge.py in merge_coordinates_without_align(objects, prioritized, exclude_dims)
--> 329     return merge_collected(filtered, prioritized)

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/merge.py in merge_collected(grouped, prioritized, compat)
--> 229     merged_vars[name] = unique_variable(name, variables, compat)

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/merge.py in unique_variable(name, variables, compat, equals)
--> 120     out = out.set_dims(dim_lengths)

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/variable.py in set_dims(self, dims, shape)
-> 1440     expanded_data = self.data

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/variable.py in data(self)
--> 359     return self.values

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/variable.py in values(self)
--> 510     return _as_array_or_item(self._data)

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/variable.py in _as_array_or_item(data)
--> 272     data = np.asarray(data)

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype)
--> 685     self._ensure_cached()

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py in _ensure_cached(self)
--> 682     self.array = NumpyIndexingAdapter(np.asarray(self.array))

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype)
--> 655     return np.asarray(self.array, dtype=dtype)

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype)
--> 560     return np.asarray(array[self.key], dtype=None)

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/coding/variables.py in __array__(self, dtype)
---> 70     return self.func(self.array)

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/coding/times.py in decode_cf_datetime(num_dates, units, calendar, use_cftime)
--> 158     dates = _decode_datetime_with_pandas(flat_num_dates, units, calendar)

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/coding/times.py in _decode_datetime_with_pandas(flat_num_dates, units, calendar)
--> 123     pd.to_timedelta(flat_num_dates.min(), delta) + ref_date

~/anaconda/envs/py38/lib/python3.8/site-packages/numpy/core/_methods.py in _amin(a, axis, out, keepdims, initial, where)
---> 43     return umr_minimum(a, axis, None, out, keepdims, initial, where)

ValueError: zero-size array to reduction operation minimum which has no identity

Problem Description

Obviously, argopy must work with the latest versions of xarray.

I noticed in the breaking changes of version 0.16.1 that:

DataArray.astype() and Dataset.astype() now preserve attributes. Keep the old behavior by passing keep_attrs=False

This may be the origin of the issue here.
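The changed behavior is easy to check directly (a minimal sketch; with xarray >= 0.16.1 attributes survive the cast, while keep_attrs=False restores the old behavior):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(3), attrs={"units": "m"})

# Since xarray 0.16.1, astype() preserves attributes by default:
print(da.astype("float64").attrs)  # {'units': 'm'} on 0.16.1+

# The old behavior can be restored explicitly:
print(da.astype("float64", keep_attrs=False).attrs)  # {}
```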

Versions

Output of `argopy.show_versions()`:

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 18:42:56) [Clang 10.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

argopy: 999
xarray: 0.16.1
pandas: 1.1.3
numpy: 1.19.2
scipy: 1.5.2
fsspec: 0.8.4
erddapy: 0.7.2
netCDF4: 1.5.4
pydap: installed
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: installed
rasterio: 1.1.7
cfgrib: 0.9.8.4
iris: 2.4.0
bottleneck: 1.3.2
dask: 2.30.0
distributed: 2.30.0
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: 0.11.0
numbagg: installed
gsw: 3.4.0
setuptools: 49.6.0.post20201009
pip: 20.2.3
conda: None
pytest: 6.1.1
IPython: 7.18.1
sphinx: None
gmaze commented 3 years ago

The error above, which we get with xarray 0.16.1 but not with previous versions, is raised in the line:

argo_r = ds.where(ds['DATA_MODE'] == 'R', drop=True)

which triggers an enigmatic:

ValueError: zero-size array to reduction operation minimum which has no identity
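Stripped of the xarray machinery, the error itself is just numpy's behavior on empty arrays: the datetime decoder ends up calling .min() on a zero-size array of numeric dates (a minimal sketch of the failure mode, not argopy code):

```python
import numpy as np

# After where(..., drop=True) with an all-False mask, the numeric TIME values
# handed to xarray's datetime decoder form an empty array:
flat_num_dates = np.array([])

try:
    flat_num_dates.min()  # what _decode_datetime_with_pandas effectively does
except ValueError as err:
    print(err)  # zero-size array to reduction operation minimum which has no identity
```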
gmaze commented 3 years ago

Note that the mask here is all False, so the argo_r output will be empty.

This is what is returned with an xarray version older than 0.16.1:

<xarray.Dataset>
Dimensions:                   (N_POINTS: 0)
Coordinates:
    TIME                      (N_POINTS) datetime64[ns]
    LATITUDE                  (N_POINTS) float64
    LONGITUDE                 (N_POINTS) float64
  * N_POINTS                  (N_POINTS) int64
Data variables:
    CONFIG_MISSION_NUMBER     (N_POINTS) float64
[...]

This resembles this old issue: https://github.com/pydata/xarray/issues/1329

gmaze commented 3 years ago

So it is the time decoding of an empty array that fails here. If we remove the TIME variable before the where statement, it works and returns an empty dataset:

ds2 = ds.copy(deep=True)
ds2 = ds2.drop_vars('TIME')
ds2.where(ds2['DATA_MODE'] == 'R', drop=True)

returns:

<xarray.Dataset>
Dimensions:                   (N_POINTS: 0)
Coordinates:
    LATITUDE                  (N_POINTS) float64 
    LONGITUDE                 (N_POINTS) float64 
  * N_POINTS                  (N_POINTS) int64 
Data variables:
    CONFIG_MISSION_NUMBER     (N_POINTS) float64 
[...]

and no error is raised anymore

To fix this, we should then catch the possible xarray error and work around it, as in https://github.com/atmtools/typhon/pull/37
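One way to turn around it, sketched here with a hypothetical where_safe helper (not the actual fix implemented in argopy), is to short-circuit the all-False case and return an empty selection without going through where(..., drop=True):

```python
import numpy as np
import xarray as xr

def where_safe(ds, cond, dim="N_POINTS"):
    """Hypothetical guard around Dataset.where(..., drop=True).

    When the mask is all False, select an empty slice directly so that
    xarray never tries to decode a zero-size time axis.
    """
    if not bool(cond.any()):
        return ds.isel({dim: slice(0, 0)})
    return ds.where(cond, drop=True)

# Toy dataset mimicking the structure at hand:
ds = xr.Dataset(
    {"CONFIG_MISSION_NUMBER": ("N_POINTS", np.arange(3.0))},
    coords={"N_POINTS": np.arange(3), "DATA_MODE": ("N_POINTS", list("AAD"))},
)
argo_r = where_safe(ds, ds["DATA_MODE"] == "R")  # empty dataset, no ValueError
```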

gmaze commented 3 years ago

I implemented a fix in #65, but I still don't understand why this popped up in 0.16.1 and not before ...