Ouranosinc / PAVICS-landing

The landing page serving as the entrance to PAVICS

Nb 4 does not work with Python 3.10 (ValueError: `popmean.shape[axis]` must equal 1.) #65

Closed tlvu closed 2 months ago

tlvu commented 1 year ago

The error below only happens with Python 3.10, not 3.9.

09:04:23  _ PAVICS-landing-fix-nb-for-new-jupyter-env/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-4Ensembles.ipynb::Cell 2 _
09:04:23  Notebook cell execution failed
09:04:23  Cell 2: Cell execution caused an exception
09:04:23  
09:04:23  Input:
09:04:23  fut = ds_ens.sel(time=slice("2071", "2100")).chunk(dict(realization=-1))
09:04:23  ref = ds_ens.sel(time=slice("1981", "2010")).chunk(dict(realization=-1))
09:04:23  
09:04:23  chng_f, pos_f = xens.change_significance(
09:04:23      fut.sel(time=fut["time.season"] == "JJA"),
09:04:23      ref.sel(time=ref["time.season"] == "JJA"),
09:04:23      test="ttest",
09:04:23  )
09:04:23  plt.figure(
09:04:23      figsize=(15, 6),
09:04:23  )
09:04:23  
09:04:23  plt.subplot(1, 2, 1)
09:04:23  chng_f.tx_days_above.plot(cmap="RdBu_r", vmin=0, vmax=1)
09:04:23  
09:04:23  plt.title(chng_f.description.split(".")[0])
09:04:23  
09:04:23  plt.subplot(1, 2, 2)
09:04:23  pos_f.tx_days_above.plot(cmap="RdBu_r", vmin=0, vmax=1)
09:04:23  plt.title(pos_f.description.split(".")[0])
09:04:23  
09:04:23  display()
09:04:23  
09:04:23  Traceback:
09:04:23  
09:04:23  ---------------------------------------------------------------------------
09:04:23  ValueError                                Traceback (most recent call last)
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/scipy/stats/_stats_py.py:6265, in ttest_1samp(a, popmean, axis, nan_policy, alternative)
09:04:23     6264 try:
09:04:23  -> 6265     popmean = np.squeeze(popmean, axis=axis)
09:04:23     6266 except ValueError as e:
09:04:23  
09:04:23  File <__array_function__ internals>:200, in squeeze(*args, **kwargs)
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/numpy/core/fromnumeric.py:1571, in squeeze(a, axis)
09:04:23     1570 else:
09:04:23  -> 1571     return squeeze(axis=axis)
09:04:23  
09:04:23  ValueError: cannot select an axis to squeeze out which has size not equal to one
09:04:23  
09:04:23  The above exception was the direct cause of the following exception:
09:04:23  
09:04:23  ValueError                                Traceback (most recent call last)
09:04:23  Cell In[3], line 14
09:04:23        9 plt.figure(
09:04:23       10     figsize=(15, 6),
09:04:23       11 )
09:04:23       13 plt.subplot(1, 2, 1)
09:04:23  ---> 14 chng_f.tx_days_above.plot(cmap="RdBu_r", vmin=0, vmax=1)
09:04:23       16 plt.title(chng_f.description.split(".")[0])
09:04:23       18 plt.subplot(1, 2, 2)
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/xarray/plot/plot.py:547, in _PlotMethods.__call__(self, **kwargs)
09:04:23      546 def __call__(self, **kwargs):
09:04:23  --> 547     return plot(self._da, **kwargs)
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/xarray/plot/plot.py:261, in plot(darray, row, col, col_wrap, ax, hue, rtol, subplot_kws, **kwargs)
09:04:23      209 def plot(
09:04:23      210     darray,
09:04:23      211     row=None,
09:04:23     (...)
09:04:23      218     **kwargs,
09:04:23      219 ):
09:04:23      220     """
09:04:23      221     Default plot of DataArray using :py:mod:`matplotlib:matplotlib.pyplot`.
09:04:23      222 
09:04:23     (...)
09:04:23      259     xarray.DataArray.squeeze
09:04:23      260     """
09:04:23  --> 261     darray = darray.squeeze().compute()
09:04:23      263     plot_dims = set(darray.dims)
09:04:23      264     plot_dims.discard(row)
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/xarray/core/dataarray.py:1088, in DataArray.compute(self, **kwargs)
09:04:23     1069 """Manually trigger loading of this array's data from disk or a
09:04:23     1070 remote source into memory and return a new array. The original is
09:04:23     1071 left unaltered.
09:04:23     (...)
09:04:23     1085 dask.compute
09:04:23     1086 """
09:04:23     1087 new = self.copy(deep=False)
09:04:23  -> 1088 return new.load(**kwargs)
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/xarray/core/dataarray.py:1062, in DataArray.load(self, **kwargs)
09:04:23     1044 def load(self: T_DataArray, **kwargs) -> T_DataArray:
09:04:23     1045     """Manually trigger loading of this array's data from disk or a
09:04:23     1046     remote source into memory and return this array.
09:04:23     1047 
09:04:23     (...)
09:04:23     1060     dask.compute
09:04:23     1061     """
09:04:23  -> 1062     ds = self._to_temp_dataset().load(**kwargs)
09:04:23     1063     new = self._from_temp_dataset(ds)
09:04:23     1064     self._variable = new._variable
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/xarray/core/dataset.py:735, in Dataset.load(self, **kwargs)
09:04:23      732 import dask.array as da
09:04:23      734 # evaluate all the dask arrays simultaneously
09:04:23  --> 735 evaluated_data = da.compute(*lazy_data.values(), **kwargs)
09:04:23      737 for k, data in zip(lazy_data, evaluated_data):
09:04:23      738     self.variables[k].data = data
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/dask/base.py:599, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
09:04:23      596     keys.append(x.__dask_keys__())
09:04:23      597     postcomputes.append(x.__dask_postcompute__())
09:04:23  --> 599 results = schedule(dsk, keys, **kwargs)
09:04:23      600 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/dask/threaded.py:89, in get(dsk, keys, cache, num_workers, pool, **kwargs)
09:04:23       86     elif isinstance(pool, multiprocessing.pool.Pool):
09:04:23       87         pool = MultiprocessingPoolExecutor(pool)
09:04:23  ---> 89 results = get_async(
09:04:23       90     pool.submit,
09:04:23       91     pool._max_workers,
09:04:23       92     dsk,
09:04:23       93     keys,
09:04:23       94     cache=cache,
09:04:23       95     get_id=_thread_get_id,
09:04:23       96     pack_exception=pack_exception,
09:04:23       97     **kwargs,
09:04:23       98 )
09:04:23      100 # Cleanup pools associated to dead threads
09:04:23      101 with pools_lock:
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/dask/local.py:511, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
09:04:23      509         _execute_task(task, data)  # Re-execute locally
09:04:23      510     else:
09:04:23  --> 511         raise_exception(exc, tb)
09:04:23      512 res, worker_id = loads(res_info)
09:04:23      513 state["cache"][key] = res
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/dask/local.py:319, in reraise(exc, tb)
09:04:23      317 if exc.__traceback__ is not tb:
09:04:23      318     raise exc.with_traceback(tb)
09:04:23  --> 319 raise exc
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/dask/local.py:224, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
09:04:23      222 try:
09:04:23      223     task, data = loads(task_info)
09:04:23  --> 224     result = _execute_task(task, data)
09:04:23      225     id = get_id()
09:04:23      226     result = dumps((result, id))
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/dask/core.py:119, in _execute_task(arg, cache, dsk)
09:04:23      115     func, args = arg[0], arg[1:]
09:04:23      116     # Note: Don't assign the subtask results to a variable. numpy detects
09:04:23      117     # temporaries by their reference count and can execute certain
09:04:23      118     # operations in-place.
09:04:23  --> 119     return func(*(_execute_task(a, cache) for a in args))
09:04:23      120 elif not ishashable(arg):
09:04:23      121     return arg
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/dask/optimization.py:990, in SubgraphCallable.__call__(self, *args)
09:04:23      988 if not len(args) == len(self.inkeys):
09:04:23      989     raise ValueError("Expected %d args, got %d" % (len(self.inkeys), len(args)))
09:04:23  --> 990 return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/dask/core.py:149, in get(dsk, out, cache)
09:04:23      147 for key in toposort(dsk):
09:04:23      148     task = dsk[key]
09:04:23  --> 149     result = _execute_task(task, cache)
09:04:23      150     cache[key] = result
09:04:23      151 result = _execute_task(out, cache)
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/dask/core.py:119, in _execute_task(arg, cache, dsk)
09:04:23      115     func, args = arg[0], arg[1:]
09:04:23      116     # Note: Don't assign the subtask results to a variable. numpy detects
09:04:23      117     # temporaries by their reference count and can execute certain
09:04:23      118     # operations in-place.
09:04:23  --> 119     return func(*(_execute_task(a, cache) for a in args))
09:04:23      120 elif not ishashable(arg):
09:04:23      121     return arg
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/numpy/lib/function_base.py:2329, in vectorize.__call__(self, *args, **kwargs)
09:04:23     2326     vargs = [args[_i] for _i in inds]
09:04:23     2327     vargs.extend([kwargs[_n] for _n in names])
09:04:23  -> 2329 return self._vectorize_call(func=func, args=vargs)
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/numpy/lib/function_base.py:2403, in vectorize._vectorize_call(self, func, args)
09:04:23     2401 """Vectorized call to `func` over positional `args`."""
09:04:23     2402 if self.signature is not None:
09:04:23  -> 2403     res = self._vectorize_call_with_signature(func, args)
09:04:23     2404 elif not args:
09:04:23     2405     res = func()
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/numpy/lib/function_base.py:2443, in vectorize._vectorize_call_with_signature(self, func, args)
09:04:23     2440 nout = len(output_core_dims)
09:04:23     2442 for index in np.ndindex(*broadcast_shape):
09:04:23  -> 2443     results = func(*(arg[index] for arg in args))
09:04:23     2445     n_results = len(results) if isinstance(results, tuple) else 1
09:04:23     2447     if nout != n_results:
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/xclim/ensembles/_robustness.py:208, in change_significance.<locals>._ttest_func(f, r)
09:04:23      206 def _ttest_func(f, r):
09:04:23      207     # scipy>=1.9: popmean.axis[-1] must equal 1 for both fut and ref
09:04:23  --> 208     return spstats.ttest_1samp(
09:04:23      209         f, r[..., np.newaxis], axis=-1, nan_policy="omit"
09:04:23      210     )[1]
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/scipy/stats/_axis_nan_policy.py:502, in _axis_nan_policy_factory.<locals>.axis_nan_policy_decorator.<locals>.axis_nan_policy_wrapper(***failed resolving arguments***)
09:04:23      500 if sentinel:
09:04:23      501     samples = _remove_sentinel(samples, paired, sentinel)
09:04:23  --> 502 res = hypotest_fun_out(*samples, **kwds)
09:04:23      503 res = result_to_tuple(res)
09:04:23      504 res = _add_reduced_axes(res, reduced_axes, keepdims)
09:04:23  
09:04:23  File /opt/conda/envs/birdy/lib/python3.10/site-packages/scipy/stats/_stats_py.py:6267, in ttest_1samp(a, popmean, axis, nan_policy, alternative)
09:04:23     6265     popmean = np.squeeze(popmean, axis=axis)
09:04:23     6266 except ValueError as e:
09:04:23  -> 6267     raise ValueError("`popmean.shape[axis]` must equal 1.") from e
09:04:23     6268 d = mean - popmean
09:04:23     6269 v = _var(a, axis, ddof=1)
09:04:23  
09:04:23  ValueError: `popmean.shape[axis]` must equal 1.
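For reference, the immediate failure at the top of the traceback is NumPy, not scipy: `np.squeeze(popmean, axis=axis)` raises when the selected axis does not have size exactly 1, including when it has size 0 (an empty array, which is what an all-NaN series becomes after `nan_policy="omit"` compresses it). A minimal NumPy-only sketch of that behaviour, with made-up shapes:

```python
import numpy as np

# A size-1 axis squeezes out fine; this is the case scipy expects.
a = np.ones((4, 1))
assert np.squeeze(a, axis=1).shape == (4,)

# An axis of size != 1 cannot be squeezed out with an explicit axis.
b = np.ones((4, 3))
try:
    np.squeeze(b, axis=1)
except ValueError as e:
    print("ValueError:", e)

# A size-0 axis (empty array) fails the same way, which is what happens
# here when an all-NaN grid cell is compressed away before the t-test.
c = np.ones((4, 0))
try:
    np.squeeze(c, axis=1)
except ValueError as e:
    print("ValueError:", e)
```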

Zeitsperre commented 1 year ago

@aulemahal

Is it possible that the fix introduced here (https://github.com/Ouranosinc/xclim/commit/56d6ef7a957f47c6757fef6dadf0fd893163f2c4) is not working properly?

We even test to ensure that warnings are being properly raised! (https://github.com/Ouranosinc/xclim/commit/60e9a70c5abeb0e9a26ece895c060e46baa06dc6)

This fix was introduced in the latest xclim (v0.43.0), so the only explanation I can offer is that xclim is not at the newest version in this environment.

Zeitsperre commented 1 year ago

Potential cause found!

The changes in scipy were completed here: https://github.com/scipy/scipy/pull/16835 and were implemented in scipy>=1.10.1 (not 1.9.0).

@tlvu What version of scipy is running in the environment?

tlvu commented 1 year ago

> Potential cause found!
>
> The changes in scipy were completed here: scipy/scipy#16835 and were implemented in scipy>=1.10.1 (not 1.9.0).
>
> @tlvu What version of scipy is running in the environment?

py310 not working: xclim=0.43.0=py310hff52083_1 + scipy=1.10.1=py310ha4c1d20_3

py39 working: xclim=0.43.0=py39hf3d152e_1 + scipy=1.9.1=py39h8ba3f38_0
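A quick stdlib-only way to confirm which versions an environment actually has (the package list here is just the ones relevant to this issue):

```python
from importlib import metadata

# Print the installed version of each package involved in this bug,
# without importing the packages themselves.
for pkg in ("xclim", "scipy", "numpy"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")
```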

huard commented 1 year ago

I don't have a full understanding of what's going on, but the `popmean` error appears when the array is empty, which happens when the time series is entirely filled with NaNs that are compressed by the test algorithm. In short, the ocean domain will fail because the data are all NaNs.

Not sure why this didn't fail before.

One workaround is to compute the test only on land.

import xarray as xr
import xclim.ensembles as xens

fut = ds_ens.sel(time=slice("2071", "2100")).chunk(dict(realization=-1))
ref = ds_ens.sel(time=slice("1981", "2010")).chunk(dict(realization=-1))

f = fut.sel(time=fut["time.season"] == "JJA").tx_days_above
r = ref.sel(time=ref["time.season"] == "JJA").tx_days_above

tmp = xr.Dataset(dict(ref=r, fut=f))

# Find coordinates that are all NaNs in either the future or the reference data
mask = (f.isnull().all(dim="time") + r.isnull().all(dim="time")).isel(realization=0).compute()

# Compute significance only on valid coordinates
valid = tmp.groupby(mask)[False]
chng_f, pos_f = xens.change_significance(valid.fut, valid.ref, test="ttest")

# Unstack results
chng_f = chng_f.unstack()
pos_f = pos_f.unstack()
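The same select-compute-scatter pattern, sketched with plain NumPy for clarity (xarray's `groupby(mask)` and `unstack` above do the equivalent bookkeeping); the shapes and values here are made up for illustration:

```python
import numpy as np

# Toy field: 5 grid cells x 10 time steps; cells 1 and 3 are "ocean" (all NaN).
data = np.random.default_rng(0).normal(size=(5, 10))
data[1, :] = np.nan
data[3, :] = np.nan

# Mask of cells that are entirely NaN along time (the equivalent of `mask` above)
all_nan = np.isnan(data).all(axis=1)

# Run the computation only on the valid ("land") cells...
result_valid = data[~all_nan].mean(axis=1)

# ...then scatter the results back onto the full grid, leaving NaN on the ocean.
result = np.full(data.shape[0], np.nan)
result[~all_nan] = result_valid

assert np.isnan(result[[1, 3]]).all()      # ocean cells stay NaN
assert not np.isnan(result[[0, 2, 4]]).any()  # land cells got values
```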

Now for a reason I don't understand, with the Alpha env on Jupyter the results are all NaNs or inf, so there might be something else going on.

huard commented 1 year ago

PR for fix in xclim: https://github.com/Ouranosinc/xclim/pull/1380/

huard commented 1 year ago

No changes will be needed to the NB if we merge the xclim PR and release a new xclim version before we update scipy to 1.10 in the jupyter env.

Zeitsperre commented 1 year ago

Just to confirm, we will not be releasing a new xclim at 17h20 on a Friday, right?

tlvu commented 1 year ago

Should be fixed by https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/pull/121