bradyrx / esmtools

:octopus: a toolbox for Earth System Model analysis :octopus:
https://esmtools.readthedocs.io/en/latest/
MIT License
27 stars 4 forks source link

`stats.rm_poly` tests fail when dask array passed #100

Closed lukegre closed 3 years ago

lukegre commented 4 years ago

Overview

Tests for stats.rm_poly fails when x and y are given and input array when y is a dask array. Encountered on the PR #99 resulting in 12 failed tests.

The test conditions below result in failure:
esmtools/tests/test_stats_poly.py::test_rm_poly_against_time_dask[<any>-<any>-False] For more info see: https://travis-ci.com/github/bradyrx/esmtools/builds/198414251

The tests pass if y.load() is forced. So somehow dask is the culprit, though not sure how to feasibly solve the problem.

Traceback

Traceback (most recent call last):
1568  File "/home/travis/build/bradyrx/esmtools/esmtools/tests/test_stats_poly.py", line 197, in test_polyfit_against_time_dask
1569    dt = rm_poly(y, order=2)
1570  File "/home/travis/build/bradyrx/esmtools/esmtools/checks.py", line 79, in wrapper
1571    return func(*args, **kwargs)
1572  File "/home/travis/build/bradyrx/esmtools/esmtools/stats.py", line 541, in rm_poly
1573    output_dtypes=['float64'],
1574  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/xarray/core/computation.py", line 1110, in apply_ufunc
1575    keep_attrs=keep_attrs,
1576  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/xarray/core/computation.py", line 262, in apply_dataarray_vfunc
1577    result_var = func(*data_vars)
1578  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/xarray/core/computation.py", line 700, in apply_variable_ufunc
1579    result_data = func(*input_data)
1580  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/xarray/core/computation.py", line 674, in func
1581    **dask_gufunc_kwargs,
1582  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/dask/array/gufunc.py", line 431, in apply_gufunc
1583    func, loop_output_dims, *arginds, concatenate=True, **kwargs
1584  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/dask/array/blockwise.py", line 250, in blockwise
1585    meta = compute_meta(func, dtype, *args[::2], **kwargs)
1586  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/dask/array/utils.py", line 121, in compute_meta
1587    meta = func(*args_meta)
1588  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/numpy/lib/function_base.py", line 2108, in __call__
1589    return self._vectorize_call(func=func, args=vargs)
1590  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/numpy/lib/function_base.py", line 2182, in _vectorize_call
1591    res = self._vectorize_call_with_signature(func, args)
1592  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/numpy/lib/function_base.py", line 2223, in _vectorize_call_with_signature
1593    results = func(*(arg[index] for arg in args))
1594  File "/home/travis/build/bradyrx/esmtools/esmtools/stats.py", line 528, in _rm_poly
1595    fit = _polyfit(x, y, order, nan_policy)
1596  File "/home/travis/build/bradyrx/esmtools/esmtools/stats.py", line 149, in _polyfit
1597    coefs = poly.polyfit(x_mod, y_mod, order)
1598  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/numpy/polynomial/polynomial.py", line 1350, in polyfit
1599    return pu._fit(polyvander, x, y, deg, rcond, full, w)
1600  File "/home/travis/miniconda/envs/esmtools-dev/lib/python3.6/site-packages/numpy/polynomial/polyutils.py", line 664, in _fit
1601    raise TypeError("expected non-empty vector for x")
1602TypeError: expected non-empty vector for x
bradyrx commented 3 years ago

I'll give this a look now and see what the deal is...

bradyrx commented 3 years ago

Okay, just putting some notes here as I work through this.

bradyrx commented 3 years ago

Replicating a broken test.... this breaks it:

def gridded_da_float():
    """Mock data of gridded time series in float time."""
    # Wrapper so fixture can be called multiple times.
    # https://alysivji.github.io/pytest-fixures-with-function-arguments.html
    data = np.random.rand(60, 3, 3)
    da = xr.DataArray(data, dims=["time", "lat", "lon"])
    # Annual resolution time axis for 60 years.
    da["time"] = np.arange(1900, 1960)
    return da

# false - float - false
data = gridded_da_float().chunk()
x = data["time"]
y = data.isel(lat=0, lon=0)

X, _ = esmtools.stats._convert_time_and_return_slope_factor(x, "time")
Y = y

def _rm_poly(x, y, order, nan_policy):
    print(x)
    fit = esmtools.stats._polyfit(x, y, order, nan_policy)
    return y - fit

dim = "time"
test = xr.apply_ufunc(
    _rm_poly,
    X,
    Y,
    2,
    "none",
    vectorize=True,
    dask="parallelized",
    input_core_dims=[[dim], [dim], [], []],
    output_core_dims=[[dim]],
    output_dtypes=[float],
)

dask='allowed' fixes it but causes other tests to break. The weird thing is that with dask='parallelized' here and dask arrays, it passes empty arrays to the _rm_poly func. You see with the print statement in _rm_poly an empty list for parallelized and a full list for allowed. I haven't seen this behavior before. I don't know why it's doing this.

bradyrx commented 3 years ago

See https://github.com/bradyrx/esmtools/pull/102. This should fix it. Looks like there was some dask or xarray update that stays true to vectorize. So if you pass a single time series in with input_core_dims=['time'] and vectorize=True it doesn't know how to split it up.

With a gridded product, it passes in each grid cell separately. This just detects whether it's gridded. This seems like a kludge but should be fine.