Support for Sparse Arrays

ahartikainen commented 5 years ago

In Stan some types have only part of the structure defined/filled - e.g. Cholesky decomposition. Our functions should support these structures.

We should support sparse arrays, but this is a problem with xarray.

ahartikainen commented 5 years ago

For xarray https://github.com/pydata/xarray/issues/1375

GWeindel commented 5 years ago

Sorry for the delay.

When NaN neff parameters are present I can't use the following functions with the full fitted object :

az.summary
az.bfmi
az.effective_sample_size

User has to subset the fitted object to exclude NaN

Sample of the error :

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-24a3fc920ff9> in <module>
----> 1 az.effective_sample_size(RT_fit)

~/anaconda2/envs/future/lib/python3.5/site-packages/arviz/stats/diagnostics.py in effective_sample_size(data, var_names)
     59 
     60     dataset = dataset if var_names is None else dataset[var_names]
---> 61     return xr.apply_ufunc(_ess_ufunc, dataset, input_core_dims=(("chain", "draw"),))
     62 
     63 

~/anaconda2/envs/future/lib/python3.5/site-packages/xarray/core/computation.py in apply_ufunc(func, *args, **kwargs)
    980                                    fill_value=dataset_fill_value,
    981                                    dataset_join=dataset_join,
--> 982                                    keep_attrs=keep_attrs)
    983     elif any(isinstance(a, DataArray) for a in args):
    984         return apply_dataarray_ufunc(variables_ufunc, *args,

~/anaconda2/envs/future/lib/python3.5/site-packages/xarray/core/computation.py in apply_dataset_ufunc(func, *args, **kwargs)
    367     result_vars = apply_dict_of_variables_ufunc(
    368         func, *args, signature=signature, join=dataset_join,
--> 369         fill_value=fill_value)
    370 
    371     if signature.num_outputs > 1:

~/anaconda2/envs/future/lib/python3.5/site-packages/xarray/core/computation.py in apply_dict_of_variables_ufunc(func, *args, **kwargs)
    312     result_vars = OrderedDict()
    313     for name, variable_args in zip(names, grouped_by_name):
--> 314         result_vars[name] = func(*variable_args)
    315 
    316     if signature.num_outputs > 1:

~/anaconda2/envs/future/lib/python3.5/site-packages/xarray/core/computation.py in apply_variable_ufunc(func, *args, **kwargs)
    559             raise ValueError('unknown setting for dask array handling in '
    560                              'apply_ufunc: {}'.format(dask))
--> 561     result_data = func(*input_data)
    562 
    563     if signature.num_outputs == 1:

~/anaconda2/envs/future/lib/python3.5/site-packages/arviz/stats/diagnostics.py in _ess_ufunc(ary)
     70     target = np.empty(ary.shape[:-2])
     71     for idx in np.ndindex(target.shape):
---> 72         target[idx] = _get_ess(ary[idx])
     73     return target
     74 

~/anaconda2/envs/future/lib/python3.5/site-packages/arviz/stats/diagnostics.py in _get_ess(sample_array)
    117             rho_hat_t[t + 2] = rho_hat_t[t + 1]
    118         t += 2
--> 119     ess = int((n_chain * n_draws) / (-1.0 + 2.0 * np.sum(rho_hat_t)))
    120     return ess
    121 

ValueError: cannot convert float NaN to integer

ahartikainen commented 5 years ago

Hi, I think that is fixable with the following

ess = int((n_chain * n_draws) / (-1.0 + 2.0 * np.sum(rho_hat_t))) if not np.any(np.isnan(rho_hat_t))) else np.nan

This comes from the problem that there is not integer NaN (until latest pandas)

ahartikainen commented 5 years ago

Hi, how was bfmi broken?

There is a "bug" that you need to manually use data.sample_stats.energy. So inserting InferenceData is not going to work. This should be fixed (see. https://github.com/arviz-devs/arviz/issues/501 )

GWeindel commented 5 years ago

Right, manually use sample_stats.energy works.

In case, a sample form the warning I get

/home/gabriel/anaconda2/envs/future/lib/python3.5/site-packages/arviz/stats/stats.py:41: RuntimeWarning: Mean of empty slice.
  return np.square(np.diff(energy_mat, axis=1)).mean(axis=1) / np.var(energy_mat, axis=1)

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-28-07a9031ba490> in <module>
----> 1 az.bfmi(RT_fit)

~/anaconda2/envs/future/lib/python3.5/site-packages/arviz/stats/stats.py in bfmi(energy)
     39     """
     40     energy_mat = np.atleast_2d(energy)
---> 41     return np.square(np.diff(energy_mat, axis=1)).mean(axis=1) / np.var(energy_mat, axis=1)
     42 
     43 

~/anaconda2/envs/future/lib/python3.5/site-packages/numpy/core/_methods.py in _mean(a, axis, dtype, out, keepdims)
     76     if isinstance(ret, mu.ndarray):
     77         ret = um.true_divide(
---> 78                 ret, rcount, out=ret, casting='unsafe', subok=False)
     79         if is_float16_result and out is None:
     80             ret = arr.dtype.type(ret)

ZeroDivisionError: division by zero

ahartikainen commented 5 years ago

Yes, it makes np.array([InferenceData])

arviz-devs / arviz

Support for Sparse Arrays #549