ahartikainen opened this issue 3 years ago
I have a lot of ideas for operations; with correctly labeled objects I think we can do a lot of magic.
I like this idea. I've worked on similar things in the past and would be willing to help.
I think one of the options is having a one or two layer minimal library on top of xarray to help with that.
The one layer option would be basing everything on ArviZ and its conventions, mostly the reserved dims `chain` and `draw`, but we could add more info to our objects, like some metadata, using `dim` and `dim_bis` or something like that for square matrices...
The two layer option (I think this would be better) would be having first a more generic xarray-linalg or something like that, where we use `xr.apply_ufunc` to wrap `numpy.linalg` functions, and then the ArviZ-specific functions that call these lower level functions. The lower layer could even be moved to xarray_contrib if there is more interest in it, so we could share part of the maintenance burden; the higher layer would be kept under arviz-devs.

I have some things that could be useful for this at https://github.com/OriolAbril/calaix_de_sastre/tree/master/xarray-utils. For now they are at the lower layer level, are not linalg specific, and some use numba to create proper, performant ufuncs from numpy functions that are not ufuncs.
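To make the "lower layer" idea concrete, here is a minimal, untested sketch of wrapping a `numpy.linalg` function with `xr.apply_ufunc`; the dim names `dim` and `dim_bis` follow the convention floated above and are not a fixed API:

```python
import numpy as np
import xarray as xr

def cholesky(da, dims=("dim", "dim_bis")):
    # apply numpy.linalg.cholesky over the two matrix dims,
    # broadcasting over everything else (e.g. chain and draw)
    return xr.apply_ufunc(
        np.linalg.cholesky,
        da,
        input_core_dims=[list(dims)],
        output_core_dims=[list(dims)],
    )

# example: a batch of 2x2 identity matrices with chain/draw dims
eye = xr.DataArray(
    np.broadcast_to(np.eye(2), (4, 100, 2, 2)).copy(),
    dims=("chain", "draw", "dim", "dim_bis"),
)
chol = cholesky(eye)  # same dims, lower triangular factor per draw
```

The ArviZ-specific layer would then mostly be about picking default dim names and validating inputs on top of wrappers like this.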
Would the idea also extend to being able to pass samples into functions for easier propagation of uncertainty?
It might also be nice to have a convenient flatten command over chains.
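For reference, one way that flattening could already look with plain xarray, using the `centered_eight` example dataset that ships with ArviZ:

```python
import arviz as az

idata = az.load_arviz_data("centered_eight")
# collapse the chain and draw dims into a single "sample" dim
flat = idata.posterior.stack(sample=("chain", "draw"))
```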
Yes to everything, we already had https://github.com/arviz-devs/arviz/issues/1469 too
Look into easy interfacing with the scipy.stats module too, as ufuncs.
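An untested sketch of what that could look like with `xr.apply_ufunc` (the samples array here is made up):

```python
import numpy as np
import scipy.stats
import xarray as xr

# made-up posterior samples with ArviZ-style dims
samples = xr.DataArray(
    np.random.default_rng(0).normal(size=(4, 500)),
    dims=("chain", "draw"),
)
# scipy.stats.norm.cdf is already vectorized; wrapping it in apply_ufunc
# keeps the chain/draw labels on the result
probs = xr.apply_ufunc(scipy.stats.norm.cdf, samples)
```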
btw, we have done something like this in another package and perhaps it's useful to look here (https://github.com/threeML/threeML/blob/master/threeML/random_variates.py), though this is with numpy and not xarray. We could port some of this if useful.
I have written down and organized everything I had scattered around and tried to make the library somewhat coherent. Somehow I had forgotten about this issue until today.
Here is the link: https://xarray-einstats.readthedocs.io/en/latest/. It has linear algebra wrappers from numpy, rv and summary wrappers from scipy.stats, a couple wrappers for einops (ignoring coord values) and a super cool (if I may say so myself :sunglasses:) histogram that bins in a vectorized way along any given dimensions (powered by numba).
I am currently updating some examples from the pymc collection to use xarray-einstats to test it a bit more and iron it out, then make a 0.2 release and start advertising it more broadly. Feature requests, suggestions and collaborations are always welcome whatever the release state though.
I think we could wrap our MCMC parameters in a custom class that would make it easier to work with the MCMC results (in the spirit of rvar in the posterior package).
In the MCMC world (samples):

- Scalar -> `shape=(chain, draw)`
- Vector -> `shape=(chain, draw, vector_dim)`
- Matrix -> `shape=(chain, draw, *matrix_dims)`
example

When we want to do a `matrix * vector` product with MCMC results, we need to be careful that the correct dimensions are used -> a Bayesian variable could handle this, so the variable would work like any non-MCMC variable.
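As a rough illustration of what labeled dims already buy us here (the dim names `row`/`col` and the shapes are made up for the example):

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
# made-up samples: a 3x2 matrix and a length-2 vector per (chain, draw)
matrix = xr.DataArray(
    rng.normal(size=(4, 100, 3, 2)), dims=("chain", "draw", "row", "col")
)
vector = xr.DataArray(rng.normal(size=(4, 100, 2)), dims=("chain", "draw", "col"))

# contract only over the shared "col" dim; chain and draw broadcast,
# so no manual axis bookkeeping is needed
result = xr.dot(matrix, vector, dims="col")  # dims: chain, draw, row
```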
repr

We don't always need to show all the samples to users; it might be better to show some specific statistics instead (e.g. mean, std).
Our HTML output could also have other info: rhat/ess (maybe even a density plot?)
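A rough sketch of the repr idea (class and attribute names are made up):

```python
import numpy as np

class BayesVariable:
    """Thin wrapper whose repr shows summaries instead of raw samples."""

    def __init__(self, samples):
        # samples: array-like with shape (chain, draw, ...)
        self.samples = np.asarray(samples)

    def __repr__(self):
        mean = self.samples.mean(axis=(0, 1))
        std = self.samples.std(axis=(0, 1))
        return f"BayesVariable(mean={mean}, std={std})"
```

An `_repr_html_` method could add rhat/ess or a small density plot in the same spirit.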
Similar work can be seen in https://pythonhosted.org/uncertainties/