ahartikainen opened this issue 3 years ago
I have a lot of ideas for operations; with correctly labeled objects I think we can do a lot of magic.
I like this idea. I've worked on similar things in the past and would be willing to help.
I think one of the options is having a one or two layer minimal library on top of xarray to help with that.
The one layer option would be basing everything on ArviZ and its conventions, mostly the reserved dims `chain` and `draw`, but we could add more info to our objects, like some metadata, using `dim` and `dim_bis` or something like that for square matrices...
The two layer option (I think this would be better) would be having first a more generic xarray-linalg or something like that, where we use `xr.apply_ufunc` to wrap `numpy.linalg` functions, and then the ArviZ-specific functions that call these lower level functions. The lower layer could even be moved to xarray_contrib if there is more interest in it, so we could share part of the maintenance burden; the higher layer would be kept under arviz-devs.

I have some things that could be useful for this at https://github.com/OriolAbril/calaix_de_sastre/tree/master/xarray-utils. For now they are at the lower layer level, are not linalg specific, and some use numba to create proper, performant ufuncs from numpy functions that are not ufuncs.
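To make the "lower layer" idea concrete, here is a minimal, untested sketch of wrapping a `numpy.linalg` function with `xr.apply_ufunc`; the dim names `dim` and `dim_bis` follow the convention floated above and are not a fixed API:

```python
import numpy as np
import xarray as xr

def cholesky(da, dims=("dim", "dim_bis")):
    # apply numpy.linalg.cholesky over the two matrix dims,
    # broadcasting over everything else (e.g. chain and draw)
    return xr.apply_ufunc(
        np.linalg.cholesky,
        da,
        input_core_dims=[list(dims)],
        output_core_dims=[list(dims)],
    )

# example: a batch of 2x2 identity matrices with chain/draw dims
eye = xr.DataArray(
    np.broadcast_to(np.eye(2), (4, 100, 2, 2)).copy(),
    dims=("chain", "draw", "dim", "dim_bis"),
)
chol = cholesky(eye)  # same dims, lower triangular factor per draw
```

The ArviZ-specific layer would then mostly be about picking default dim names and validating inputs on top of wrappers like this.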
Would the idea also extend to being able to pass samples into functions for easier propagation of uncertainty?
It might also be nice to have a convenient flatten command over chains.
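For reference, one way that flattening could already look with plain xarray, using the `centered_eight` example dataset that ships with ArviZ:

```python
import arviz as az

idata = az.load_arviz_data("centered_eight")
# collapse the chain and draw dims into a single "sample" dim
flat = idata.posterior.stack(sample=("chain", "draw"))
```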
Yes to everything, we already had https://github.com/arviz-devs/arviz/issues/1469 too
Look into easy interfacing with the scipy.stats module too, as ufuncs.
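An untested sketch of what that could look like with `xr.apply_ufunc` (the samples array here is made up):

```python
import numpy as np
import scipy.stats
import xarray as xr

# made-up posterior samples with ArviZ-style dims
samples = xr.DataArray(
    np.random.default_rng(0).normal(size=(4, 500)),
    dims=("chain", "draw"),
)
# scipy.stats.norm.cdf is already vectorized; wrapping it in apply_ufunc
# keeps the chain/draw labels on the result
probs = xr.apply_ufunc(scipy.stats.norm.cdf, samples)
```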
btw, we have done something like this in another package and perhaps it's useful to look here (https://github.com/threeML/threeML/blob/master/threeML/random_variates.py), though this is with numpy and not xarray. We could port some of this if useful.
I have written down and organized everything I had scattered around and tried to make the library somewhat coherent. Somehow I had forgotten about this issue until today.
Here is the link: https://xarray-einstats.readthedocs.io/en/latest/. It has linear algebra wrappers from numpy, rv and summary wrappers from scipy.stats, a couple wrappers for einops (ignoring coord values) and a super cool (if I may say so myself :sunglasses:) histogram that bins in a vectorized way along any given dimensions (powered by numba).
I am currently updating some examples from the pymc collection to use xarray-einstats to test it a bit more and iron it out, then make a 0.2 release and start advertising it more broadly. Feature requests, suggestions and collaborations are always welcome whatever the release state though.
I think we could wrap our MCMC parameters in a custom class that would make it easier to work with the MCMC results (in the spirit of rvar in the posterior package).
In the MCMC world (samples):

- Scalar -> `shape=(chain, draw)`
- Vector -> `shape=(chain, draw, vector_dim)`
- Matrix -> `shape=(chain, draw, *matrix_dims)`
example

When we want to do a `matrix * vector` product with MCMC results, we need to be careful that the correct dimensions are used -> a Bayesian variable could handle this, so the variable would work like any non-MCMC variable.
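As a rough illustration of what labeled dims already buy us here (the dim names `row`/`col` and the shapes are made up for the example):

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
# made-up samples: a 3x2 matrix and a length-2 vector per (chain, draw)
matrix = xr.DataArray(
    rng.normal(size=(4, 100, 3, 2)), dims=("chain", "draw", "row", "col")
)
vector = xr.DataArray(rng.normal(size=(4, 100, 2)), dims=("chain", "draw", "col"))

# contract only over the shared "col" dim; chain and draw broadcast,
# so no manual axis bookkeeping is needed
result = xr.dot(matrix, vector, dims="col")  # dims: chain, draw, row
```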
repr

We don't always need to show all the samples to users; it might be better to show some specific statistics instead (e.g. mean, std).
Our HTML output could also have other info: rhat/ess (maybe even a density plot?)
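A rough sketch of the repr idea (class and attribute names are made up):

```python
import numpy as np

class BayesVariable:
    """Thin wrapper whose repr shows summaries instead of raw samples."""

    def __init__(self, samples):
        # samples: array-like with shape (chain, draw, ...)
        self.samples = np.asarray(samples)

    def __repr__(self):
        mean = self.samples.mean(axis=(0, 1))
        std = self.samples.std(axis=(0, 1))
        return f"BayesVariable(mean={mean}, std={std})"
```

An `_repr_html_` method could add rhat/ess or a small density plot in the same spirit.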
Similar work can be seen in https://pythonhosted.org/uncertainties/