Open jwhhh opened 2 years ago
Does it work if you don't use the blobs_dtype argument and leave everything else unchanged?
Do you have any ideas on how to improve this? I am not very familiar with custom dtypes. What do you expect should happen in that case? What does xarray do when given a numpy array with custom dtypes?
Thanks for replying. It works perfectly well if I don't use blobs_dtype.
I played around with it a bit just now, and I believe we can add a check around this line. Something like:
    if len(blobs.dtype) == 0:
        # plain dtype: continue as usual
        nblobs, nwalkers, ndraws, *_ = blobs.shape
    else:
        # structured dtype: one blob per named field
        nwalkers, ndraws, *_ = blobs.shape
        nblobs = len(blobs.dtype)
I'm expecting it to detect the dimension, and the names set by az.from_emcee(sampler, blob_names=[...], blob_groups=[...]) to overwrite the dtype names set from emcee.EnsembleSampler(blobs_dtype=...). However, I'm not sure about the types on the xarray side. It might be complicated if we have to deal with types here. All I was expecting is for it to recognise that there can be more than one blob in this case.
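To make the check above concrete, here is a minimal numpy-only sketch of how a structured dtype can be told apart from a plain one. The helper name count_blob_fields and the toy (5, 4) shape are assumptions made for illustration; this is not ArviZ or emcee code.

```python
import numpy as np


def count_blob_fields(blobs):
    """Hypothetical helper: number of named fields in the dtype (0 if plain)."""
    names = blobs.dtype.names  # None when the dtype is not structured
    return 0 if names is None else len(names)


# Toy stand-ins for what a sampler might return, shaped (nsteps, nwalkers).
plain = np.zeros((5, 4))
structured = np.zeros((5, 4), dtype=[("log_prior", float), ("mean", float)])

print(count_blob_fields(plain), plain.shape)
print(count_blob_fields(structured), structured.shape, structured.dtype.names)
```

Note that the shape is the same in both cases; only the dtype says how many fields there are, which is why len(blobs.dtype) is used as the test in the snippet above.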
I have played a little bit with xarray and custom dtypes:
>>> a = np.array([(-10, .2), (-15, .4)], dtype=[("log_prior", float), ("mean", float)])
>>> a
array([(-10., 0.2), (-15., 0.4)],
      dtype=[('log_prior', '<f8'), ('mean', '<f8')])
>>> a.shape
(2,)
>>> xr.DataArray(a)
<xarray.DataArray (dim_0: 2)>
array([(-10., 0.2), (-15., 0.4)],
      dtype=[('log_prior', '<f8'), ('mean', '<f8')])
Dimensions without coordinates: dim_0
xarray closely follows numpy. The array a has shape (2,) in both cases, so it would be a complete change in behaviour for ArviZ to take that as a (2, 2) array. According to both numpy and xarray there is no extra dimension to detect. Moreover, "expanding" a custom dtype into a dimension would force us to copy the data into a new (here 2x2) array with float dtype, which will not always be possible. A plain numpy array has the same dtype at every position, but with custom dtypes one can use [("log_prior", float), ("sum", int)] (and other combinations). Here the int can be promoted to float, but that changes important information about the data, and for other combinations promotion might not be possible at all.
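As an illustration of the promotion issue, here is a small sketch using numpy's own structured_to_unstructured from numpy.lib.recfunctions. The field values are made up to expose the effect, and this is not what ArviZ does internally.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Mixed structured dtype: one float field and one integer field.
mixed = np.array(
    [(-10.0, 2**53 + 1), (-15.0, 3)],
    dtype=[("log_prior", np.float64), ("sum", np.int64)],
)

# Expanding the fields into a trailing axis copies the data into a single
# common dtype, which here is float64.
expanded = rfn.structured_to_unstructured(mixed)

print(mixed.shape, expanded.shape, expanded.dtype)
# The large integer no longer round-trips after promotion to float64.
print(int(mixed["sum"][0]), int(expanded[0, 1]))
```

The structured array keeps shape (2,) and the exact integer, while the expanded copy gains an axis but stores every value as a float.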
I don't think we should change how this works with regard to dtype. If you think it could help, we could add a note to https://python.arviz.org/en/latest/getting_started/ConversionGuideEmcee.html explaining the behaviour of blobs with custom dtypes.
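For reference, a structured array can also be split by field on the user side without any promotion, since indexing by field name returns a plain array with that field's own dtype. This is a minimal numpy-only sketch; the toy shape and field names are assumptions, and how the resulting arrays are then passed on is left open.

```python
import numpy as np

# Toy structured blobs shaped (nsteps, nwalkers) with two differently typed fields.
blobs = np.zeros((5, 4), dtype=[("log_prior", float), ("sum", int)])

# One plain array per field; each keeps the original shape and its own dtype.
per_field = {name: blobs[name] for name in blobs.dtype.names}

for name, values in per_field.items():
    print(name, values.shape, values.dtype)
```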
Describe the bug
When using an emcee.EnsembleSampler with a custom dtype defined (see here), EnsembleSampler.get_blobs() returns a numpy structured array that has shape (nsteps, nwalkers). The shape information doesn't reflect the number of blobs, so EmceeConverter.blobs_to_dict() treats it as having only one blob even when there are more.
To Reproduce
Running the following script:
yields the following error:
Expected behavior
The blobs should be treated as having 2 dimensions, so the converter shouldn't expect only 1.
Additional context
macOS version: v11.5.1
python version: 3.10.4
arviz version: v0.12.1, conda-forge
emcee version: v3.1.2, conda-forge
numpy version: v1.22.3, conda-forge