tommylees112 opened this issue 5 years ago
We need a more flexible way to select variables, but we are not sure what the best option for the API would be.
@OriolAbril do we have any update on this issue?
This issue is actually caused by how xarray indexes coordinates with repeated values, not by ArviZ. Here is a simple example without ArviZ:
```python
import numpy as np
import xarray as xr

# dim1 gets 8 labels drawn at random from {0, 1, 2}, so it always contains repeats;
# the data are random too, so your printed values will differ from the output below
data = xr.DataArray(
    data=np.random.random(size=(4, 100, 8)),
    dims=("chain", "draw", "dim1"),
    coords={"chain": range(4), "draw": range(100), "dim1": np.random.choice([0, 1, 2], size=8)},
)
print(data)
# output
# <xarray.DataArray (chain: 4, draw: 100, dim1: 8)>
# array([[[0.828962, 0.514844, ..., 0.180102, 0.365011],
# ...
# [0.548044, 0.621308, ..., 0.373455, 0.586788]]])
# Coordinates:
# * chain (chain) int64 0 1 2 3
# * draw (draw) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
# * dim1 (dim1) int64 1 2 1 1 2 2 1 0

# selecting a single label works and returns every position with that label
print(data.sel(dim1=1))
# output
# <xarray.DataArray (chain: 4, draw: 100, dim1: 4)>
# array([[[0.828962, 0.697391, 0.384503, 0.180102],
# ...
# [0.548044, 0.793319, 0.735403, 0.373455]]])
# Coordinates:
# * chain (chain) int64 0 1 2 3
# * draw (draw) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
# * dim1 (dim1) int64 1 1 1 1

# but selecting a list of labels fails
data.sel(dim1=[1, 2])
# output
# InvalidIndexError: Reindexing only valid with uniquely valued Index objects
```
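A quick way to tell in advance whether a list selection will fail is to check whether the pandas index behind the coordinate is unique. The sketch below uses a fixed set of `dim1` labels (an assumption, chosen only so the result is reproducible) instead of random ones:

```python
import numpy as np
import xarray as xr

# Fixed labels instead of np.random.choice so the example is reproducible
data = xr.DataArray(
    np.random.random(size=(4, 100, 8)),
    dims=("chain", "draw", "dim1"),
    coords={"chain": np.arange(4), "draw": np.arange(100), "dim1": [1, 2, 1, 1, 2, 2, 1, 0]},
)

# The pandas index backing dim1 has repeated labels
dim1_index = data.indexes["dim1"]
print(dim1_index.is_unique)  # False

# A scalar label still works: all 4 positions labelled 1 are returned
n_ones = data.sel(dim1=1).sizes["dim1"]
print(n_ones)  # 4

# A list of labels requires a unique index, so this raises
try:
    data.sel(dim1=[1, 2])
    error_name = None
except Exception as err:  # pandas raises InvalidIndexError here
    error_name = type(err).__name__
print(error_name)
```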
To select a subset of a DataArray or Dataset based on a coordinate with repeated index values, use `where` instead of `sel`:
```python
# drop=True removes the dim1 entries that do not fulfil the condition;
# with the default drop=False they would be set to NaN instead, which is not our goal
data.where(data.dim1.isin((1, 2)), drop=True)
# output
# <xarray.DataArray (chain: 4, draw: 100, dim1: 7)>
# array([[[0.828962, 0.514844, ..., 0.042839, 0.180102],
# ...
# [0.548044, 0.621308, ..., 0.110815, 0.373455]]])
# Coordinates:
# * chain (chain) int64 0 1 2 3
# * draw (draw) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
# * dim1 (dim1) int64 1 2 1 1 2 2 1
```
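Another option, not suggested in the thread itself, is to select by position with a boolean mask through `isel`. It keeps repeated labels, never fills anything with NaN, and does not need to promote integer data to float the way `where` does. A minimal sketch, again assuming fixed `dim1` labels as a reproducible stand-in for the random ones:

```python
import numpy as np
import xarray as xr

# Same setup as above, with fixed dim1 labels so the result is reproducible
data = xr.DataArray(
    np.random.random(size=(4, 100, 8)),
    dims=("chain", "draw", "dim1"),
    coords={"chain": np.arange(4), "draw": np.arange(100), "dim1": [1, 2, 1, 1, 2, 2, 1, 0]},
)

# Boolean mask along dim1: True where the label is 1 or 2
mask = data.dim1.isin([1, 2])

# Positional selection; .values turns the mask into a plain NumPy bool array
subset = data.isel(dim1=mask.values)
print(subset.dim1.values)  # [1 2 1 1 2 2 1]
```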
We could discuss how to implement this in ArviZ. For now, the selection has to be done by the user before calling ArviZ functions. In your case, I guess it would be something like:
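```python
import arviz as az

# here `data` is the InferenceData converted from the PyStan radon fit,
# not the DataArray from the example above
az.plot_forest(
    data.posterior.where(data.posterior.county.isin(range(0, 5)), drop=True),
    var_names="a",
)
```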
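The same positional idea can be applied to the whole posterior before plotting. In the sketch below, `posterior` is a hypothetical stand-in for `data.posterior`, with a variable `a` indexed by `county` and an illustrative hyperparameter `mu_a` without that dimension (both names and the integer county labels are assumptions). `isel` only touches variables that actually have the `county` dimension:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for data.posterior from the radon model
rng = np.random.default_rng(0)
n_county = 8
posterior = xr.Dataset(
    {
        "a": (("chain", "draw", "county"), rng.normal(size=(4, 100, n_county))),
        "mu_a": (("chain", "draw"), rng.normal(size=(4, 100))),
    },
    coords={"chain": np.arange(4), "draw": np.arange(100), "county": np.arange(n_county)},
)

# Keep counties 0..4, selecting by position so repeated labels are not an issue
mask = posterior.county.isin(list(range(0, 5)))
subset = posterior.isel(county=mask.values)
print(subset["a"].sizes["county"])  # 5
```

The resulting `subset` could then be passed to `az.plot_forest(subset, var_names="a")` in place of the `where` expression.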
@OriolAbril do we have `.where` described somewhere in the docs? It is a really powerful function.
Short Description
I want to select individual levels of a dimension to plot, because plotting all levels of a variable is slow and makes the plot uninterpretable.
Code Example or link
I am trying to reproduce the PyStan example here showing the use of multilevel modelling.
The code and extraction are below:
I then extract the data to ArviZ:
I want to plot only a few of the counties (the model levels). The following takes ages to run because it plots the traces of ALL counties, but I want to select only some of them.
I found this suggestion here:
But I get an error:
Relevant documentation or public examples
https://mc-stan.org/users/documentation/case-studies/radon.html