**fatihbozdag** opened 1 year ago
I found a similar plot here (https://bambinos.github.io/bambi/notebooks/hierarchical_binomial_bambi.html), comparing two models' estimates of the same variable (this is exactly what I want to do, but with a categorical response). However, the problem is that it is almost impossible to apply the same method to data with more than 14,000 observations!

```python
_, ax = plt.subplots(figsize=(8, 8))

# Add vertical line for the global probability of hitting
ax.axvline(x=(df["H"] / df["AB"]).mean(), ls="--", color="black", alpha=0.5)

# Create forest plot with ArviZ, only for the mean
az.plot_forest(
    [idata_non_hierarchical, idata_hierarchical],
    var_names="p(H, AB)_mean",
    combined=True,
    colors=["#666666", RED],
    linewidth=2.6,
    markersize=8,
    ax=ax,
)

# Create custom y-axis tick labels
ylabels = [f"H: {round(h)}, AB: {round(ab)}" for h, ab in zip(df["H"].values, df["AB"].values)]
ylabels = list(reversed(ylabels))

# Put the y-axis labels in the middle of the original tick mark locations
ax.set_yticklabels(ylabels, ha="right")

# Create legend
handles = [
    Patch(label="Non-hierarchical", facecolor="#666666"),
    Patch(label="Hierarchical", facecolor=RED),
    Line2D([0], [0], label="Mean probability", ls="--", color="black", alpha=0.5),
]
legend = ax.legend(handles=handles, loc=4, fontsize=14, frameon=True, framealpha=0.8)
```
@fatihbozdag thanks for opening the issue!
Could you share a reproducible example (with some data) so I can better help you? The data of course does not need to be real; it can be either simulated or anonymized.
Sure thing! Indeed, I've uploaded the pickled inference data here: https://figshare.com/articles/software/results_pickle/21789347. But here is some toy data with sample code.
```python
import numpy as np
import pandas as pd
import bambi as bmb

x = np.array(["Pred1"] * 40 + ["Pred2"] * 40 + ["Pred3"] * 70)
y = np.array(["A"] * 50 + ["B"] * 50 + ["C"] * 50)
r = np.array(["R1"] * 90 + ["R2"] * 60)
data = pd.DataFrame({"y": y, "x": x, "r": r})

model = bmb.Model("y ~ 0 + x + (1|r)", data, family="categorical", auto_scale=True)
model.build()
model
```
```
Formula: y ~ 0 + x + (1|r)
Family name: Categorical
Link: softmax
Observations: 150
Priors:
  Common-level effects
    x ~ Normal(mu: [0. 0. 0.], sigma: [5.6533 5.6533 5.0111])
  Group-level effects
    1|r ~ Normal(mu: 0, sigma: HalfNormal(sigma: 4.0329))
```
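As a side note on the `softmax` link above: there is one linear predictor per non-reference response level ("B" and "C"), while the reference level ("A", absent from `y_dim`) effectively has its predictor fixed at 0, which is why it carries no coefficients. A minimal numpy sketch of that mapping (the predictor values are made up for illustration):

```python
import numpy as np

def categorical_probs(eta_nonref):
    """Map linear predictors for the non-reference levels to probabilities.

    eta_nonref has shape (..., k-1): predictors for the non-reference
    levels. The reference level's predictor is fixed at 0.
    """
    # Prepend the implicit 0 for the reference level
    eta = np.concatenate(
        [np.zeros(eta_nonref.shape[:-1] + (1,)), eta_nonref], axis=-1
    )
    eta = eta - eta.max(axis=-1, keepdims=True)  # numerical stability
    exp_eta = np.exp(eta)
    return exp_eta / exp_eta.sum(axis=-1, keepdims=True)

# Hypothetical predictors for levels "B" and "C" (reference "A" is implicit)
probs = categorical_probs(np.array([0.5, -1.0]))
# probs[0] = P(A), probs[1] = P(B), probs[2] = P(C); they sum to 1
```

This is why the fitted posterior only has two columns in `y_dim` even though the response has three levels.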
```python
with model.backend.model:
    idata = pm.sampling_jax.sample_numpyro_nuts(
        draws=250,
        tune=50,
        target_accept=0.99,
        postprocessing_backend="cpu",
        idata_kwargs={"log_likelihood": True},
        chain_method="parallel",
    )
    posterior_predictive = pm.sample_posterior_predictive(trace=idata, extend_inferencedata=True)
```
```python
idata
```

```
Inference data with groups:
    > posterior
    > posterior_predictive
    > log_likelihood
    > sample_stats
    > observed_data
```
```python
idata.posterior
```

```
<xarray.Dataset>
Dimensions:        (chain: 4, draw: 250, x_dim: 3, y_dim: 2, r__factor_dim: 2)
Coordinates:
  * chain          (chain) int64 0 1 2 3
  * draw           (draw) int64 0 1 2 3 4 5 6 7 ... 243 244 245 246 247 248 249
  * x_dim          (x_dim) <U5 'Pred1' 'Pred2' 'Pred3'
  * y_dim          (y_dim) <U1 'B' 'C'
  * r__factor_dim  (r__factor_dim) <U2 'R1' 'R2'
Data variables:
    x              (chain, draw, x_dim, y_dim) float64 -7.286 -2.195 ... 7.844
    1|r_offset     (chain, draw, r__factor_dim, y_dim) float64 -0.1529 ... -0...
    1|r_sigma      (chain, draw) float64 10.41 4.648 5.216 ... 5.3 2.921 8.414
    1|r            (chain, draw, r__factor_dim, y_dim) float64 -1.591 ... -1.161
Attributes:
    created_at:     2023-02-04T12:42:15.801626
    arviz_version:  0.14.0
```
```python
model.predict(idata)  # Bambi's predict adds the posterior mean "y_mean" in place
idata.posterior
```

```
<xarray.Dataset>
Dimensions:        (chain: 4, draw: 250, x_dim: 3, y_dim: 2, r__factor_dim: 2,
                    y_mean_dim: 3, y_obs: 150)
Coordinates:
  * chain          (chain) int64 0 1 2 3
  * draw           (draw) int64 0 1 2 3 4 5 6 7 ... 243 244 245 246 247 248 249
  * x_dim          (x_dim) <U5 'Pred1' 'Pred2' 'Pred3'
  * y_dim          (y_dim) <U1 'B' 'C'
  * r__factor_dim  (r__factor_dim) <U2 'R1' 'R2'
  * y_mean_dim     (y_mean_dim) <U1 'A' 'B' 'C'
  * y_obs          (y_obs) int64 0 1 2 3 4 5 6 7 ... 143 144 145 146 147 148 149
Data variables:
    x              (chain, draw, x_dim, y_dim) float64 -7.286 -2.195 ... 7.844
    1|r_offset     (chain, draw, r__factor_dim, y_dim) float64 -0.1529 ... -0...
    1|r_sigma      (chain, draw) float64 10.41 4.648 5.216 ... 5.3 2.921 8.414
    1|r            (chain, draw, r__factor_dim, y_dim) float64 -1.591 ... -1.161
    y_mean         (chain, draw, y_obs, y_mean_dim) float64 0.9968 ... 0.7472
Attributes:
    created_at:     2023-02-04T12:42:15.801626
    arviz_version:  0.14.0
```
I'm not sure I understand exactly what you want to do
Are you trying to reproduce this figure? https://user-images.githubusercontent.com/59308858/216736802-9b1f1f97-80f9-4696-976d-427f8bad4c54.png Although that one has two categories and your response contains three.
"plot ppc (probabilities) of categorical response per each variable included in the random effects"
You could create a plot where the y-axis shows the probability of the outcome, the horizontal axis shows the levels of a grouping variable used in the "random effects" part, and multiple estimates with credible intervals are displayed (one per response level). Is that what you want? Just take into account that you'll need to pick a value for any other predictor your model may have.
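The ingredients of such a plot can be sketched with plain numpy, using simulated draws in place of the real `y_mean` posterior (the group labels mirror the toy data above, but the draw count and probabilities are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

levels = ["A", "B", "C"]
groups = np.array(["R1"] * 90 + ["R2"] * 60)  # one group label per observation

# Simulated stand-in for the flattened posterior of y_mean:
# shape (n_draws, n_obs, n_levels), each row summing to 1
raw = rng.gamma(2.0, size=(1000, 150, 3))
y_mean = raw / raw.sum(axis=-1, keepdims=True)

# For each group, average y_mean over its observations per draw, then
# summarize each response level with a posterior mean and a 94% interval
for g in np.unique(groups):
    per_draw = y_mean[:, groups == g, :].mean(axis=1)  # (n_draws, n_levels)
    means = per_draw.mean(axis=0)
    lower, upper = np.percentile(per_draw, [3, 97], axis=0)
    for k, level in enumerate(levels):
        print(f"{g} {level}: {means[k]:.3f} [{lower[k]:.3f}, {upper[k]:.3f}]")
```

From there, `ax.errorbar` with the group levels on the x-axis and one slightly offset series per response level gives the kind of figure described above.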
Yes, that's exactly what I want to do: plotting the probabilities of A, B, and C given R1 and R2.

P.S. Sorry, I cannot focus on the topic and write more details, as a massive earthquake struck my hometown, Adana/Turkey. Can you please give me a few days? Currently, I am working as a freelance translator to guide rescue teams.
Hi again. I am back to ordinary life (sort of). How can I produce such a plot? Yes, this is a categorical regression with a three-level response.

Maybe I should add more info. Indeed, the problem is that I am unable to include the response reference level in the plot; in other words, ArviZ's `az.plot_forest` works, but without the reference level.
An update on another unsuccessful attempt: after `model.predict(idata)`, values including the reference level are added to the inference data under the `y_mean_dim` dimension. However, ArviZ's `coords` does not accept the new dimension as a valid one.

```python
coords = {"y_dim": ["B", "C"]}  # works, but excludes the reference level (assuming "A" is the reference)
az.summary(idata, var_names="x", coords=coords)

coords = {"y_mean_dim": ["B", "C"]}  # does not work, even though idata.posterior.y_mean_dim includes the reference level
az.summary(idata, var_names="x", coords=coords)
```

The second call says:

```
KeyError: 'Coords should follow mapping format {coord_name:[dim1, dim2]}. Check that coords structure is correct and dimensions are valid. "\'y_mean_dim\' is not a valid dimension or coordinate"'
```
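One way around this error (a sketch, assuming the posterior structure shown earlier): `y_mean` is an ordinary xarray variable, so the reference level can be selected with xarray's own `.sel` instead of going through `az.summary`'s `coords`. Here a simulated dataset stands in for the real `idata.posterior`:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Simulated stand-in for idata.posterior after model.predict(idata):
# dirichlet draws give rows that sum to 1 across y_mean_dim
posterior = xr.Dataset(
    {
        "y_mean": (
            ("chain", "draw", "y_obs", "y_mean_dim"),
            rng.dirichlet(np.ones(3), size=(4, 250, 150)),
        )
    },
    coords={"y_mean_dim": ["A", "B", "C"]},
)

# Select the reference level directly; no ArviZ coords needed
p_ref = posterior["y_mean"].sel(y_mean_dim="A")
# p_ref has dims (chain, draw, y_obs) and holds P(y = "A") per observation
```

With the real `idata`, pairing the `y_mean_dim` coord with `var_names="y_mean"` (the variable that actually carries that dimension, rather than `x`) may also avoid the `KeyError`, though I have not verified this against a specific ArviZ version.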
@fatihbozdag I'm sorry you were affected by the earthquake. I hope you're doing better now!
What I don't understand from this visualization https://user-images.githubusercontent.com/59308858/216736802-9b1f1f97-80f9-4696-976d-427f8bad4c54.png is what the line represents. I guess the line is a probability going from 0 to 1, and then the closer it's to zero, the closer it's to DO, and the closer it's to one, the closer it's to PD. Is that correct?
But in this case, you have three response levels, so you can't get a plot like that one.
Also, you have other predictors on top of the group-level effect you're interested in (`r` in the toy example). How did you handle these other predictors in the visualization above? Did you set them to a fixed value, or did you marginalize over their possible range of values?
Thank you, @tomicapretto,
Well, I'm doing much better, but the whole country is struggling and is likely to do so for a while. As I write this post, we've just had another one (5.1). I guess we have to get used to it. Anyway.
Your interpretation is correct. Other predictors are kept at the response level since all predictors are categorical variables.
To clarify,

```python
model.predict(idata, kind="pps", include_group_specific=True)

ax = az.plot_ppc(idata)
ax.set_xticks([0.5, 1.5, 2.5])
ax.set_xticklabels(model.response.term.levels)
```

provides such a plot. So what if I would like to get such a plot per random effect?
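One possible route, sketched with simulated posterior predictive draws in place of the real `idata.posterior_predictive` (coding the response as level indices is an assumption about the stored format): subset the observation axis by group before tallying category proportions.

```python
import numpy as np

rng = np.random.default_rng(0)

levels = ["A", "B", "C"]
groups = np.array(["R1"] * 90 + ["R2"] * 60)  # group label per observation

# Simulated stand-in for posterior predictive draws of the response,
# coded as level indices: shape (n_draws, n_obs)
pps = rng.integers(0, 3, size=(1000, 150))

# Per group, the proportion of each response level within every draw
for g in np.unique(groups):
    draws = pps[:, groups == g]  # (n_draws, n_obs_in_group)
    props = np.stack(
        [(draws == k).mean(axis=1) for k in range(3)], axis=1
    )  # (n_draws, n_levels)
    means = props.mean(axis=0)
    lower, upper = np.percentile(props, [3, 97], axis=0)
    for k, level in enumerate(levels):
        print(f"{g} {level}: {means[k]:.3f} [{lower[k]:.3f}, {upper[k]:.3f}]")
```

Plotting these per-group proportions side by side (one interval per response level, one x-position or panel per group) gives a group-level analogue of the `az.plot_ppc` summary.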
Greetings all,
I've been looking for a way to plot ppc (probabilities) of categorical response per each variable included in the random effects. Indeed, I asked the question on the ArviZ GitHub (https://github.com/arviz-devs/arviz/issues/2188); however, I haven't found a solution yet.
So basically, following the code below, I want to plot the probabilities of each item in the categorical response variable (3 levels) per random effect (2 levels). I did this before with `rstanarm` and `easystats` in R for a binomial distribution, yet got stuck here in Python.

This is from my own study.

![image_2023-02-04_040254680](https://user-images.githubusercontent.com/59308858/216736802-9b1f1f97-80f9-4696-976d-427f8bad4c54.png)