SuryaThiru / hierarchical-model-blog

Source code for the experiments in my blog on Bayesian hierarchical modeling
https://towardsdatascience.com/introduction-to-hierarchical-modeling-a5c7b2ebb1ca
10 stars 4 forks source link

Graph for multi-dimensional data #1

Open mahideel opened 3 years ago

mahideel commented 3 years ago

Hi,

Thanks so much for your blog, I find it extremely useful - especially since I am new to Python and Bayesian.

I have a question, if you don't mind, how would you plot facetgrid(posterior_plot) - i.e. the last plot on your blog/tutorial if you have 2 independent variable. So instead of only homework in your example, we'll have homework and ses.

I have similar data, but my model has 2 independent variables. My model works but I have not been able to do comparison plot (facetgrid(posterior_plot)) to show the difference in the slope between unpooled and pooled models.

Your advice is greatly appreciated.

Thank you.

SuryaThiru commented 3 years ago

Hey @mahideel ! Glad to know you know found the blog useful. Off the top of my head, I think it'd be easy to visualise them by plotting 2 separate facet plots for each independent variable.

There's always the option of using 3D plots. Would that work in your case?

mahideel commented 3 years ago

HI @SuryaThiru

Thanks so much for your reply. I tried separating them. My problem was how to grab each variable from the trace?

My trace looks like this: beta[0,0,0] | -0.034 | 3.252 | -5.226 | 5.035 beta[0,1,0] | 0.202 | 3.667 | -5.699 | 6.261 beta[1,0,0] | -0.003 | 0.075 | -0.139 | 0.139 beta[1,1,0] | 0.002 | 0.037 | -0.065 | 0.071 beta[2,0,0] | 0.064 | 3.189 | -5.797 | 5.993 beta[2,1,0] | 0.09 | 3.204 | -5.361 | 5.563 beta[3,0,0] | 0.134 | 3.734 | -5.924 | 5.539 beta[3,1,0] | 0.134 | 3.472 | -6.1 | 5.896 beta[4,0,0] | 0.055 | 3.151 | -5.633 | 5.155 beta[4,1,0] | -0.061 | 3.004 | -4.612 | 6.387 ......

I don't know how to grab the relevant patient & predictor from the hierarchical model trace to plot them and fit the abline.

Any advice is much appreciated.

I have 2 predictors and 9 people, where each person have multiple timepoints ranging from 3 to 10.

with pm.Model() as hierarchical_model1:

# Hyperpriors
mu_a = pm.Normal('mu_a', mu=0, sigma=10)
sigma_a = pm.HalfNormal('sigma_a', sigma=10)
mu_b = pm.Normal('mu_b', mu=0, sigma=10)
sigma_b = pm.HalfNormal('sigma_b', sigma=10)   

# Intercept for each patient, distributed around group mean mu_a
a = pm.Normal('a', mu=mu_a, sd=sigma_a, shape=(n_pat, n_pred, 1))
# Slope for each patient, distributed around group mean mu_b
b = pm.Normal('b', mu=mu_b, sd=sigma_b, shape=(n_pat, n_pred, 1))

# Model error
eps = pm.HalfCauchy('eps', beta=5)

# Expected value
omega = (pm.math.dot(std_glyc, b))[pat_id,1]
tpoint_est = a[pat_id,1] + omega

# Data likelihood
y_like = pm.Normal('y_like', mu=tpoint_est, sigma=eps, observed=tpoint)

with hierarchical_model1: hier_trace = pm.sample(draws = 4000, tune = 1000, chains = 2) hier_out = pm.summary(hier_trace)

I am so new to python. Your help is greatly appreciated.

Thank you.

SuryaThiru commented 3 years ago

Hey! To test your setting, I used homework and ses variable for the same model:

hw_ses = data[['homework', 'ses']]

And the model code:

with pm.Model() as model:
    # Hyperpriors
    mu_a = pm.Normal('mu_a', mu=40, sigma=50)
    sigma_a = pm.HalfNormal('sigma_a', 50)

    mu_b = pm.Normal('mu_b', mu=0, sigma=10, shape=2)
    sigma_b = pm.HalfNormal('sigma_b', 5)

    # Intercept
    a = pm.Normal('a', mu=mu_a, sigma=sigma_a, shape=n_schools)
    # Slope
    b = pm.Normal('b', mu=mu_b, sigma=sigma_b, shape=(n_schools, 2))

    # Model error
    eps = pm.HalfCauchy('eps', 5)

    y_hat = a[school] + pm.math.dot(b[school], hw_ses.T)

    # Likelihood
    y_like = pm.Normal('y_like', mu=y_hat, sigma=eps, observed=math)

Note:

The model graph (excluding hyperpriors): image

This gives me my posterior dimension in the trace as:

>>> trace['b'].shape
(4000, 10, 2)

where 4000 is our n_samples, 10 is our group (n_schools), and 2 our independent variable (homework, ses). So you can access the coefficient distribution of homework with trace['b'][:,:,0] and ses with trace['b'][:,:,1].

With these 2 values out, you can create plots for each variable. You will have to change the various plotting functions till you get it right.

All the best.

mahideel commented 3 years ago

Thank you so much!! I will give this ago and will let you know how it goes.

I really appreciate your help.