Graph for multi-dimensional data

mahideel commented 3 years ago

Hi,

Thanks so much for your blog, I find it extremely useful - especially since I am new to Python and Bayesian.

I have a question, if you don't mind, how would you plot facetgrid(posterior_plot) - i.e. the last plot on your blog/tutorial if you have 2 independent variable. So instead of only homework in your example, we'll have homework and ses.

I have similar data, but my model has 2 independent variables. My model works but I have not been able to do comparison plot (facetgrid(posterior_plot)) to show the difference in the slope between unpooled and pooled models.

Your advice is greatly appreciated.

Thank you.

SuryaThiru commented 3 years ago

Hey @mahideel ! Glad to know you know found the blog useful. Off the top of my head, I think it'd be easy to visualise them by plotting 2 separate facet plots for each independent variable.

There's always the option of using 3D plots. Would that work in your case?

mahideel commented 3 years ago

HI @SuryaThiru

Thanks so much for your reply. I tried separating them. My problem was how to grab each variable from the trace?

My trace looks like this: beta[0,0,0] | -0.034 | 3.252 | -5.226 | 5.035 beta[0,1,0] | 0.202 | 3.667 | -5.699 | 6.261 beta[1,0,0] | -0.003 | 0.075 | -0.139 | 0.139 beta[1,1,0] | 0.002 | 0.037 | -0.065 | 0.071 beta[2,0,0] | 0.064 | 3.189 | -5.797 | 5.993 beta[2,1,0] | 0.09 | 3.204 | -5.361 | 5.563 beta[3,0,0] | 0.134 | 3.734 | -5.924 | 5.539 beta[3,1,0] | 0.134 | 3.472 | -6.1 | 5.896 beta[4,0,0] | 0.055 | 3.151 | -5.633 | 5.155 beta[4,1,0] | -0.061 | 3.004 | -4.612 | 6.387 ......

I don't know how to grab the relevant patient & predictor from the hierarchical model trace to plot them and fit the abline.

Any advice is much appreciated.

I have 2 predictors and 9 people, where each person have multiple timepoints ranging from 3 to 10.

with pm.Model() as hierarchical_model1:

# Hyperpriors
mu_a = pm.Normal('mu_a', mu=0, sigma=10)
sigma_a = pm.HalfNormal('sigma_a', sigma=10)
mu_b = pm.Normal('mu_b', mu=0, sigma=10)
sigma_b = pm.HalfNormal('sigma_b', sigma=10)   

# Intercept for each patient, distributed around group mean mu_a
a = pm.Normal('a', mu=mu_a, sd=sigma_a, shape=(n_pat, n_pred, 1))
# Slope for each patient, distributed around group mean mu_b
b = pm.Normal('b', mu=mu_b, sd=sigma_b, shape=(n_pat, n_pred, 1))

# Model error
eps = pm.HalfCauchy('eps', beta=5)

# Expected value
omega = (pm.math.dot(std_glyc, b))[pat_id,1]
tpoint_est = a[pat_id,1] + omega

# Data likelihood
y_like = pm.Normal('y_like', mu=tpoint_est, sigma=eps, observed=tpoint)

with hierarchical_model1: hier_trace = pm.sample(draws = 4000, tune = 1000, chains = 2) hier_out = pm.summary(hier_trace)

I am so new to python. Your help is greatly appreciated.

Thank you.

SuryaThiru commented 3 years ago

Hey! To test your setting, I used homework and ses variable for the same model:

hw_ses = data[['homework', 'ses']]

And the model code:

with pm.Model() as model:
    # Hyperpriors
    mu_a = pm.Normal('mu_a', mu=40, sigma=50)
    sigma_a = pm.HalfNormal('sigma_a', 50)

    mu_b = pm.Normal('mu_b', mu=0, sigma=10, shape=2)
    sigma_b = pm.HalfNormal('sigma_b', 5)

    # Intercept
    a = pm.Normal('a', mu=mu_a, sigma=sigma_a, shape=n_schools)
    # Slope
    b = pm.Normal('b', mu=mu_b, sigma=sigma_b, shape=(n_schools, 2))

    # Model error
    eps = pm.HalfCauchy('eps', 5)

    y_hat = a[school] + pm.math.dot(b[school], hw_ses.T)

    # Likelihood
    y_like = pm.Normal('y_like', mu=y_hat, sigma=eps, observed=math)

Note:

You should use a single intercept dimension (for the schools), and multiple for slopes as they correspond to 2 input variables (homework, ses).
I am using 2 different means for b by setting shape=2 when creating mu_b with Normal. This could give a bit more freedom to the model's search.
The model shown above might not be the best model for your case. You will have to do appropriate prior and posterior checks to find the right model.

The model graph (excluding hyperpriors):

This gives me my posterior dimension in the trace as:

>>> trace['b'].shape
(4000, 10, 2)

where 4000 is our n_samples, 10 is our group (n_schools), and 2 our independent variable (homework, ses). So you can access the coefficient distribution of homework with trace['b'][:,:,0] and ses with trace['b'][:,:,1].

With these 2 values out, you can create plots for each variable. You will have to change the various plotting functions till you get it right.

All the best.

mahideel commented 3 years ago

Thank you so much!! I will give this ago and will let you know how it goes.

I really appreciate your help.

SuryaThiru / hierarchical-model-blog

Graph for multi-dimensional data #1