devindg / pybuc

Bayesian Structural Time Series / Unobserved Components
BSD 3-Clause "New" or "Revised" License

Imposing Lower and Upper bounds on predictors #58

Closed tim-mcwilliams closed 11 months ago

tim-mcwilliams commented 1 year ago

Hi @devindg !!

I'm working on a project using your package, and I need to impose a lower bound on a few of the predictors in the model. For example, if I have 6 predictors in the model (x_train), I need 3 of those predictors to have a lower bound of 0.00.

Would I need to use the reg_coeff_mean_prior arg to impose this logic?? For example,

model_mean_priors = [
    1.49,
    1.49,
    1.54,
    0.00,
    0.00,
    0.00,
]

bayes_uc = buc.BayesianUnobservedComponents(
    response=y_train,
    predictors=x_train, # three of these need a lower bound of 0.00
    level=True,
    trend=True,
    seed=123
)

post = bayes_uc.sample(
    num_samp=5000,
    reg_coeff_mean_prior=model_mean_priors,
)

Looking forward to your response!!

-Tim

devindg commented 1 year ago

Hi @tim-mcwilliams! I think what you're after is a truncated distribution, such as a half-normal, which is not supported. What you can do, though, is impose "strong" priors on these coefficients. The more observations you have, the stronger your prior will have to be in terms of the mean and variance. For example, suppose you believe that a $1 increase in the price of eggs, all else equal, induces one dozen fewer eggs being sold, on average, and theory indicates that the price effect on quantity demanded is negative. In this case, the prior should encode the belief that there is effectively zero probability of a positive coefficient.

The model is q_eggs = b*p_eggs + error, where q_eggs is the number of eggs sold by the dozen, and p_eggs is the price for a dozen eggs. You could place a strong prior on 'b', where again the strength of the prior is relative to the likelihood function, such as 'b' ~ N(-1, 0.1^2). This prior encodes the belief that the effect of a $1 increase in the price of a dozen eggs is between -1.2 and -0.8 with approximately 95% probability.
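As a quick sanity check on that interval, you can compute the central 95% interval of a N(-1, 0.1^2) distribution directly (scipy is used here purely for illustration; it isn't a pybuc requirement):

from scipy import stats

# Central 95% interval of the N(-1, 0.1^2) prior on 'b'
lower, upper = stats.norm.interval(0.95, loc=-1.0, scale=0.1)
print(lower, upper)  # approximately -1.196 and -0.804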

If you have a lot of observations/degrees of freedom, your prior will get overwhelmed by the likelihood function. So if the data alone in this example indicate a positive price effect, you will probably have to take a trial-and-error approach to get a result that makes sense to you, even if this doesn't strictly follow the Bayesian paradigm. The trial-and-error part may be as simple as tightening the variance (e.g., from 0.1^2 to 0.01^2 in the above example). Or maybe the model is strongly misspecified (e.g., omitted variable bias, endogeneity, etc.), which could have a material impact on your coefficients.

Here is what the code could look like for the example above:

import numpy as np

# Assumes BayesianUnobservedComponents has already been imported from pybuc.
cov_prior = np.array([[0.1**2]])
# Matrix inversion is required for cov_prior -> prec_prior with more than one coefficient.
prec_prior = np.linalg.inv(cov_prior)
mean_prior = [-1.]
mod = BayesianUnobservedComponents(response=q_eggs, predictors=p_eggs, level=True, trend=True)
post = mod.sample(5000, reg_coeff_mean_prior=mean_prior, reg_coeff_prec_prior=prec_prior)

The main thing to take away from this is that in addition to setting a mean prior, you should also set a variance/precision prior for the coefficients. A mean prior by itself will get quickly diluted by the likelihood since the default variance/precision prior for coefficients is vague.
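To see why, here's a rough back-of-the-envelope sketch using textbook conjugate-normal updating. This is only meant as intuition for how a prior gets diluted, not how pybuc's sampler actually works, and the data-precision number is made up for illustration:

def posterior_mean(prior_mean, prior_var, data_est, data_prec):
    # Precision-weighted average of the prior mean and the data-only estimate
    prior_prec = 1 / prior_var
    return (prior_prec * prior_mean + data_prec * data_est) / (prior_prec + data_prec)

# Suppose the data alone point to a coefficient of +0.5 with precision 400.
data_est, data_prec = 0.5, 400.0

print(posterior_mean(-1.0, 100.0**2, data_est, data_prec))  # ~0.5: a vague prior is overwhelmed
print(posterior_mean(-1.0, 0.1**2, data_est, data_prec))    # ~0.2: still pulled toward the data
print(posterior_mean(-1.0, 0.01**2, data_est, data_prec))   # ~-0.94: tightening to 0.01^2 dominates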

tim-mcwilliams commented 12 months ago

Yea I am after a truncated distribution here. Thanks so much for the detailed example! So if I am understanding the syntax correctly, for a multivariate model the cov_prior & mean_prior would look like this,

cov_prior = [[
    0.1**2, 
    0.1**2, 
    0.1**2, 
]]

mean_prior = [
    -1,
    -1,
    -1,
]

Is this logic correct??

devindg commented 12 months ago

Let's take your first example. Here's a way to modify it.

import numpy as np

model_mean_priors = [
    1.49,
    1.49,
    1.54,
    1.00,
    1.00,
    1.00,
]

model_var_priors = [
    0.5**2,
    1**2,
    0.25**2,
    0.1**2,
    0.1**2,
    0.1**2,
]

num_coeff = len(model_mean_priors)
# Construct covariance matrix from variance priors. This is a diagonal matrix.
model_cov_prior = np.diag(model_var_priors)
# Get inverse of covariance matrix. Since this is a diagonal matrix, the inverse
# of the covariance matrix is as simple as inverting each element along the diagonal. 
# In general, though, using something like np.linalg.solve for matrix inversion
# is my recommendation.
model_prec_prior = np.linalg.solve(model_cov_prior, np.eye(num_coeff))

bayes_uc = buc.BayesianUnobservedComponents(
    response=y_train,
    predictors=x_train, # three of these need a lower bound of 0.00
    level=True,
    trend=True,
    seed=123
)

post = bayes_uc.sample(
    num_samp=5000,
    reg_coeff_mean_prior=model_mean_priors,
    reg_coeff_prec_prior=model_prec_prior
)

Because your last three coefficients are expected to be positive, your mean prior for these coefficients shouldn't be 0. Instead, you'll need to establish positive mean priors for these coefficients along with sufficiently tight variance priors to minimize the possibility of getting negative coefficients.
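For instance, if you went with something like a N(1, 0.1^2) prior for one of those coefficients (placeholder numbers, not a recommendation), you can check how little prior mass sits below zero:

from scipy import stats

# Prior probability of a negative coefficient under a N(1, 0.1^2) prior
print(stats.norm.cdf(0, loc=1.0, scale=0.1))  # ~7.6e-24, effectively zero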

Note that the variance priors I posted are merely for the sake of example. You should choose these values based on your domain knowledge. Very high variance priors for your coefficients amount to vague priors, in which case you'll get close to the maximum likelihood solution.

tim-mcwilliams commented 11 months ago

@devindg , thanks for that detailed example! How you construct the covariance matrix and then get the inverse in this context is something new to me. It's much easier to impose these kinds of lower and upper bounds in a Bayesian linear regression setting. So, appreciate the guidance here.