cscherrer / Soss.jl

Probabilistic programming via source rewriting
https://cscherrer.github.io/Soss.jl/stable/
MIT License

Request for GLM examples/tutorials #149

Closed. DilumAluthge closed this issue 3 years ago.

DilumAluthge commented 4 years ago

Generalized linear models

It would be great to have fully worked examples and/or tutorials for Bayesian generalized linear models (GLMs).

I think these would be the most important to start with:

Once we have those, we could then go even further and add some more advanced stuff:

Linear models

For pedagogical purposes, it will probably be useful to also have examples of linear models:

Prior art

For logistic regression, look at:

For multinomial logistic regression, look at:

For linear regression, look at:

cc: @cscherrer cc: @caseykneale

caseykneale commented 4 years ago

I tend to prefer to start from least squares regression because it's how most people begin learning these fields. Once you get the gist of OLS then LR is easier to grok. But hey, I'm not strongly wedded to that; either way, I think having these kinds of tutorials would be great.

DilumAluthge commented 4 years ago

> I tend to prefer to start from least squares regression because it's how most people begin learning these fields. Once you get the gist of OLS then LR is easier to grok.

Sure, that seems fine to me! As long as we also have the logistic and multinomial logistic examples.

cscherrer commented 4 years ago

Thanks @DilumAluthge and @caseykneale. All of us in the Julia PPL community try to make things easy to use, and it's important for us to account for users who have experience with machine learning in Julia but might be relatively new to Bayesian modeling.

We tend to build these systems to be as powerful and flexible as we can. From an outside perspective, what we end up with can sometimes seem to have too many options to know where to start. I'm hoping SossMLJ will come to address this, but it's still in the very early stages.

I understand that frequentist statistics usually starts with unregularized models. You can handle these in a PPL, but unless you allow for "improper priors" (as, for example, in Measures.jl), you'd need something like this (for OLS):

```julia
# OLS-style model: no priors on α, β, σ; they are passed in as arguments.
m = @model α, β, σ, x begin
    yhat = α .+ β .* x
    y ~ For(length(x)) do j
        Normal(yhat[j], σ)
    end
end
```

Since any prior distribution would regularize the parameter estimates, we're forced to leave the prior out entirely and then call out to e.g. Optim.jl to find the MLE. From a PPL perspective, this is a very special case. Pedagogically, it might steer people in the wrong direction. So I think this should be included, just maybe not as a first example.
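
A minimal sketch of that last step in plain Julia (not Soss-specific; the toy data and the log-scale parametrization of σ are assumptions for illustration):

```julia
using Distributions, Optim

# Toy data standing in for real observations.
x = randn(100)
y = 2.0 .+ 3.0 .* x .+ 0.5 .* randn(100)

# Negative log-likelihood of the model above: y[j] ~ Normal(α + β*x[j], σ).
# σ is parametrized on the log scale so the optimizer stays unconstrained.
function negloglik(θ)
    α, β, logσ = θ
    σ = exp(logσ)
    return -sum(logpdf.(Normal.(α .+ β .* x, σ), y))
end

result = optimize(negloglik, zeros(3), BFGS())
α̂, β̂, logσ̂ = Optim.minimizer(result)   # the MLE; α̂ and β̂ match the OLS fit
```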

DilumAluthge commented 4 years ago

> Pedagogically, it might steer people in the wrong direction.

I agree.

Let us omit the unregularized models. So we would do ridge, lasso, and elastic net for logistic and multinomial logistic.

I've updated my original post to reflect this.
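
For reference, here is a hedged sketch of what a ridge-style Bayesian logistic regression might look like in Soss, following the same `@model`/`For` pattern as Chad's example above (the `Normal(0, 1)` priors play the role of the L2 penalty; `logistic` comes from StatsFuns; names and prior scales are illustrative assumptions rather than code from the package):

```julia
using Soss, Distributions, StatsFuns  # StatsFuns provides `logistic`

# Hedged sketch: Gaussian priors on the coefficients act like a ridge (L2)
# penalty; the likelihood is Bernoulli with a logit link.
m_logit = @model x begin
    α ~ Normal(0, 1)
    β ~ Normal(0, 1)
    y ~ For(length(x)) do j
        Bernoulli(logistic(α + β * x[j]))
    end
end
```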

DilumAluthge commented 4 years ago

If we do end up adding examples for linear models, we should make sure to show how to estimate the dispersion (scale) parameter σ².
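
One hedged way to show that, building on Chad's model above, is to give σ its own prior so that σ (and hence σ²) is estimated jointly with the coefficients. The specific priors here are illustrative assumptions, not a recommendation:

```julia
using Soss, Distributions

# Hedged sketch: same likelihood as the OLS-style model above, but with priors
# on everything, so σ (and hence σ²) is estimated along with α and β.
m_lm = @model x begin
    α ~ Normal(0, 10)
    β ~ Normal(0, 10)
    σ ~ Exponential(1)   # any positive-support prior would do here
    yhat = α .+ β .* x
    y ~ For(length(x)) do j
        Normal(yhat[j], σ)
    end
end
```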

millerjoey commented 4 years ago

I think it's great to add more examples, especially familiar ones, but I've only heard of "Bayesian ridge regression" as more of an interpretation than a specific method. Estimating the posterior mode from MCMC samples is going to be less efficient than optimization, although it's still interesting as a fun example. I'm not sure whether logistic ridge regression has a similar interpretation, though.

Sorry in advance for the philosophy :) but regularization per se doesn't fit into Bayesian inference very well, since you're not "fitting" a model. The ideal Bayesian point estimates would be the ones that minimize expected loss after doing ordinary Bayesian inference, rather than the ones that optimize a combined objective during model fitting. Those could also make for nice examples.

DilumAluthge commented 4 years ago

> Estimating the posterior mode from MCMC samples is going to be less efficient than optimization

It would be great to do both, i.e. use Optim.jl to find the MAP, and also use HMC to sample from the posterior distribution.

cscherrer commented 4 years ago

@millerjoey you missed some discussion on the Julia Slack :)

In general, if you take the negative log of a distribution's density you get a function that can be treated as either a loss or a regularization penalty. A Gaussian gives squared loss, a Laplace gives absolute loss. So if you use a Gaussian for both the prior and the noise and find the MAP estimate, this is equivalent to ridge regression.
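
A quick sketch of that equivalence for the linear case, with $\tau$ the prior scale on the coefficients and $\sigma$ the noise scale (a standard result, not Soss-specific):

$$
\hat{\beta}_{\mathrm{MAP}}
= \arg\max_{\beta}\,\bigl[\log p(y \mid X, \beta) + \log p(\beta)\bigr]
= \arg\min_{\beta}\,\Bigl[\tfrac{1}{2\sigma^2}\lVert y - X\beta\rVert^2 + \tfrac{1}{2\tau^2}\lVert\beta\rVert^2\Bigr],
$$

i.e. ridge regression with penalty $\lambda = \sigma^2/\tau^2$; swapping the Gaussian prior for a Laplace prior turns the second term into an L1 (lasso) penalty in the same way.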

There are some complications for this, because we also have uncertainty in the noise scale. We could do something like marginal likelihood for this, but we should probably start with something very simple.

millerjoey commented 4 years ago

Oh yeah cool, I didn't recognize the generality. And haha I did check Zulip.

> It would be great to do both, i.e. use Optim.jl to find the MAP, and also use HMC to sample from the posterior distribution.

Gotcha. I guess this is more about Optim + Soss, rather than the MCMC inference I was expecting.

cscherrer commented 4 years ago

Haha, yeah this one wasn't Soss-specific: https://julialang.slack.com/archives/CLR9W2MJL/p1590933299497200

DilumAluthge commented 3 years ago

We have implemented linear regression and multinomial logistic regression models in SossMLJ.jl, so I'm going to close this issue. More specific issues can be opened for specific models, specific prior distributions, etc.