TuringLang / TuringGLM.jl

Bayesian Generalized Linear models using `@formula` syntax.
https://turinglang.org/TuringGLM.jl/dev
MIT License

Clarification on `Logistic` (and possibly rename?) #44

Status: Closed (ParadaCarleton closed this 1 year ago)

ParadaCarleton commented 2 years ago
For likelihoods, TuringGLM.jl supports:

Gaussian() (the default if not specified): linear regression
Student(): robust linear regression
**Logistic(): logistic regression**
Pois(): Poisson count data regression
NegBin(): negative binomial robust count data regression

I've highlighted this line because it seems to imply two different things, and I'm not sure which one is meant. "Logistic regression" almost always means regression with a Binomial likelihood using the logit link (logistic inverse link). However, the Logistic distribution also exists and can be used for robust linear regression. (It has slightly heavier tails than a normal distribution, but unlike the t-distribution's tails, they decay exponentially, which makes it a good compromise between efficiency and robustness.) If these names are supposed to refer to likelihoods, `Logistic` would be inappropriate, and the name could cause misunderstandings.
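To make the tail comparison concrete, here is a small sketch using Distributions.jl. The helper `upper_tail` and the choice of 5 degrees of freedom for the t-distribution are assumptions made for illustration only; the point is that the Logistic distribution sits between the normal and the t in both excess kurtosis and tail mass.

```julia
using Distributions

# Hypothetical helper: probability of landing more than k standard
# deviations above the mean, so differently scaled distributions are comparable.
upper_tail(d, k) = ccdf(d, mean(d) + k * std(d))

candidates = [
    "Normal"          => Normal(),
    "Logistic"        => Logistic(),
    "Student-t (ν=5)" => TDist(5),   # ν = 5 is an arbitrary choice for this sketch
]

for (name, d) in candidates
    println(rpad(name, 16),
            " excess kurtosis = ", kurtosis(d),
            ", P(X > μ + 4σ) = ", upper_tail(d, 4))
end
```

Measuring the tail in standard deviations rather than raw units avoids having to rescale each distribution to unit variance by hand.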

Perhaps we should make a clear distinction between a likelihood and the link function associated with it? There's no reason you can't use a logistic link with a normal likelihood, for example.

storopoli commented 2 years ago

Yes, this is a hack. I tried using Bernoulli, but I could not because of name clashes with Distributions, DistributionsAD and Turing, all of which Turing.jl re-exports into the namespace.

If you can find a way around this, I would love to know.

ParadaCarleton commented 2 years ago

> Yes, this is a hack. I tried using Bernoulli, but I could not because of name clashes with Distributions, DistributionsAD and Turing, all of which Turing.jl re-exports into the namespace.
>
> If you can find a way around this, I would love to know.

We could give them all names like `BernoulliLike`, `TDistLike`, etc., so they stay recognizable without clashing with Distributions. That said, I would like a way to use an arbitrary likelihood.

Perhaps we could use the type itself, e.g. `turing_model(@formula(y~x), Bernoulli)`?

storopoli commented 2 years ago

The `*Like` naming is a viable way. I tried using the type itself and hit a nasty bug.

ParadaCarleton commented 2 years ago

> The `*Like` naming is a viable way. I tried using the type itself and hit a nasty bug.

Hmm, what happened? Maybe @devmotion will have some idea (since he's more familiar with Dists.jl)?

devmotion commented 2 years ago

> Yes, this is a hack. I tried using Bernoulli, but I could not because of name clashes with Distributions, DistributionsAD and Turing, all of which Turing.jl re-exports into the namespace.

Which name clashes exactly? These packages should play together nicely since they are designed to do so. Reexporting should also not introduce any name clashes.

ParadaCarleton commented 2 years ago

> > Yes, this is a hack. I tried using Bernoulli, but I could not because of name clashes with Distributions, DistributionsAD and Turing, all of which Turing.jl re-exports into the namespace.
>
> Which name clashes exactly? These packages should play together nicely since they are designed to do so. Reexporting should also not introduce any name clashes.

I assume he just means we can't write, say, `turing_model(@formula(y~x), Bernoulli())` with a new struct `Bernoulli`, since that name's already taken by Distributions.jl.

devmotion commented 2 years ago

Is a new struct needed? Could you just use Distributions.Bernoulli? Or the link functions in GLM?

ParadaCarleton commented 2 years ago

> Is a new struct needed? Could you just use Distributions.Bernoulli? Or the link functions in GLM?

I would expect you could use Distributions.Bernoulli by passing the type itself, but @storopoli said that caused bugs?

And using Bernoulli(p) directly doesn’t work, since the parameter is exactly what we want to estimate.

ParadaCarleton commented 2 years ago

You might want to take a look at the `fit` function in Distributions.jl, which works with distribution types directly: https://github.com/JuliaStats/Distributions.jl/blob/71f1b1e39ad2b66b4865b5e1fd537315c8a53ae8/src/genericfit.jl#L8-L15
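For reference, a minimal sketch of that pattern: `fit` takes the distribution type as its first argument and returns an instance whose parameter has been estimated from the data. The toy vector `y` is made up for illustration.

```julia
using Distributions

y = [1, 0, 1, 1, 0, 1, 1, 1]   # toy binary outcomes, invented for this example

# `fit` receives the *type* Bernoulli; the success probability is what gets estimated.
d = fit(Bernoulli, y)

println(d)
println(succprob(d))   # estimated success probability (the sample mean of y)
```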

storopoli commented 2 years ago

> With a new struct Bernoulli, since that name's already taken by Distributions.jl.

Yes, that was the issue.

devmotion commented 2 years ago

Yeah but why can't you use the type Distributions.Bernoulli instead of an instance of it? That's more natural as @ParadaCarleton also said above.
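A rough sketch of what passing the type could look like, using plain multiple dispatch on Distributions.jl types. The names `invlogit`, `default_invlink` and `observation` are hypothetical helpers written for this example; they are not part of TuringGLM.jl or Distributions.jl.

```julia
using Distributions

# Hypothetical helper: the logistic function, i.e. the inverse of the logit link.
invlogit(η) = 1 / (1 + exp(-η))

# Hypothetical helper: pick a default inverse link by dispatching on the
# likelihood *type*, so no new struct (and no name clash) is needed.
default_invlink(::Type{<:Normal})    = identity
default_invlink(::Type{<:Bernoulli}) = invlogit
default_invlink(::Type{<:Poisson})   = exp

# Build the observation distribution for a single linear predictor value η.
observation(D::Type{<:UnivariateDistribution}, η) = D(default_invlink(D)(η))

println(observation(Bernoulli, 0.0))
println(observation(Poisson, 0.0))
```

Because the caller passes `Bernoulli` rather than `Bernoulli(p)`, the parameter stays free to be computed from the linear predictor.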

ParadaCarleton commented 2 years ago

It should also generalize better: it would be super useful if you could pass an arbitrary likelihood from Distributions.jl.

ParadaCarleton commented 2 years ago

@storopoli does TuringGLM currently work by writing out the most common GLMs one at a time? In theory, you should be able to work with any likelihood, including ones specified by the user, by converting anything of the form y ~ x (with x = [x_1, x_2,...] a vector of features) into:

β ~ Prior()
y .~ Likelihood(InvLink(β ⋅ x))

Showing off how a general approach can work with an unusual likelihood, e.g. a Gumbel for predicting extreme values, would be very cool!
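The `Likelihood(InvLink(β ⋅ x))` idea can be sketched without Turing at all, as a generic log-likelihood over Distributions.jl types. This is only an illustration under stated assumptions: `glm_loglik` and `invlogit` are hypothetical names, the data are invented, and the sketch covers only likelihoods that can be constructed from a single parameter. It is not how TuringGLM.jl is implemented.

```julia
using Distributions

# Hypothetical helper: the logistic function (inverse of the logit link).
invlogit(η) = 1 / (1 + exp(-η))

# Hypothetical helper: log-likelihood of y given design matrix X and
# coefficients β, for any one-parameter likelihood type and inverse link.
function glm_loglik(Likelihood, invlink, β, X, y)
    η = X * β   # linear predictor, one entry per row of X
    return sum(logpdf(Likelihood(invlink(ηᵢ)), yᵢ) for (ηᵢ, yᵢ) in zip(η, y))
end

X = [1.0 0.5; 1.0 -1.2; 1.0 2.0; 1.0 0.1]   # toy design matrix with an intercept column
β = [0.3, -0.7]                             # toy coefficients

println(glm_loglik(Bernoulli, invlogit, β, X, [1, 0, 0, 1]))          # logistic regression
println(glm_loglik(Poisson, exp, β, X, [2, 0, 1, 3]))                 # Poisson regression
println(glm_loglik(Normal, invlogit, β, X, [0.4, 0.9, 0.2, 0.6]))     # normal likelihood, logistic link
```

The last call pairs a normal likelihood with a logistic inverse link, which shows that the likelihood and the link are independent choices.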

storopoli commented 2 years ago

The likelihood API would need a rewrite. I haven't touched anything InvLink related.

ParadaCarleton commented 2 years ago

> The likelihood API would need a rewrite. I haven't touched anything InvLink related.

Sorry, can you clarify what you mean by this?

storopoli commented 2 years ago

Check https://github.com/TuringLang/TuringGLM.jl/blob/main/src/model.jl. There is no multiple dispatch on any Distributions.jl type like Bernoulli, or on anything like InvLink.

So the likelihood API would need a rewrite. I am focusing now on the tutorial issues tagged `tutorials`, so any PR would be most welcome.

storopoli commented 1 year ago

Now we use `model=Bernoulli`.