JuliaStats / GLM.jl

Generalized linear models in Julia

Function to return a distribution over response for a point input #305

Open oxinabox opened 5 years ago

oxinabox commented 5 years ago

I was talking to @kleinschmidt, who explained that for certain types of GLMs it is possible to calculate the distribution of the response for a given point input.

That would be really cool, even if it only works for some link functions.

Lyndon White [2:20 PM]

I expected GLMs to give me back distributions, but I guess they can’t do that? Or am I just not calling the right functions?

Dave F Kleinschmidt [2:20 PM]

hmmmmmmm no, I don't think that's implemented

it shouldn't be hard, at least for the standard ones

Lyndon White [2:20 PM]

In general that is something they do? It would be mad useful to me.

Dave F Kleinschmidt [2:20 PM]

uhhhhhh maybe

well, here's the issue: which distribution?

let's take a linear model: the residual errors are presumed to be normal, so you fit a variance parameter

great, then your predictions for an x value should be that x vector dotted with the coefficients, plus some normally distributed noise with the estimated residual variance, right?

Lyndon White [2:22 PM]

Right
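The "plug-in" prediction described above can be sketched in a few lines. This is a minimal illustration in plain Python rather than GLM.jl; the toy data and the helper name `plugin_predictive` are made up for the example, and the resulting distribution deliberately ignores any uncertainty in the fitted coefficients.

```python
from statistics import NormalDist

# Toy data, invented purely for illustration
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Ordinary least squares for the model y = b0 + b1 * x
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual standard deviation, using n - 2 degrees of freedom
rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
sigma_hat = (rss / (n - 2)) ** 0.5


def plugin_predictive(x_new):
    """Hypothetical helper: Normal centered on the fitted value.

    Treats b0, b1 and sigma_hat as if they were known exactly.
    """
    return NormalDist(mu=b0 + b1 * x_new, sigma=sigma_hat)


dist = plugin_predictive(3.0)
print(dist.mean, dist.stdev)
```

The spread of this distribution is the same at every input, which is one sign that it leaves out the coefficient uncertainty raised next.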

Dave F Kleinschmidt [2:22 PM]

well, what about your uncertainty in the coefficients? do you take that into account?

if you're in linear model land that's easy: it's just another normal distribution you convolve with the residual error

but if you're in, let's say, logistic land, then now you're talking about normally distributed uncertainty in log-odds space, that then gets converted through to probability with the inverse logit (logistic) function and then interpreted as a coin flip for the error model

mayyybe you can do that analytically but I don't want to try

so that's why it's complicated: you can easily get a distribution for the point estimate, but it might not be the right or most informative one
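For the logistic case, one way to sidestep the analytic question is simulation. The sketch below is plain Python, not GLM.jl, and rests on assumptions: the linear predictor at the input of interest is taken to have an approximately normal sampling distribution, and the values `eta_hat` and `se_eta` are hypothetical numbers standing in for a fitted value and its standard error on the log-odds scale. It draws log-odds values, pushes each through the inverse logit, and averages the resulting probabilities.

```python
import math
import random


def logistic(eta):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predictive_success_prob(eta_hat, se_eta, n_draws=100_000, seed=0):
    """Monte Carlo average of logistic(eta) with eta ~ Normal(eta_hat, se_eta).

    Hypothetical helper for illustration; not part of any library.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        eta = rng.gauss(eta_hat, se_eta)  # uncertainty in log-odds space
        total += logistic(eta)            # convert to a probability
    return total / n_draws


eta_hat = 1.5  # hypothetical fitted log-odds at the new input
se_eta = 0.8   # hypothetical standard error of that log-odds

p_plugin = logistic(eta_hat)                     # ignores coefficient uncertainty
p_mc = predictive_success_prob(eta_hat, se_eta)  # averages over it
print(p_plugin, p_mc)
```

Because a single binary response is fully described by its success probability, the averaged value defines a Bernoulli distribution for the "coin flip". Averaging pulls that probability toward 0.5 relative to the plug-in value, which is the sense in which the plug-in distribution may not be the most informative one.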

nalimilan commented 5 years ago

This sounds very speculative to me. Maybe ask on a general stats forum first, or look at possible implementations in other software?

palday commented 4 years ago

I think the relevant term in general statistics is "prediction intervals" rather than "confidence intervals" (which GLM already provides). Returning a proper distribution isn't exactly a thing in the frequentist world, but the prediction interval does provide roughly what you're looking for.
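For the normal linear model, the difference between the two intervals can be written down explicitly. The formulas below are the standard textbook result, not something stated in this thread; the symbols are assumptions introduced here: $X$ is the $n \times p$ design matrix, $x_0$ the new input, $\hat\beta$ the fitted coefficients, $\hat\sigma^2$ the residual variance estimate, and $t_{n-p,\,1-\alpha/2}$ a Student-t quantile.

$$
\text{confidence interval for the mean response:}\quad
x_0^\top \hat\beta \;\pm\; t_{n-p,\,1-\alpha/2}\;\hat\sigma\,\sqrt{x_0^\top (X^\top X)^{-1} x_0}
$$

$$
\text{prediction interval for a new response:}\quad
x_0^\top \hat\beta \;\pm\; t_{n-p,\,1-\alpha/2}\;\hat\sigma\,\sqrt{1 + x_0^\top (X^\top X)^{-1} x_0}
$$

The extra $1$ under the square root is the residual noise added on top of the coefficient uncertainty, which corresponds to the convolution of two normals mentioned in the chat above.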