[Question] Why PIRLS? - Githubissues

@lucasfariaslf It turns out that IRLS is equivalent to Newton-Raphson (when the GLM/GAM uses the canonical link function). That is, we can naturally interpret IRLS as Newton-Raphson applied to maximum likelihood estimation of GLMs.
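Here is a minimal sketch of that equivalence, assuming logistic regression (Bernoulli response with the canonical logit link). The function names `newton_step` and `irls_step` and the simulated data are made up for illustration; this is not pyGAM's code. Both functions take one update from the same starting point and should land on the same coefficients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_step(X, y, beta):
    """One Newton-Raphson step on the Bernoulli log-likelihood."""
    mu = sigmoid(X @ beta)
    grad = X.T @ (y - mu)                  # score (gradient of the log-likelihood)
    w = mu * (1.0 - mu)                    # variance function of the Bernoulli
    neg_hess = X.T @ (X * w[:, None])      # negative Hessian, X^T W X
    return beta + np.linalg.solve(neg_hess, grad)

def irls_step(X, y, beta):
    """One IRLS step: weighted least squares on the working response."""
    eta = X @ beta
    mu = sigmoid(eta)
    w = mu * (1.0 - mu)                    # IRLS weights
    z = eta + (y - mu) / w                 # working (pseudo) response
    XtW = X.T * w                          # scales column i of X^T by w[i]
    return np.linalg.solve(XtW @ X, XtW @ z)

# Simulated data, purely illustrative.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_beta = np.array([-0.5, 1.0, 2.0])
y = (rng.random(200) < sigmoid(X @ true_beta)).astype(float)

beta = np.zeros(3)
for _ in range(5):
    b_newton = newton_step(X, y, beta)
    b_irls = irls_step(X, y, beta)
    assert np.allclose(b_newton, b_irls)   # the two updates coincide
    beta = b_irls
```

The algebra behind the match: substituting the working response into the weighted least-squares solution gives `beta + (X^T W X)^{-1} X^T (y - mu)`, which is exactly the Newton update.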

The upside of IRLS (and any modification of it, such as penalized IRLS) is that it lets us use theory from linear regression to perform model diagnostics on the final GAM (e.g., estimated degrees of freedom, confidence intervals, p-values).
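As a rough sketch of what that buys us: once PIRLS has converged, the last iteration is just a penalized weighted least-squares problem, so the usual linear-model quantities can be read off it. The function `pwls_diagnostics`, its arguments, and the toy ridge penalty below are assumptions for illustration, not pyGAM's API. The effective degrees of freedom are taken as the trace of `(X^T W X + lam * S)^{-1} X^T W X`, and the standard errors come from the Bayesian covariance `scale * (X^T W X + lam * S)^{-1}`.

```python
import numpy as np

def pwls_diagnostics(X, w, S, lam, scale=1.0):
    """Diagnostics from the final penalized weighted least-squares fit.

    X: (n, p) model matrix, w: (n,) converged IRLS weights,
    S: (p, p) penalty matrix, lam: smoothing parameter,
    scale: dispersion parameter (1.0 for binomial/Poisson).
    All names here are hypothetical.
    """
    XtWX = X.T @ (X * w[:, None])
    A_inv = np.linalg.inv(XtWX + lam * S)
    edof = np.trace(A_inv @ XtWX)          # effective degrees of freedom
    cov = scale * A_inv                    # Bayesian covariance of the coefficients
    se = np.sqrt(np.diag(cov))             # standard errors for confidence intervals
    return edof, se

# Toy inputs: intercept plus 3 features, ridge penalty on the features only.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 3))])
w = np.full(50, 0.25)
S = np.diag([0.0, 1.0, 1.0, 1.0])

edof_unpen, _ = pwls_diagnostics(X, w, S, lam=0.0)    # no penalty: edof equals p
edof_pen, se = pwls_diagnostics(X, w, S, lam=10.0)    # penalty shrinks edof below p
```

With no penalty the effective degrees of freedom equal the number of coefficients; increasing `lam` shrinks them toward the dimension of the unpenalized part (here, the intercept).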

The 2 main downsides with IRLS that I see are:

1. It is computationally quite expensive, since we need to form and invert the Hessian.
2. We can only use distributions from the exponential family (for example, we can't do quantile regression, since we cannot use the Laplace distribution).

You're right: it would be great to allow alternate solvers for this library, like SGD with adaptive per-parameter step sizes (Adam), or L-BFGS, since these would allow us to scale to larger datasets and use arbitrary error distributions, possibly at the expense of simple model diagnostics.
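To make the idea concrete, here is a small sketch of what a gradient-based solver could look like for a loss that IRLS cannot handle: linear quantile regression with the pinball loss, fitted with a hand-written full-batch Adam loop. Everything here (`pinball_loss`, `pinball_subgrad`, `fit_quantile_adam`, the hyperparameters, the simulated data) is an assumption for illustration and not something pyGAM provides.

```python
import numpy as np

def pinball_loss(beta, X, y, tau):
    """Mean pinball (quantile) loss for a linear predictor X @ beta."""
    r = y - X @ beta
    return np.mean(np.maximum(tau * r, (tau - 1.0) * r))

def pinball_subgrad(beta, X, y, tau):
    """A subgradient of the mean pinball loss with respect to beta."""
    r = y - X @ beta
    # Slope of the loss in r: tau for positive residuals, tau - 1 otherwise.
    psi = np.where(r > 0, tau, tau - 1.0)
    return -(X.T @ psi) / len(y)

def fit_quantile_adam(X, y, tau, lr=0.02, steps=3000,
                      b1=0.9, b2=0.999, eps=1e-8):
    """Full-batch Adam on the pinball loss (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    m = np.zeros_like(beta)                # running mean of gradients
    v = np.zeros_like(beta)                # running mean of squared gradients
    for t in range(1, steps + 1):
        g = pinball_subgrad(beta, X, y, tau)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g**2
        m_hat = m / (1 - b1**t)            # bias correction
        v_hat = v / (1 - b2**t)
        beta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return beta

# Simulated data: y = 1 + 2x + Gaussian noise.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=1000)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(size=1000)

tau = 0.9
beta_hat = fit_quantile_adam(X, y, tau)
coverage = np.mean(y < X @ beta_hat)       # fraction of points below the fitted line
```

If the fit is reasonable, `coverage` should sit close to `tau`. Note what is missing compared with PIRLS: there is no final weighted least-squares problem to read degrees of freedom or standard errors from, which is the trade-off mentioned above.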

What do you think?

dswah / pyGAM

[Question] Why PIRLS? #227