Quantco / glum

High performance Python GLMs with all the features!
https://glum.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Add a comparison with GAM tools to the docs? #471

Open tbenthompson opened 2 years ago

tbenthompson commented 2 years ago

I've gotten a few different questions about how glum compares to using something like pygam or whether we have plans to support GAMs:

Strictly speaking, GLMs are a subset of GAMs so this question seems very appropriate. Looking over pygam, for example, a few things seem missing/different:

It would be interesting to run a quick benchmark, or even add pygam to the benchmark suite, to get a performance comparison of glum vs pygam. My guess is that glum is substantially faster for GLMs because it's tailored specifically to that problem and also handles sparse design matrices well.

A final question is how much of the feature set (e.g. elastic net regularization along a path) could be ported to the GAM setting and whether some of the work we did here could be extended to provide a basis for a GAM library.

lbittarello commented 2 years ago

> A final question is [...] whether some of the work we did here could be extended to provide a basis for a GAM library

There are different ways to fit GAMs. As far as I understand, pygam is basically using splines to approximate the unknown relationships between outcomes and continuous regressors. For any given sample size, this approach is effectively identical to a GLM with feature engineering (i.e. glum + sklearn). Indeed, it only qualifies as a GAM if the complexity of the splines increases in some automated fashion with the sample size. The difference is conceptual (parametric vs semiparametric) rather than practical. One could also fit GAMs with local regressions or something fancier, in which case the equivalence with GLMs breaks down.
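To make the "GLM with feature engineering" point concrete, here is a minimal sketch using only NumPy. It assumes a truncated-power cubic basis as a stand-in for whatever spline basis pygam or sklearn would build, and it uses ordinary least squares, which is the Gaussian, identity-link special case of a GLM. The helper `cubic_spline_basis` and the simulated data are hypothetical and not part of glum or pygam.

```python
import numpy as np


def cubic_spline_basis(x, knots):
    """Truncated-power cubic basis: 1, x, x^2, x^3, then (x - k)_+^3 per knot."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    polynomial = np.vander(x, N=4, increasing=True)  # columns 1, x, x^2, x^3
    truncated = np.clip(x[:, None] - knots[None, :], 0.0, None) ** 3
    return np.hstack([polynomial, truncated])


# Simulated data (illustrative only): a smooth signal plus noise.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, size=500))
signal = np.sin(2 * np.pi * x)
y = signal + rng.normal(scale=0.1, size=x.size)

# Feature engineering step: expand x into a fixed set of spline columns.
knots = np.linspace(0.1, 0.9, 9)
X = cubic_spline_basis(x, knots)

# Model step: a plain linear fit on the expanded design matrix.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
rmse = float(np.sqrt(np.mean((fitted - signal) ** 2)))
```

With the number of knots held fixed, this is just a parametric model on a wider design matrix; the semiparametric view only enters once the number of knots is allowed to grow with the sample size.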

mattmills49 commented 8 months ago

If this group is interested in including smoothing spline functionality in glum, I'd be happy to help out. I put together a guide on how to fit penalized splines using your own custom penalty matrix here: http://statmills.com/2023-11-20-Penalized_Splines_Using_glum/

While you can theoretically do GAMs in any way with your own penalty matrix, I think incorporating separate penalties per smooth term and interaction splines into glum would make it way easier for users to actually use this functionality. There are obviously more considerations than just mine, but I thought I'd offer to help if there is an appetite for more here.
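As a rough illustration of what "separate penalties per smooth term" could look like when done by hand, the sketch below builds a second-order difference penalty for each term (the usual P-spline choice for B-spline coefficients) and stacks them into one block-diagonal matrix. The helper names, term sizes, and smoothing strengths are hypothetical. As far as I know, a matrix like this can be passed to glum through the `P2` argument of `GeneralizedLinearRegressor`, which is the approach the linked guide describes, but check the glum documentation for the exact scaling.

```python
import numpy as np


def difference_penalty(n_coefs, order=2):
    """Return D'D, where D takes order-th differences of adjacent coefficients."""
    D = np.diff(np.eye(n_coefs), n=order, axis=0)  # shape (n_coefs - order, n_coefs)
    return D.T @ D


def block_penalty(blocks):
    """Place per-term penalty matrices on the diagonal of one large matrix."""
    size = sum(block.shape[0] for block in blocks)
    P = np.zeros((size, size))
    start = 0
    for block in blocks:
        stop = start + block.shape[0]
        P[start:stop, start:stop] = block
        start = stop
    return P


# Two smooth terms with 6 and 4 basis coefficients and their own strengths
# (the values 0.5 and 10.0 are arbitrary and for illustration only).
P = block_penalty([0.5 * difference_penalty(6), 10.0 * difference_penalty(4)])

# A second-order difference penalty leaves linear coefficient trends unpenalized.
ramp = np.arange(6, dtype=float)
ramp_penalty = float(ramp @ difference_penalty(6) @ ramp)
```

Doing this manually requires knowing which columns of the design matrix belong to which smooth term, which is the bookkeeping that built-in support would take off the user's hands.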

MatthiasSchmidtblaicherQC commented 5 months ago

Thanks for the cool tutorial and for the offer to help out here. With the upcoming release of glum v3, there will be two main user interfaces:

  1. The current one in which the user passes a design matrix that presumably comes from preprocessing some dataframe, and
  2. a formula interface building on formulaic, which does preprocessing such as creating B-splines "under the hood".

With 1., I find the approach in the tutorial quite natural: given that the user already built a model matrix, she can also specify a custom penalty matrix. Here, I see scope for convenience helpers for creating custom penalty matrices though. With 2., it seems more natural to create penalty matrices inside the model, so smoothing or cyclic constraints that work within the formula interface would be welcome contributions!
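One possible shape for such a convenience helper, sketched here for the cyclic case, is a difference penalty whose differences wrap around from the last coefficient to the first. This is only a sketch under assumptions: the function name is hypothetical, it is not part of glum, and it only makes sense when the spline basis itself is periodic.

```python
import numpy as np


def cyclic_difference_penalty(n_coefs):
    """Second-order difference penalty with wrap-around (indices modulo n_coefs)."""
    identity = np.eye(n_coefs)
    # Row i of D encodes w[i] - 2 * w[i + 1] + w[i + 2], with indices wrapping.
    D = identity - 2.0 * np.roll(identity, 1, axis=1) + np.roll(identity, 2, axis=1)
    return D.T @ D


P_cyclic = cyclic_difference_penalty(8)

# A constant coefficient vector is unpenalized.
constant_penalty = float(np.ones(8) @ P_cyclic @ np.ones(8))

# A linear ramp is penalized because of the jump where the last coefficient
# meets the first one.
ramp = np.arange(8, dtype=float)
ramp_penalty = float(ramp @ P_cyclic @ ramp)
```

Unlike the ordinary difference penalty, this one charges for a mismatch between the two ends of the coefficient vector, which is what pushes the fitted curve to join up smoothly across the period.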