Quantco / glum

High performance Python GLMs with all the features!
https://glum.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

ElasticNet in Glum 2.6.0 is twice as slow as Scikit-learn 1.3.2 #736

Closed · carlosg-m closed 10 months ago

carlosg-m commented 10 months ago

[Results plot]

[Glum 2.6.0 benchmark screenshots]

[Scikit-learn 1.3.2 benchmark screenshots]

lbittarello commented 10 months ago

Hi @carlosg-m! Thank you for the issue. Would it be possible to provide more information about your use case? For example, number of rows, number of columns, data types, etc. A minimal reproducible example would be ideal.

carlosg-m commented 10 months ago

Hi @lbittarello,

I've created a Colab notebook with the benchmark using a public dataset. I believe the difference comes from the parameter precompute=True: scikit-learn uses a precomputed Gram matrix to speed up the calculations. Feel free to explore.

https://colab.research.google.com/drive/13seZR8ttSuNyLM39U8KkQd0IR8OUxmdG?usp=sharing
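
To be explicit about the parameter I mean, here is a minimal sketch (the alpha and l1_ratio values are placeholders, not the ones from the benchmark; precompute is scikit-learn's actual ElasticNet parameter):

from sklearn.linear_model import ElasticNet

# precompute=True makes scikit-learn form the Gram matrix X'X once and
# reuse it inside coordinate descent, instead of scanning X at every update.
model = ElasticNet(alpha=0.01, l1_ratio=0.5, precompute=True)
# model.fit(X, y)  # X, y as in the Colab benchmark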

Badr-MOUFAD commented 10 months ago

Hello everyone,

The performance disparity is due to the use of Gram updates to keep the gradient synchronized with the iterates, as the sketch below illustrates. This is powerful in setups with very few features compared to the number of samples.
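
To illustrate the idea, here is a didactic NumPy sketch of Gram-based coordinate descent for the lasso (my own notation, not skglm's internals; it solves (1 / (2 n)) * ||y - X w||^2 + alpha * ||w||_1):

import numpy as np

def lasso_gram_cd(X, y, alpha, n_iter=100):
    n, k = X.shape
    G = X.T @ X      # Gram matrix, formed once: O(n * k^2)
    q = X.T @ y      # formed once: O(n * k)
    lam = n * alpha  # rescale the penalty to the unnormalized objective
    w = np.zeros(k)
    for _ in range(n_iter):
        for j in range(k):
            # Partial correlation of feature j with the residual, computed
            # from G and q alone: O(k) per update, no pass over n samples.
            c = q[j] - G[j] @ w + G[j, j] * w[j]
            # Soft-thresholding step of coordinate descent.
            w[j] = np.sign(c) * max(abs(c) - lam, 0.0) / G[j, j]
    return w

Once G and q are formed, a full pass over the coordinates costs O(k^2) regardless of n, which is why the approach shines when there are far fewer features than samples.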

Actually, there is an alternative to scikit-learn, skglm, that is more tailored to sparse GLMs and is a drop-in replacement for its estimators. It gives more flexibility for combining solvers, datafits, and penalties.

Here is the performance report on the shared benchmark.

And here is the code snippet:

from skglm.solvers import GramCD
from skglm.datafits import Quadratic
from skglm.penalties import L1_plus_L2
from skglm.estimators import GeneralizedLinearEstimator

# Quadratic datafit + L1_plus_L2 penalty = elastic net objective;
# GramCD is a coordinate-descent solver that operates on the Gram matrix.
model = GeneralizedLinearEstimator(
    datafit=Quadratic(),
    penalty=L1_plus_L2(alpha=0.001, l1_ratio=0.5),
    solver=GramCD(max_iter=1000, fit_intercept=False),
)

model.fit(X, y)

Pinging @mathurinm

carlosg-m commented 10 months ago

@Badr-MOUFAD, thank you for the extra information. I've tried fitting skglm to a couple of time series, but the benefit did not justify altering the model, since we're close to deploying the project to production.

But I'm going to keep the library in mind for the next phases of this particular project. The initiative and motivation are fantastic, and we need more libraries like glum or skglm to close the GLM gap between Python and R.

If you want more information about what we are doing, there is a recap in this GitHub discussion: https://github.com/scikit-learn/scikit-learn/discussions/27886.

MatthiasSchmidtblaicherQC commented 10 months ago

Thanks for the Colab @carlosg-m. As you guessed, the performance difference between glum and sklearn does indeed come from the precompute argument. In the example, I get:

- ~11s with ElasticNet and precompute=True
- ~10s with skglm (thanks for the pointer @Badr-MOUFAD)
- ~30s with glum
- ~50s with ElasticNet and precompute=False

See the more stripped-down example below; the ranking seemed stable across the parameter configurations that I tried.

As some background: in GLM fitting, the weight matrix D in X'DX generally depends on the derivative of the inverse link evaluated at the current linear predictor, so X'DX must be recomputed at each iteration of IRLS. In the case of a (homoscedastic) normal distribution and identity link, D is just the identity, so X'DX reduces to X'X and can be precomputed once. Unlike ElasticNet, glum has not been optimized for this case and, if you want maximum performance, you are indeed better off using a library that does precompute X'X. We always welcome contributions to glum, in case you want to add a precompute option.
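
To make this concrete, here is a small sketch of the IRLS working weights in the two cases (my own notation, not glum's internals):

import numpy as np

def irls_weights_poisson_log(X, beta):
    # Poisson with log link: mu = exp(X @ beta) and the working weights
    # equal mu, so D = diag(mu) changes with beta and X'DX has to be
    # rebuilt at every IRLS iteration.
    return np.exp(X @ beta)

def irls_weights_normal_identity(X, beta):
    # Normal with identity link: the working weights are constant one,
    # so D is the identity and X'DX = X'X can be precomputed once.
    return np.ones(X.shape[0])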

Minimal example:

# %%
import numpy as np
import pandas as pd
import skglm

from sklearn.linear_model import ElasticNet
from glum import GeneralizedLinearRegressor

# %%
n = 10_000
k = 10
rng = np.random.default_rng(123)
# %%
beta = rng.normal(size=k)
X = pd.DataFrame({f"x{i}": rng.normal(size=n) for i in range(k)})
y = X @ beta + rng.normal(size=n)
# %%
alpha = 0.01
l1_ratio = 0
# %%

model_sklearn = ElasticNet(
    alpha=alpha,
    l1_ratio=l1_ratio,
    precompute=True,
    max_iter=1000,
    copy_X=False,
    selection="cyclic",
    fit_intercept=False,
)

model_glum = GeneralizedLinearRegressor(
    alpha=alpha,
    l1_ratio=l1_ratio,
    max_iter=1000,
    copy_X=False,
    selection="cyclic",
    fit_intercept=False,
    family="normal",
    solver="irls-cd",
    link="identity",
)

model_sklearn_no_precompute = ElasticNet(
    alpha=alpha,
    l1_ratio=l1_ratio,
    precompute=False,
    max_iter=1000,
    copy_X=False,
    selection="cyclic",
    fit_intercept=False,
)

# Mirrors the skglm snippet above; note that its alpha=0.001 and
# l1_ratio=0.5 differ from the shared alpha and l1_ratio used for the
# other models.
model_skglm = skglm.GeneralizedLinearEstimator(
    datafit=skglm.datafits.Quadratic(),
    penalty=skglm.penalties.L1_plus_L2(alpha=0.001, l1_ratio=0.5),
    solver=skglm.solvers.GramCD(max_iter=1000, fit_intercept=False),
)
# %%
%timeit model_sklearn.fit(X=X, y=y)
# %%
%timeit model_glum.fit(X=X, y=y)
# %%
%timeit model_sklearn_no_precompute.fit(X=X, y=y)
# %%
%timeit model_skglm.fit(X=X, y=y)
# %%