carlosg-m closed this issue 10 months ago
Hi @carlosg-m! Thank you for the issue. Would it be possible to provide more information about your use case? For example, number of rows, number of columns, data types, etc. A minimal reproducible example would be ideal.
Hi @lbittarello,
I've created a Colab with the benchmark using a public dataset.
I believe the difference comes from the parameter precompute=True.
Scikit-learn uses a precomputed Gram matrix to speed up calculations.
Feel free to explore.
https://colab.research.google.com/drive/13seZR8ttSuNyLM39U8KkQd0IR8OUxmdG?usp=sharing
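For reference, the flag in question sits directly on scikit-learn's ElasticNet; here is a minimal sketch of switching it on, with synthetic data and illustrative hyperparameter values:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 10))
y = X @ rng.normal(size=10) + rng.normal(size=5_000)

# precompute=True tells scikit-learn to form the Gram matrix X.T @ X once
# up front and reuse it throughout coordinate descent.
model = ElasticNet(alpha=0.001, l1_ratio=0.5, precompute=True, fit_intercept=False)
model.fit(X, y)
print(model.coef_.shape)
```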
Hello everyone,
The performance disparity is due to the use of Gram updates to keep the gradient synchronized with the iterates. This is powerful in setups with very few features relative to the number of samples.
Actually, there is an alternative to scikit-learn, skglm, that is more tailored to sparse GLMs and a drop-in replacement for its estimators. It gives more flexibility in combining solvers, datafits, and penalties.
Here is the performance report on the shared benchmark, and the code snippet:
from skglm.solvers import GramCD
from skglm.datafits import Quadratic
from skglm.penalties import L1_plus_L2
from skglm.estimators import GeneralizedLinearEstimator
model = GeneralizedLinearEstimator(
    datafit=Quadratic(),
    penalty=L1_plus_L2(alpha=0.001, l1_ratio=0.5),
    solver=GramCD(max_iter=1000, fit_intercept=False),
)
model.fit(X, y)
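For intuition, the Gram-update trick can be sketched in plain NumPy for an unpenalized least-squares fit (a simplified illustration, not skglm's actual implementation): after precomputing G = X.T @ X and q = X.T @ y once, the gradient G @ w - q is kept in sync with an O(k) column update per coordinate step, instead of an O(n*k) pass over the data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 2_000, 5
X = rng.normal(size=(n, k))
y = X @ rng.normal(size=k) + 0.1 * rng.normal(size=n)

G = X.T @ X          # k x k Gram matrix, computed once
q = X.T @ y
w = np.zeros(k)
grad = -q            # gradient of 0.5 * ||y - X w||^2 at w = 0, i.e. G @ w - q

for _ in range(100):                      # cyclic coordinate descent
    for j in range(k):
        w_old = w[j]
        w[j] -= grad[j] / G[j, j]         # exact minimization along coordinate j
        grad += G[:, j] * (w[j] - w_old)  # O(k) Gram update keeps grad = G @ w - q

# w converges to the ordinary least-squares solution
w_ols = np.linalg.solve(G, q)
print(np.allclose(w, w_ols))
```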
Pinging @mathurinm
@Badr-MOUFAD, thank you for the extra information. I've tried fitting skglm to a couple of time series, but the benefit did not justify altering the model since we're close to deploying the project into production.
But I'm going to keep the library in mind for the next phases of this particular project. The initiative and motivation are fantastic, and we need more libraries like glum or skglm to close the GLM gap between Python and R.
If you want more information about what we are doing, there is a recap in this GitHub discussion: https://github.com/scikit-learn/scikit-learn/discussions/27886.
Thanks for the Colab @carlosg-m. As you guessed, the performance difference between glum and sklearn indeed comes from the precompute argument. In the example, I get:
~11s with ElasticNet and precompute=True.
~10s with skglm (thanks for the pointer @Badr-MOUFAD).
~30s with glum.
~50s with ElasticNet and precompute=False.
See a more stripped-down example below; the ranking seemed stable across the parameter configurations that I tried.
As some background, D in X'DX in GLM fitting generally depends on the inverse derivative of the linear predictor and must be recomputed at each iteration of IRLS. In the case of a (homoscedastic) normal distribution and identity link, D is just the identity, so A can be precomputed as X'X. Unlike ElasticNet, glum has not been optimized for this case, and if you want maximum performance, you are indeed better off using a library that does precompute A. We always welcome contributions to glum, in case you want to add the precompute option.
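To make the role of D concrete, here is a small NumPy sketch (my own illustration, not glum's internals): for a Poisson GLM with log link, the IRLS working weights change with the current iterate, so X'DX must be rebuilt at every step, while for the normal/identity case the weights are constant and X'X can be formed once.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1_000, 3
X = rng.normal(size=(n, k))
beta1 = rng.normal(size=k)  # two hypothetical IRLS iterates
beta2 = rng.normal(size=k)

def weighted_gram(X, d):
    """X' D X with D = diag(d)."""
    return X.T @ (d[:, None] * X)

# Normal family, identity link: working weights are 1 for any iterate,
# so X' D X = X' X can be precomputed once.
A1 = weighted_gram(X, np.ones(n))
A2 = weighted_gram(X, np.ones(n))

# Poisson family, log link: working weights exp(X @ beta) depend on
# the iterate, so X' D X changes at every IRLS step.
B1 = weighted_gram(X, np.exp(X @ beta1))
B2 = weighted_gram(X, np.exp(X @ beta2))

print(np.allclose(A1, A2), np.allclose(B1, B2))  # True False
```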
Minimal example:
# %%
import numpy as np
import pandas as pd
import skglm
from sklearn.linear_model import ElasticNet
from glum import GeneralizedLinearRegressor
# %%
n = 10_000
k = 10
rng = np.random.default_rng(123)
# %%
beta = rng.normal(size=k)
X = pd.DataFrame({f"x{i}": rng.normal(size=n) for i in range(k)})
y = X @ beta + rng.normal(size=n)
# %%
alpha = 0.01
l1_ratio = 0
# %%
model_sklearn = ElasticNet(
    alpha=alpha,
    l1_ratio=l1_ratio,
    precompute=True,
    max_iter=1000,
    copy_X=False,
    selection="cyclic",
    fit_intercept=False,
)
model_glum = GeneralizedLinearRegressor(
    alpha=alpha,
    l1_ratio=l1_ratio,
    max_iter=1000,
    copy_X=False,
    selection="cyclic",
    fit_intercept=False,
    family="normal",
    solver="irls-cd",
    link="identity",
)
model_sklearn_no_precompute = ElasticNet(
    alpha=alpha,
    l1_ratio=l1_ratio,
    precompute=False,
    max_iter=1000,
    copy_X=False,
    selection="cyclic",
    fit_intercept=False,
)
model_skglm = skglm.GeneralizedLinearEstimator(
    datafit=skglm.datafits.Quadratic(),
    penalty=skglm.penalties.L1_plus_L2(alpha=alpha, l1_ratio=l1_ratio),
    solver=skglm.solvers.GramCD(max_iter=1000, fit_intercept=False),
)
# %%
%timeit model_sklearn.fit(X=X, y=y)
# %%
%timeit model_glum.fit(X=X, y=y)
# %%
%timeit model_sklearn_no_precompute.fit(X=X, y=y)
# %%
%timeit model_skglm.fit(X=X, y=y)
# %%
Results plot
glum 2.6.0
Scikit-learn 1.3.2