dask / dask-ml

Scalable Machine Learning with Dask
http://ml.dask.org
BSD 3-Clause "New" or "Revised" License

LinearRegression doesn't return lazy object #949

Open AlexeyPechnikov opened 1 year ago

AlexeyPechnikov commented 1 year ago

LinearRegression takes even longer than the sklearn version, and it doesn't return a lazy object:

%%time

import dask.array
from dask_ml.linear_model import LinearRegression

size = 1e6

X = dask.array.arange(2*size).reshape(-1,2)  # 1,000,000 rows x 2 features
y = dask.array.arange(size).reshape(-1,1)    # 1,000,000 targets as a column
reg = LinearRegression()
reg.fit(X, y)  # runs immediately; nothing lazy is returned

It looks like a Dask-incompatible function.

TomAugspurger commented 1 year ago

No comment on the performance, but all .fit methods in dask-ml are eager.

(Sent as an email reply to Alexey Pechnikov's report of Tue, Nov 1, 2022, 2:26 PM.)
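Tom's point about eager .fit methods can be illustrated without dask-ml. A least-squares fit is a reduction: the coefficients depend on every row, so calling the fit has to read all of the data. The sketch below is plain Python, not dask-ml's implementation; the chunk generator, the synthetic data y = 2x + 1 and the helper names are hypothetical, and it fits only a single-feature model by accumulating sufficient statistics chunk by chunk.

```python
from typing import Iterable, Iterator, List, Tuple

Chunk = Tuple[List[float], List[float]]  # (x values, y values) for one block of rows


def make_chunks(n_rows: int, chunk_size: int) -> Iterator[Chunk]:
    """Yield synthetic rows y = 2*x + 1 in fixed-size blocks (hypothetical data)."""
    for start in range(0, n_rows, chunk_size):
        xs = [float(i) for i in range(start, min(start + chunk_size, n_rows))]
        ys = [2.0 * x + 1.0 for x in xs]
        yield xs, ys


def fit_simple_ols(chunks: Iterable[Chunk]) -> Tuple[float, float, int]:
    """Eager one-pass OLS for y = a*x + b: every chunk is read before returning."""
    n = sx = sy = sxx = sxy = 0.0
    chunks_read = 0
    for xs, ys in chunks:
        chunks_read += 1
        n += len(xs)
        sx += sum(xs)
        sy += sum(ys)
        sxx += sum(x * x for x in xs)
        sxy += sum(x * y for x, y in zip(xs, ys))
    # Closed-form solution of the normal equations for one feature plus intercept.
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept, chunks_read


slope, intercept, chunks_read = fit_simple_ols(make_chunks(n_rows=1000, chunk_size=100))
print(slope, intercept, chunks_read)
```

All 10 chunks are consumed before any coefficient is available, which is what "eager" means here; wrapping the call in something lazy could only postpone this full pass, not avoid it.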

AlexeyPechnikov commented 1 year ago

@TomAugspurger Do you mean dask-ml has no advantages and is slower than sklearn? Obviously, we can't select and process just a subset of the data later when dask-ml methods are not lazy.
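The concern here is choosing a subset of the data before the expensive step. Even when the fit itself is eager, a lazy data source lets that selection happen first, so only the selected chunks are ever read. In the plain-Python sketch below, the chunk source, the synthetic data y = 3x and the no-intercept model are hypothetical choices made to keep it short, and functools.partial stands in for a deferred computation, roughly the role dask.delayed plays in Dask.

```python
import itertools
from functools import partial
from typing import Callable, Iterable, Iterator, List, Tuple

read_log: List[int] = []  # records which chunks were actually materialized


def lazy_chunks(n_chunks: int, chunk_size: int) -> Iterator[Tuple[List[float], List[float]]]:
    """Hypothetical chunk source: a chunk is generated only when a consumer asks for it."""
    for k in range(n_chunks):
        read_log.append(k)
        xs = [float(k * chunk_size + i) for i in range(chunk_size)]
        yield xs, [3.0 * x for x in xs]  # synthetic y = 3*x


def slope_through_origin(chunks: Iterable[Tuple[List[float], List[float]]]) -> float:
    """Least-squares slope for y = a*x (no intercept): a = sum(x*y) / sum(x*x)."""
    sxx = sxy = 0.0
    for xs, ys in chunks:
        sxx += sum(x * x for x in xs)
        sxy += sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


# Build the pipeline lazily: select the first 2 of 100 chunks, then defer the fit.
subset = itertools.islice(lazy_chunks(n_chunks=100, chunk_size=50), 2)
deferred_fit: Callable[[], float] = partial(slope_through_origin, subset)
assert read_log == []  # nothing has been read yet

slope = deferred_fit()  # the eager step: the reduction runs now, over the subset only
print(slope, read_log)
```

Only chunks 0 and 1 of 100 are generated. With Dask arrays, slicing such as X[:n] is also lazy, so selecting rows before calling .fit keeps the eager step limited to that subset.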