Ibotta / sk-dist

Distributed scikit-learn meta-estimators in PySpark
Apache License 2.0

cannot get answer with LGBMClassifier #25

Closed: jiahengqi closed this issue 4 years ago

jiahengqi commented 4 years ago

I switched from XGBoost to LightGBM, but the distributed search never returns a result.

With scikit-learn's GridSearchCV the search takes about 1 second:

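from lightgbm import LGBMClassifier
from sklearn.model_selection import GridSearchCV
from tqdm import trange

# X, y: training features and labels, loaded elsewhere
grid = dict(num_leaves=[8, 15, 31],
            n_estimators=[100, 200, 300])
for _ in trange(1):  # single pass; the tqdm bar is used here for timing
    model_lgb = GridSearchCV(
        LGBMClassifier(),
        grid, n_jobs=4, cv=3
    )
    model_lgb.fit(X, y)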

But with DistGridSearchCV there is still no result after 10 minutes:

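from lightgbm import LGBMClassifier
from skdist.distribute.search import DistGridSearchCV
from tqdm import trange

# sc: an existing SparkContext; X, y: training features and labels
grid = dict(num_leaves=[8, 15, 31],
            n_estimators=[100, 200, 300],
            n_jobs=[1])  # grid values are expected to be lists, so 1 becomes [1]
for _ in trange(1):
    model_lgb = DistGridSearchCV(
        LGBMClassifier(),
        grid, sc, cv=3, n_jobs=1
    )
    model_lgb.fit(X, y)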
denver1117 commented 4 years ago

Do you have LightGBM installed on every node of the cluster, including the required bindings (https://pypi.org/project/glibc/)? All of this needs to be installed through a node bootstrap.

We've never tested LightGBM with sk-dist. It could work in theory but sk-dist doesn't formally support it.
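
One rough way to check the installation question above is to try the import on each worker and report which host answered. The sketch below is only an illustration: `probe_import` is a hypothetical helper, not part of sk-dist or LightGBM, and the commented Spark lines assume `sc` is a live SparkContext. Spark does not guarantee that the tasks land on every executor, so a clean result is a hint rather than proof.

```python
import importlib
import socket


def probe_import(module_name):
    """Try to import module_name and return (hostname, ok, detail).

    hypothetical helper for illustration; detail is the module version
    on success, or the exception text on failure.
    """
    host = socket.gethostname()
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # e.g. ImportError, or OSError if a shared library fails to load
        return host, False, f"{type(exc).__name__}: {exc}"
    return host, True, getattr(module, "__version__", "unknown")


# Assumed usage on a cluster (not run here), with `sc` an existing SparkContext:
#   n_tasks = 32  # arbitrary; more tasks make it likelier that every executor is hit
#   results = (sc.parallelize(range(n_tasks), n_tasks)
#                .map(lambda _: probe_import("lightgbm"))
#                .distinct()
#                .collect())
#   for host, ok, detail in sorted(results):
#       print(host, "OK" if ok else "FAILED", detail)
```

If any host reports a failure, the grid search tasks sent to that host cannot load the estimator, which would point to the node bootstrap as the thing to fix first.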

denver1117 commented 4 years ago

We've added a note about LightGBM to the documentation: https://github.com/Ibotta/sk-dist#gradient-boosting