I love the xgboost-distribution package and what it enables. However, when dealing with datasets or trees that do not fit into memory, one needs to scale the task with a distributed framework like Dask.
xgboost already supports Dask natively through its sklearn API, and since xgboost-distribution relies on the original xgboost, I thought it would be quite easy to swap the underlying booster for a distributed one, given that the API is almost identical:
from distributed import LocalCluster, Client
import xgboost as xgb


def main(client: Client) -> None:
    X, y = load_data()  # placeholder; should return dask collections (arrays/dataframes)
    regr = xgb.dask.DaskXGBRegressor(n_estimators=100, tree_method="gpu_hist")
    regr.client = client  # assign the client
    regr.fit(X, y, eval_set=[(X, y)])
    preds = regr.predict(X)


if __name__ == "__main__":
    with LocalCluster() as cluster:
        with Client(cluster) as client:
            main(client)
The same issue pops up with federated learning, where one would want to swap in a federated booster instead.
So my question is: would it be possible to swap the underlying xgboost booster in xgboost-distribution for the aforementioned xgb.dask.DaskXGBRegressor?
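To make the ask a bit more concrete, here is a minimal sketch (not xgboost-distribution code, just an illustration) of the kind of hook I imagine such a swap would need on the Dask side. It uses the functional xgboost.dask.train with a plain squared-error custom objective as a stand-in for xgboost-distribution's natural-gradient objectives, and it assumes xgboost.dask.train forwards the obj callable to the workers the same way xgboost.train does; the load of toy dask arrays is a placeholder.

import numpy as np
import dask.array as da
import xgboost as xgb
from distributed import LocalCluster, Client


def squared_error_obj(predt: np.ndarray, dtrain: xgb.DMatrix):
    # Stand-in for a distributional (natural-gradient) objective:
    # return per-sample gradient and hessian of the loss.
    y = dtrain.get_label()
    grad = predt - y
    hess = np.ones_like(predt)
    return grad, hess


def main(client: Client) -> None:
    # Toy dask-backed data; a real workload would load out-of-core data here.
    X = da.random.random((10_000, 20), chunks=(1_000, 20))
    y = da.random.random(10_000, chunks=1_000)

    dtrain = xgb.dask.DaskDMatrix(client, X, y)
    output = xgb.dask.train(
        client,
        {"tree_method": "hist"},
        dtrain,
        num_boost_round=50,
        obj=squared_error_obj,  # custom objective, evaluated on each worker's shard
    )
    preds = xgb.dask.predict(client, output, dtrain)
    print(preds.compute()[:5])


if __name__ == "__main__":
    with LocalCluster(n_workers=2) as cluster:
        with Client(cluster) as client:
            main(client)

If that assumption holds, the swap would presumably come down to wiring xgboost-distribution's objective and metric functions into this distributed path instead of the single-node one.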