Open xiaozhongtian opened 5 years ago
Right, Dask-ML doesn't have any distributed tree-based estimators at the moment.
https://github.com/dask/dask-ml/issues/299 may be interesting. Scikit-learn has since expanded Olivier's prototype into https://scikit-learn.org/dev/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html#sklearn.ensemble.GradientBoostingClassifier.
OK. For now, maybe I'll look for something in dask.xgboost and dask.lightgbm instead.
Collecting links from https://github.com/dask/dask-ml/issues/299
http://papers.nips.cc/paper/6380-a-communication-efficient-parallel-algorithm-for-decision-tree describes a basic algorithm for distributed gradient boosting, and then a more efficient, but much more complicated algorithm.
cc @nicolashug. It seems like we won't be able to reuse much, if any, of the scikit-learn implementation if we want a distributed implementation.
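For anyone skimming the thread, here is a rough sketch of the kind of data-parallel split search that distributed gradient boosting usually builds on: each worker bins its own partition into a gradient histogram, the histograms are summed across workers, and the split is chosen from the merged histogram. This is only an illustration under my own assumptions (a single pre-binned feature, squared-error gain, plain Python lists standing in for worker partitions). It is not the paper's algorithm verbatim and not anything that exists in dask-ml; `local_histogram`, `merge_histograms`, and `best_split` are hypothetical names.

```python
from typing import List, Optional, Tuple

# One histogram per feature: for each bin, (sum of gradients, sample count).
Histogram = List[Tuple[float, int]]


def local_histogram(bin_ids: List[int], gradients: List[float], n_bins: int) -> Histogram:
    """Build the histogram for one feature from one worker's partition only."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for b, g in zip(bin_ids, gradients):
        sums[b] += g
        counts[b] += 1
    return list(zip(sums, counts))


def merge_histograms(histograms: List[Histogram]) -> Histogram:
    """Sum histograms bin by bin; this is the only data that crosses workers."""
    return [
        (sum(s for s, _ in bins), sum(c for _, c in bins))
        for bins in zip(*histograms)
    ]


def best_split(hist: Histogram) -> Tuple[Optional[int], float]:
    """Return (bin index, gain); samples with bin <= index would go left.

    Gain is the squared-error reduction: GL^2/nL + GR^2/nR - G^2/n.
    """
    total_sum = sum(s for s, _ in hist)
    total_cnt = sum(c for _, c in hist)
    best_bin, best_gain = None, 0.0
    left_sum, left_cnt = 0.0, 0
    for i, (s, c) in enumerate(hist[:-1]):
        left_sum += s
        left_cnt += c
        right_sum = total_sum - left_sum
        right_cnt = total_cnt - left_cnt
        if left_cnt == 0 or right_cnt == 0:
            continue  # a split must leave samples on both sides
        gain = (left_sum ** 2 / left_cnt
                + right_sum ** 2 / right_cnt
                - total_sum ** 2 / total_cnt)
        if gain > best_gain:
            best_bin, best_gain = i, gain
    return best_bin, best_gain


# Two made-up partitions of the same feature, already binned into 4 bins.
part_a = local_histogram([0, 0, 1, 3], [-1.0, -1.0, -1.0, 1.0], n_bins=4)
part_b = local_histogram([1, 2, 2, 3], [-1.0, 1.0, 1.0, 1.0], n_bins=4)
merged = merge_histograms([part_a, part_b])
split = best_split(merged)
```

The point of the design is that workers exchange only `n_bins` numbers per feature rather than rows, which is also why the communication cost grows with the number of features and why the paper goes on to a more involved scheme.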
Hello, I'm working on a project that needs to use the dask-ml library to handle a large dataset. I couldn't find distributed versions of basic algorithms such as DecisionTree or RandomForest in dask-ml. If I use the scikit-learn tree algorithms instead, I will probably run into memory problems.