dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0
26.25k stars 8.72k forks

add extremely randomized tree as base learner? #5056

Open joegaotao opened 4 years ago

joegaotao commented 4 years ago

The variety of models plays an important role in model ensembles. I tried parameters such as "colsample_bytree" and "colsample_bynode" to make the models more stable and diverse, but the trees still grow by the same split criterion, resulting in similar models. However, I tried the combination "extratree + lgb", and randomized trees can be used as a feature-embedding tool that improves model variety. So I suggest adding extremely randomized trees as a base learner.

trivialfis commented 4 years ago

@joegaotao I'm not sure how to do that.

Note to myself: http://www.montefiore.ulg.ac.be/~ernst/uploads/news/id63/extremely-randomized-trees.pdf
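For context, the core trick in that paper is to replace the exhaustive threshold scan of a greedy tree with a single uniformly drawn cut-point per candidate feature, then keep the best of those random candidates. A minimal self-contained Python sketch of that split rule (variance reduction as the score; the function name and helpers are my own illustration, not XGBoost internals):

```python
import random

def extra_tree_split(X, y, n_features_considered=None, rng=random):
    """Pick a split the ExtraTrees way: for each candidate feature,
    draw ONE random cut-point instead of scanning all thresholds,
    then keep the candidate with the best variance reduction."""
    n_features = len(X[0])
    features = list(range(n_features))
    if n_features_considered:
        features = rng.sample(features, n_features_considered)

    def sse(vals):
        # Sum of squared errors around the mean.
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    def variance_reduction(feature, threshold):
        left = [t for row, t in zip(X, y) if row[feature] < threshold]
        right = [t for row, t in zip(X, y) if row[feature] >= threshold]
        if not left or not right:
            return float("-inf")  # degenerate split
        return sse(y) - sse(left) - sse(right)

    best = None
    for f in features:
        lo, hi = min(r[f] for r in X), max(r[f] for r in X)
        if lo == hi:
            continue  # constant feature, nothing to split on
        threshold = rng.uniform(lo, hi)  # the random cut-point is the key difference
        score = variance_reduction(f, threshold)
        if best is None or score > best[2]:
            best = (f, threshold, score)
    return best  # (feature, threshold, score) or None
```

Because no candidate threshold depends on the targets, trees built this way decorrelate much more than greedy trees, which is exactly the ensemble-diversity effect the issue asks for.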

jamescolless commented 4 years ago

Just as a note, LightGBM added this feature quite recently. I think it would certainly be of value.

https://github.com/microsoft/LightGBM/pull/2671
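For anyone landing here: as I understand the linked PR, LightGBM exposes this through the boolean `extra_trees` parameter, with `extra_seed` controlling the random thresholds. A hedged configuration sketch (parameter names per the LightGBM docs; verify against your installed version):

```python
# Sketch of enabling extremely randomized splits in LightGBM
# (parameters per the LightGBM docs; check your installed version).
params = {
    "objective": "binary",
    "boosting": "gbdt",
    "extra_trees": True,   # draw one random threshold per feature at each split
    "extra_seed": 7,       # RNG seed used for those random thresholds
    "learning_rate": 0.1,
}
# Typical use (not executed here):
#   booster = lgb.train(params, train_set, num_boost_round=100)
```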

trivialfis commented 4 years ago

Yup. I recently unified the evaluation procedure, partly for categorical data support and partly to enable features like this.

jamescolless commented 3 years ago

Just a small bump: is this something that could make it into 1.4?

trivialfis commented 3 years ago

Depends. Right now categorical data support is my priority. I will see how much time is left after sorting that out.