I am thinking about the experiment below and I am looking for a developer to help. Dropout is a huge concept in neural nets and it makes them incredibly strong on many tasks. The idea is simple - randomly "cripple" the net so that it deals better with overfitting. The neural net then has to take more care about generalization.
I found the first idea of dropout for trees here (the paper is linked inside the blog post): http://dmlc.ml/xgboost/2016/07/02/support-dropout-on-xgboost.html
and it has already been successfully implemented by @marugari.
On my dataset this improves accuracy slightly but significantly. Thanks again!
DART drops out whole trees during the learning process.
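For reference, here is a minimal sketch of how the existing DART booster can already be used from the Python package (the `booster`, `rate_drop` and `skip_drop` parameters are taken from the xgboost documentation); the data is random, just to keep the snippet self-contained.

```python
# Minimal DART example with the xgboost Python package; random data for illustration.
import numpy as np
import xgboost as xgb

X = np.random.rand(200, 5)
y = np.random.rand(200)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "booster": "dart",               # DART: drop whole trees during training
    "rate_drop": 0.1,                # probability of dropping a previously built tree
    "skip_drop": 0.5,                # probability of skipping dropout in a boosting round
    "objective": "reg:squarederror",
}
bst = xgb.train(params, dtrain, num_boost_round=50)
```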
Another idea is to use dropout inside each single tree - actually inside each single node. There are already concepts like pruning and subspace learning, but I propose the following:
The general idea is still the same - build a lot of trees, each one trying to improve on the residuals - but the residuals would be calculated in a different way:
Idea - for every tree (see the sketch after this list):
1] build the tree in the same way as today
2] but when calculating the predictions that are used to compute the residuals ( = the new targets for the next iteration), make a random decision inside each node (except terminal leaves): with a low probability (e.g. 10%) switch the learned side for all data in that node (if the rule says go left, the data go right)
3] ideally the probability of this in-tree dropout would be a new parameter, but for the experiment it can be hardcoded to e.g. 10%
4] the same logic needs to be applied in the predict function as well
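Here is a minimal sketch of the per-node dropout I have in mind, written against a toy tree structure rather than the real xgboost C++ code; the names `Node`, `sample_node_dropout`, `predict` and the `dropout_rate` parameter are purely illustrative, not anything that exists in xgboost.

```python
# Toy sketch of per-node ("in-tree") dropout: each internal node is flipped
# with probability dropout_rate, and the flip applies to all data routed
# through that node. Not xgboost code - just an illustration of the idea.
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None    # split feature index (None marks a leaf)
    threshold: float = 0.0           # split threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0               # leaf prediction (residual estimate)
    flipped: bool = False            # whether this node's routing is currently reversed

def sample_node_dropout(node: Node, dropout_rate: float = 0.1) -> None:
    """Randomly mark internal nodes as flipped; terminal leaves are never dropped."""
    if node.feature is None:
        return
    node.flipped = random.random() < dropout_rate
    sample_node_dropout(node.left, dropout_rate)
    sample_node_dropout(node.right, dropout_rate)

def predict(node: Node, x) -> float:
    """Route sample x through the tree, reversing the decision at flipped nodes."""
    if node.feature is None:
        return node.value
    go_left = x[node.feature] < node.threshold
    if node.flipped:
        go_left = not go_left        # the proposed per-node dropout: reverse the routing
    return predict(node.left if go_left else node.right, x)

# Tiny usage example: one split on feature 0 at threshold 0.5.
tree = Node(feature=0, threshold=0.5,
            left=Node(value=-1.0), right=Node(value=+1.0))
sample_node_dropout(tree, dropout_rate=0.1)   # re-sample flips before computing residuals
print(predict(tree, [0.3]))                    # -1.0 unless the root was flipped
```

During training, the predictions produced with the flipped nodes would be what generates the residuals ( = new targets) for the next tree, and the same flipping logic would also run in the predict function, as in point 4] above.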
Because the trees are built sequentially on top of the still-improving residuals, this should help with overfitting, similarly to dropout in NN.
I am not good at programming, but this should be a small change in the node evaluation where the predictions are calculated.
Many thanks for any help. I believe this could improve accuracy as much as DART does, though it would slow training down.
Thanks for any thoughts about this.