UBTC / gopy


Understanding the gradient boosting tree in a fitted model #3

Open · WMJi opened this issue 6 years ago

WMJi commented 6 years ago

Gradient Boosting learns a function that looks something like this:

F(X) = W1*T1(X) + W2*T2(X) + ... + Wi*Ti(X)

where the Wi are weights and the Ti are weak learners (decision trees). I know how to extract the individual Ti from a fitted gradient boosting model in scikit-learn via the estimators_ attribute, but is there a way to extract the Wi?
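
For reference, a minimal sketch of pulling the individual Ti out of estimators_ (the toy data and the gbrt name are illustrative, not from the original report)::

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)
gbrt = GradientBoostingRegressor(n_estimators=10, random_state=0).fit(X, y)

# estimators_ has shape (n_estimators, n_outputs); each entry is a
# DecisionTreeRegressor fitted on that stage's pseudo-residuals.
first_tree = gbrt.estimators_[0, 0]
print(first_tree.predict(X[:5]))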

WMJi commented 6 years ago

The Wi consist of the line-search estimate times the learning rate. In scikit-learn the learning rate is constant, so it is pulled out as a separate factor. In gradient boosting there is actually one weight assigned to each terminal region (i.e., each leaf). Those estimates are stored directly in the trees and are updated during the fitting of the gradient boosting model (see [1]).

To access the estimates for the terminal regions of the first tree, do::

from sklearn.tree._tree import TREE_LEAF  # TREE_LEAF is the constant -1

tree = gbrt.estimators_[0, 0].tree_          # underlying Tree of the first stage
leaf_mask = tree.children_left == TREE_LEAF  # leaves have no left child
w_i = tree.value[leaf_mask, 0, 0]            # per-leaf estimates
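
Since each tree's prediction for a sample is just the weight of the leaf that sample falls into, F(X) can be reconstructed by hand as a sanity check. A hedged sketch, assuming the default squared-error loss (where the raw additive score equals the final prediction) and reusing the gbrt fitted above::

import numpy as np

# F(X) = initial estimate + learning_rate * sum of per-stage tree outputs;
# each tree output is the weight of the leaf that the sample lands in.
pred = gbrt.init_.predict(X).ravel()
for stage in gbrt.estimators_[:, 0]:
    pred += gbrt.learning_rate * stage.predict(X)

assert np.allclose(pred, gbrt.predict(X))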

[1] https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/gradient_boosting.py#L197