XGBoostClassifier vectorized output

Hi, you can get the index of the leaf that each sample lands in, for every tree in the model. Just call the predict method of the Booster object (http://xgboost.readthedocs.io/en/latest/python/python_api.html, class xgboost.Booster) with pred_leaf=True. The result has one row per sample and one column per tree.

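from sklearn.datasets import load_breast_cancer
from xgboost import XGBClassifier, DMatrix

# Load features and labels in a single call
X, y = load_breast_cancer(return_X_y=True)

clf = XGBClassifier(n_estimators=3, max_depth=2)
clf.fit(X, y)

# get_booster() returns the underlying Booster (older releases exposed it as clf.booster())
booster = clf.get_booster()
# One row per sample, one column per tree; each entry is a leaf node index
print(booster.predict(DMatrix(X[:5]), pred_leaf=True))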
>> array([[5, 6, 6],
>>           [6, 6, 6],
>>           [6, 6, 6],
>>           [4, 4, 6],
>>           [5, 6, 6]], dtype=int32)

If you are interested in the leaf values as well, you can use get_dump() and parse the output:

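import re
# Capture (node id, leaf value); the sign and an optional exponent apply to integers and decimals alike
reg_ex = r"(\d+):leaf=([-+]?(?:\d*\.\d+|\d+)(?:[eE][-+]?\d+)?)"
print(['Tree {}: {}'.format(i, re.findall(reg_ex, tree_str)) for i, tree_str in enumerate(booster.get_dump())])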
>> ["Tree 0: [('3', '0.191691'), ('4', '-0.04'), ('5', '0.00952381'), ('6', '-0.19096')]",
>> "Tree 1: [('3', '0.165413'), ('4', '-0.130044'), ('5', '-0.00545294'), ('6', '-0.176956')]",
>> "Tree 2: [('3', '0.154973'), ('4', '-0.106707'), ('5', '0.0589427'), ('6', '-0.158889')]"]
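
To tie the two outputs together, here is a minimal sketch that looks up the leaf value for each (sample, tree) pair. So that it runs without a trained model, it hard-codes the leaf indices and leaf values printed above. The dump strings are cut down to their leaf lines only, and the helper names leaf_tables and leaf_values are made up for illustration; they are not part of xgboost.

```python
import re

# Leaf lines in the "<node id>:leaf=<value>" form parsed above.
# Assumption: split nodes are left out; values are copied from the printed dump.
tree_dumps = [
    "3:leaf=0.191691\n4:leaf=-0.04\n5:leaf=0.00952381\n6:leaf=-0.19096\n",
    "3:leaf=0.165413\n4:leaf=-0.130044\n5:leaf=-0.00545294\n6:leaf=-0.176956\n",
    "3:leaf=0.154973\n4:leaf=-0.106707\n5:leaf=0.0589427\n6:leaf=-0.158889\n",
]

# Leaf index per sample (row) and tree (column), copied from the pred_leaf output.
leaf_idx = [[5, 6, 6], [6, 6, 6], [6, 6, 6], [4, 4, 6], [5, 6, 6]]

LEAF_RE = re.compile(r"(\d+):leaf=([-+]?(?:\d*\.\d+|\d+)(?:[eE][-+]?\d+)?)")


def leaf_tables(dumps):
    """Build one {node_id: leaf_value} dict per tree (hypothetical helper)."""
    return [{int(node): float(val) for node, val in LEAF_RE.findall(s)} for s in dumps]


def leaf_values(indices, tables):
    """Replace each leaf index with that tree's leaf value (hypothetical helper)."""
    return [[tables[t][node] for t, node in enumerate(row)] for row in indices]


tables = leaf_tables(tree_dumps)
values = leaf_values(leaf_idx, tables)
for row in values:
    print(row)
```

With a real model you would pass booster.get_dump() and the pred_leaf array (converted with .tolist()) in place of the hard-coded data.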


XGBoostClassifier vectorized output #2038