arogozhnikov / hep_ml

Machine Learning for High Energy Physics.

Random behavior of GBReweighter and UGradientBoostingClassifier #55

Open arogozhnikov opened 5 years ago

arogozhnikov commented 5 years ago

(Leaving this as an open answer to a common question)

Why do GBReweighter and UGradientBoostingClassifier produce different weights after each training?

Both algorithms are based on stochastic tree boosting. Settings like subsample and max_features lead to randomized tree building (i.e. each tree uses only a random part of the training data), which is widely known to strengthen the ensemble by making the trees more diverse.

hep_ml follows the sklearn convention of keeping random things random unless explicitly asked otherwise.
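To see the effect described above, here is a minimal sketch that uses sklearn's own GradientBoostingClassifier as a stand-in (it is not hep_ml code, and the toy dataset and parameter values are arbitrary assumptions). With random_state left at its default of None, two fits on identical data draw different subsamples and feature subsets, so the predictions are expected to differ slightly:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Toy dataset, generated once so both fits see exactly the same data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

def fit_proba():
    # subsample < 1 and max_features < n_features make tree building stochastic;
    # random_state is deliberately left unset (None).
    clf = GradientBoostingClassifier(n_estimators=20, subsample=0.5, max_features=3)
    return clf.fit(X, y).predict_proba(X)[:, 1]

proba_first = fit_proba()
proba_second = fit_proba()

# Largest disagreement between the two unseeded fits.
max_diff = np.max(np.abs(proba_first - proba_second))
print(max_diff)
```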

Reproducible behavior is achieved by setting random_state:

for boosting:
UGradientBoostingClassifier(<other settings here>, random_state=42)
for reweighter:
GBReweighter(<other settings here>, gb_args={'random_state': 42, <other gb args>})