(Leaving this as an open answer to a common question.)
Why do GBReweighter/UGradientBoostingClassifier produce different weights after each training?
Both algorithms are based on stochastic tree boosting. Settings like subsample and max_features lead to randomized tree building (i.e. each tree uses only a random part of the training data), which is widely known to strengthen an ensemble by producing more diverse trees.
hep_ml follows the sklearn convention of keeping random things random unless explicitly asked otherwise.
Reproducible behavior is achieved by setting random_state:
for boosting:
UGradientBoostingClassifier(<other settings here>, random_state=42)
for the reweighter:
GBReweighter(<other settings here>, gb_args={'random_state': 42, <other gb args>})
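A minimal runnable sketch of both cases, assuming hep_ml's public API (GBReweighter.predict_weights, UGradientBoostingClassifier with a LogLossFunction loss); the synthetic data and all parameter values below are illustrative only:

```python
# Fixing random_state makes repeated trainings reproducible.
import numpy as np
from hep_ml.reweight import GBReweighter
from hep_ml.gradientboosting import UGradientBoostingClassifier
from hep_ml.losses import LogLossFunction

rng = np.random.RandomState(0)
original = rng.normal(0.0, 1.0, size=(2000, 2))  # distribution to be reweighted
target = rng.normal(0.3, 1.2, size=(2000, 2))    # distribution to match

def reweight(seed):
    # random_state is forwarded to the underlying boosting via gb_args
    rw = GBReweighter(n_estimators=30,
                      gb_args={'subsample': 0.7, 'random_state': seed})
    rw.fit(original, target)
    return rw.predict_weights(original)

assert np.allclose(reweight(42), reweight(42))  # same seed -> same weights

# For the classifier, random_state is a direct constructor argument.
X = np.vstack([original, target])
y = np.array([0] * len(original) + [1] * len(target))
clf = UGradientBoostingClassifier(loss=LogLossFunction(), n_estimators=30,
                                  subsample=0.7, random_state=42)
clf.fit(X, y)
```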