arogozhnikov / hep_ml

Machine Learning for High Energy Physics.
https://arogozhnikov.github.io/hep_ml/

GBReweights seems to be not working in my case #84

Closed Soumyatifr closed 8 months ago

Soumyatifr commented 9 months ago

Dear experts, I am using GBReweighter to reweight control-region (original) data to signal-region (target) data. The statistics are roughly 1.5M events in the control region and 25K in the signal region. I am using 5-7 variables for the reweighting. The individual KS score is high for only one variable (around 0.15); for the rest of the variables the KS score is < 0.05. The model I am using:

from hep_ml import reweight

reweighter = reweight.GBReweighter(n_estimators=30, learning_rate=0.1, max_depth=3, min_samples_leaf=100,
                                   gb_args={'subsample': 0.4})

The variables do not change after the reweighting; the two sets of plots are attached here. Could you please check where I have made a mistake, or what I need to modify to make the GBReweighter method useful?
Thanks in advance, Soumya

[Figure: before reweighting]

[Figure: after reweighting]
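
The KS scores quoted above can be computed with and without event weights, which gives a numerical before/after comparison in addition to the plots. Below is a minimal pure-Python sketch of a weighted two-sample KS distance. The function name `weighted_ks_distance` and its arguments are hypothetical and not part of hep_ml; it is an illustration under those assumptions, not the method used in the thread.

```python
def weighted_ks_distance(a, b, a_weights=None, b_weights=None):
    """Maximum absolute difference between two weighted empirical CDFs."""
    a_weights = [1.0] * len(a) if a_weights is None else list(a_weights)
    b_weights = [1.0] * len(b) if b_weights is None else list(b_weights)
    a_total, b_total = sum(a_weights), sum(b_weights)

    # Tag each point with its sample (0 = a, 1 = b) and its normalised weight.
    points = [(x, 0, w / a_total) for x, w in zip(a, a_weights)]
    points += [(x, 1, w / b_total) for x, w in zip(b, b_weights)]
    points.sort(key=lambda p: p[0])

    cdf = [0.0, 0.0]
    distance = 0.0
    i = 0
    while i < len(points):
        value = points[i][0]
        # Consume every point sharing this value before comparing the CDFs,
        # so that ties do not create spurious differences.
        while i < len(points) and points[i][0] == value:
            _, sample, weight = points[i]
            cdf[sample] += weight
            i += 1
        distance = max(distance, abs(cdf[0] - cdf[1]))
    return distance
```

Calling it once with the original events unweighted and once with the weights returned by the reweighter passed as `a_weights` shows whether the distance to the target actually shrinks for a given variable.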

arogozhnikov commented 9 months ago

Hi @Soumyatifr, interesting case. I can't guess the reason, but as a sanity check, how about leaving only one variable ('HT') in training and seeing if it gets corrected?
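
The single-variable check suggested here could look roughly like the sketch below. The helper name `single_variable_check` is hypothetical, the hyperparameters are copied from the original post, and the inputs are assumed to be 1D sequences holding one variable (for example HT) for the original and target samples.

```python
def single_variable_check(original, target):
    """Train a GBReweighter on one variable and return the predicted weights.

    `original` and `target` are 1D sequences of the same variable (e.g. HT).
    """
    # Imported inside the function so the helper can be defined even
    # where numpy / hep_ml are not installed.
    import numpy as np
    from hep_ml import reweight

    # Use 2D input: one row per event, one column per variable.
    original = np.asarray(original, dtype=float).reshape(-1, 1)
    target = np.asarray(target, dtype=float).reshape(-1, 1)

    reweighter = reweight.GBReweighter(
        n_estimators=30, learning_rate=0.1, max_depth=3,
        min_samples_leaf=100, gb_args={'subsample': 0.4},
    )
    reweighter.fit(original, target)
    # One weight per original event; they only affect a histogram
    # if they are passed to it as its weights.
    return reweighter.predict_weights(original)
```

The returned array has one entry per original event, so it can be compared against the unweighted distribution of the same variable.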

Soumyatifr commented 9 months ago

Dear Alex @arogozhnikov, thanks for your very prompt reply. I have checked with only the "HT" variable, but it still doesn't work. It would be very helpful if you could look at the code [1] and the two ROOT files, which I have uploaded to my Google Drive [2].
[1] https://github.com/Soumyatifr/GBReweighting/blob/main/bdt_reweighet.ipynb
[2] https://drive.google.com/drive/folders/1ZCkF0V58O_fC1gnKqFRL-HYaE97mYwbe?usp=sharing

arogozhnikov commented 9 months ago

@Soumyatifr I just ran your code with the single HT variable, and it clearly shows a strong improvement:

[Figure: before reweighting]

[Figure: after reweighting]

Hopefully it looks similar in your environment; otherwise there is some problem with package versions.
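
To compare environments, the installed package versions can be listed with the standard library alone. This is a minimal sketch: the helper name `installed_versions` is hypothetical, and the default package list is an assumption about which packages are relevant here, not something stated in the thread.

```python
from importlib import metadata


def installed_versions(packages=("hep_ml", "scikit-learn", "numpy", "pandas")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Running this in both environments and comparing the printed lines makes any version mismatch visible.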

arogozhnikov commented 8 months ago

OK, I assume that helped.