arogozhnikov / hep_ml

Machine Learning for High Energy Physics.
https://arogozhnikov.github.io/hep_ml/

uBoost de-correlation power #50

Closed gmarceca closed 6 years ago

gmarceca commented 6 years ago

Hello,

I'm trying to run uBoost to get a background efficiency that is flat in mass. In particular, I want the efficiency to be flat at 8% background efficiency. To do this I used uBoostBDT and set 'target_efficiency'=0.08 and 'uniform_label'=0. I ran GridSearchCV to get the best hyper-parameters and trained with those for ~100 boostings, scanning different 'uniform_rate' values, e.g. [0, 5, 10, 15, 20]. Looking at the background efficiency vs mass plots, I see that the profile at bkg. eff. = 92% gradually flattens as 'uniform_rate' increases, which is exactly the behaviour I want, but for the wrong profile! This made me suspect that to get a flat 8% bkg. eff. I need to set 'target_efficiency'=0.92.

Looking at the code, I see there is a sign flip applied to the classifier score:

https://github.com/arogozhnikov/hep_ml/blob/master/hep_ml/uboost.py#L182: self.signed_uniform_label = 2 * self.uniform_label - 1

https://github.com/arogozhnikov/hep_ml/blob/master/hep_ml/uboost.py#L243: signed_score = score * self.signed_uniform_label

So my interpretation is that 'target_efficiency' means something different for the two classes: when flattening signal, it is the fraction of signal to keep; when flattening background, it is the fraction of background to discard.
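The sign flip quoted above can be sketched in isolation. This is a hypothetical standalone reproduction of just those two lines, not the hep_ml implementation itself:

```python
# Standalone sketch of the sign-flip logic quoted from uboost.py
# (hypothetical reproduction for illustration, not hep_ml code).

def signed_score(score, uniform_label):
    # uboost.py#L182: map label {0, 1} -> {-1, +1}
    signed_uniform_label = 2 * uniform_label - 1
    # uboost.py#L243: negate the score when flattening background (label 0)
    return score * signed_uniform_label

# With uniform_label=1 (signal) the score is unchanged; with
# uniform_label=0 (background) it is negated, so the efficiency cut
# selects the complement of the usual region. Under this reading,
# keeping 8% of background corresponds to target_efficiency = 1 - 0.08.
desired_bkg_eff = 0.08
target_efficiency = 1 - desired_bkg_eff

print(signed_score(0.7, uniform_label=1))  # signal: score unchanged -> 0.7
print(signed_score(0.7, uniform_label=0))  # background: sign flipped -> -0.7
print(target_efficiency)
```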

Is this reasoning correct?

Thanks in advance, Gino

arogozhnikov commented 6 years ago

Hi Gino, your reasoning is correct.

Unfortunately, the efficiency parameter works differently for background (eff' = 1 - true_eff). This is confusing, but it won't be changed, since some existing code already relies on this behaviour.

gmarceca commented 6 years ago

Thanks very much for the clarification, I'll close the issue then.

Best, Gino