arogozhnikov / hep_ml

Machine Learning for High Energy Physics.
https://arogozhnikov.github.io/hep_ml/

weight normalisation #36

Open bifani opened 8 years ago

bifani commented 8 years ago

I have used hep_ml in the past weeks to reweight MC distributions and stumbled upon the following issue. When determining weights as the data/MC ratio of normalised distributions, the computed weights are normalised such that Sum w_i = N. However, I noticed this is not the case for weights obtained using hep_ml.reweight. Is this expected, or am I missing something?
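For context, here is why the simple binned approach gives Sum w_i = N. If an MC event in bin b gets weight (d_b / N_data) / (m_b / N_MC), then summing over all MC events gives Sum_b m_b * (d_b / N_data) / (m_b / N_MC) = N_MC, provided every bin that contains data also contains MC. A minimal NumPy sketch of that binned ratio (this is not hep_ml code; `binned_ratio_weights` and the toy samples are hypothetical):

```python
import numpy as np

def binned_ratio_weights(mc, data, bins):
    """Per-event MC weights from the ratio of normalised data and MC histograms."""
    mc_counts, edges = np.histogram(mc, bins=bins)
    data_counts, _ = np.histogram(data, bins=edges)
    # Normalise each histogram to unit total
    mc_frac = mc_counts / mc_counts.sum()
    data_frac = data_counts / data_counts.sum()
    # Per-bin ratio; bins with no MC events get 0 (there are no MC events to weight there)
    ratio = np.divide(data_frac, mc_frac, out=np.zeros_like(data_frac), where=mc_frac > 0)
    # Look up the bin of each MC event (the clip puts values equal to the last edge in the last bin)
    idx = np.clip(np.digitize(mc, edges) - 1, 0, len(ratio) - 1)
    return ratio[idx]

rng = np.random.default_rng(0)
mc = rng.uniform(0, 1, size=10_000)   # toy "MC" sample
data = rng.beta(2, 5, size=5_000)     # toy "data" sample
w = binned_ratio_weights(mc, data, bins=np.linspace(0, 1, 11))
print(w.sum())  # equals len(mc) up to floating-point rounding
```

The normalisation here is a by-product of building the weights from normalised histograms. It is not something a general reweighter has to reproduce.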

arogozhnikov commented 8 years ago

Hi Simone, this is important, though not noted in the documentation: the normalization constant in reweighters is not fixed.

This is because the final normalization constant may depend on third-party factors.

In many cases the normalization constant does not play a significant role (e.g. when computing efficiencies or ROC curves, or when training classifiers); however, when it does, you should compute it yourself.
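A minimal sketch of computing it yourself, assuming the convention Sum w_i = N (the thread does not prescribe a target; use whatever total your analysis needs). It also shows why efficiencies do not care: a common scale factor cancels in the ratio. `normalize_weights` and `weighted_efficiency` are hypothetical helpers, and `raw` merely stands in for the output of `reweighter.predict_weights(...)`.

```python
import numpy as np

def normalize_weights(weights, target_sum):
    """Rescale weights so that they sum to target_sum (e.g. N, the number of events)."""
    weights = np.asarray(weights, dtype=float)
    return weights * (target_sum / weights.sum())

def weighted_efficiency(weights, passed):
    """Fraction of the total weight carried by events that pass a selection."""
    return weights[passed].sum() / weights.sum()

rng = np.random.default_rng(1)
# Stand-in for reweighter.predict_weights(...): arbitrary positive, unnormalised weights
raw = rng.exponential(scale=3.0, size=1_000)
passed = rng.uniform(size=1_000) < 0.3  # toy selection mask

w = normalize_weights(raw, target_sum=len(raw))
print(w.sum())  # 1000 up to floating-point rounding
# The efficiency is the same before and after rescaling
print(weighted_efficiency(raw, passed), weighted_efficiency(w, passed))
```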


Explanation: the absence of normalization in reweighters guarantees that reweighter.predict_weights is a deterministic per-event mapping.

For example, whether you predict weights for a large sample at once, or predict the weight of each event separately and concatenate the predictions, the result is the same. If the weights were normalized, the second approach would obviously give a wrong result.
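A toy illustration of this point, assuming a hypothetical per-event function `toy_predict_weights` in place of a trained reweighter. Without normalization, predicting in chunks and concatenating reproduces the single-call result exactly. Normalizing inside each call ties every weight to whichever events happened to be in the same call, so the chunked result changes.

```python
import numpy as np

def toy_predict_weights(x):
    """Hypothetical per-event weight function standing in for reweighter.predict_weights."""
    return np.exp(-0.5 * x)

def normalized(w):
    # Normalise so that the weights sum to the number of events passed in this call
    return w * (len(w) / w.sum())

rng = np.random.default_rng(2)
x = rng.normal(size=1_000)
chunks = np.array_split(x, 10)

# Unnormalised: one call vs. per-chunk calls give identical weights
all_at_once = toy_predict_weights(x)
chunked = np.concatenate([toy_predict_weights(c) for c in chunks])
print(np.allclose(all_at_once, chunked))  # True

# Normalised per call: each chunk is scaled by its own constant, so results differ
all_at_once_n = normalized(toy_predict_weights(x))
chunked_n = np.concatenate([normalized(toy_predict_weights(c)) for c in chunks])
print(np.allclose(all_at_once_n, chunked_n))  # False
```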

jcob95 commented 4 years ago

Hi, related to this question: I'm trying to compare a single reweighter, trained and tested on the entire dataset, with several reweighters that are each trained on an individual bin of the data. My goal is to reconstruct the reweighted distributions over the whole data range from the binned reweighters.

So, is it possible to obtain the normalization constant that was used, or should I normalize the reweighters' outputs externally?

Thanks

arogozhnikov commented 4 years ago

@jcob95, you should renormalize externally. As I understand your case, you should first compute the expected number of events in each bin, and then, within each bin, normalize the weights so that their total matches the expected number.
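A minimal sketch of that recipe. It assumes each MC event already carries a weight from its own bin's reweighter, and it takes the expected yield per bin from the target (data) sample, rescaled to the MC sample size. That choice of expected yield is only an illustration; use whatever expectation fits the analysis. `renormalize_per_bin` is a hypothetical helper, not part of hep_ml, and the toy samples stand in for real ones.

```python
import numpy as np

def renormalize_per_bin(weights, bin_index, expected):
    """Scale weights inside each bin so that their sum equals expected[bin].

    weights   : per-event weights (e.g. from the reweighter trained on that bin)
    bin_index : integer bin label of each event, in [0, len(expected))
    expected  : expected total weight (number of events) in each bin
    """
    out = np.asarray(weights, dtype=float).copy()
    for b, target in enumerate(expected):
        mask = bin_index == b
        total = out[mask].sum()
        # Skip empty or zero-weight bins to avoid dividing by zero
        if total > 0:
            out[mask] *= target / total
    return out

rng = np.random.default_rng(3)
edges = np.array([0.0, 1.0, 2.0, 3.0])
mc_x = rng.uniform(0, 3, size=3_000)         # toy MC binning variable
data_x = 3 * rng.beta(2, 3, size=1_500)      # toy data binning variable
bin_index = np.digitize(mc_x, edges[1:-1])   # bin label 0, 1 or 2 for each MC event

# Assumed expected yield: data counts per bin, scaled to the MC sample size
expected = np.histogram(data_x, bins=edges)[0] * (len(mc_x) / len(data_x))

raw_w = rng.exponential(size=len(mc_x))      # stand-in for per-bin reweighter outputs
w = renormalize_per_bin(raw_w, bin_index, expected)
print(np.bincount(bin_index, weights=w, minlength=3), expected)  # per-bin totals match
```

Because each bin is scaled separately, the relative weights within a bin are untouched, while the stitched weights reproduce the expected yield bin by bin across the full range.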