arogozhnikov / hep_ml

Machine Learning for High Energy Physics.
https://arogozhnikov.github.io/hep_ml/

Nominal weights when correcting already weighted original #56

Open RDMoise opened 5 years ago

RDMoise commented 5 years ago

Hi, I'm trying to correct the distribution of D in an original (MC) sample that already carries some weights, say w_i, which correct something else (say Dp). Currently I obtain new weights, say x_i, by calling predict_weights(original = D_array, original_weight = w). My question is the following: once I've done this, should I use x_i or w_i * x_i as the nominal weights for my MC (i.e. to have both D and Dp corrected)? If the answer is x_i, then very naively one could assume that the ratio of the two sets of corrections (x_i, w_i) would yield something that corrects Dp but not D. Is this assumption correct?

Cheers, Dan

arogozhnikov commented 5 years ago

Hello Dan,

Here is how weight prediction is implemented:

In [2]: GBReweighter.predict_weights??
Signature: GBReweighter.predict_weights(self, original, original_weight=None)
Source:   
    def predict_weights(self, original, original_weight=None):
        """
        Returns corrected weights. Result is computed as original_weight * reweighter_multipliers.

        :param original: values from original distribution of shape [n_samples, n_features]
        :param original_weight: weights of samples before reweighting.
        :return: numpy.array of shape [n_samples] with new weights.
        """
        original, original_weight = self._normalize_input(original, original_weight)
        multipliers = numpy.exp(self.gb.decision_function(original))
        return multipliers * original_weight

So the multiplication is done for you (as the last line shows): the method already returns original_weight times the reweighter's multipliers, so just use its output as your nominal weights. Note that when training the reweighter you should also provide the weights that you previously used to correct Dp; then it should work as expected.
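To make the bookkeeping concrete, here is a minimal pure-Python sketch. It does not use hep_ml: the gradient-boosted reweighter is swapped for a toy per-bin histogram ratio, and every name in it (`fit_multipliers`, `toy_predict_weights`, the sample values) is hypothetical. It only mirrors the convention quoted above, where the returned weights are `multipliers * original_weight` and the existing weights are also passed at training time. With hep_ml itself the corresponding calls would presumably be `fit(original, target, original_weight=w)` followed by `predict_weights(original, original_weight=w)`.

```python
# Toy stand-in for a reweighter: per-bin ratio of normalised weighted histograms.
# This is NOT hep_ml's gradient-boosted algorithm; it only mirrors the
# "multipliers * original_weight" convention of predict_weights.

def bin_index(value, edges):
    """Index of the bin [edges[k], edges[k+1]) that contains value."""
    for k in range(len(edges) - 1):
        if edges[k] <= value < edges[k + 1]:
            return k
    raise ValueError("value outside binning: %r" % value)

def weighted_fractions(values, weights, edges):
    """Weighted histogram of values, normalised to unit sum."""
    totals = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        totals[bin_index(v, edges)] += w
    norm = sum(totals)
    return [t / norm for t in totals]

def fit_multipliers(original, target, edges, original_weight, target_weight):
    """'Training': per-bin ratio target / original, using the existing weights.
    Assumes every bin of the weighted original sample is non-empty."""
    orig = weighted_fractions(original, original_weight, edges)
    targ = weighted_fractions(target, target_weight, edges)
    return [t / o for t, o in zip(targ, orig)]

def toy_predict_weights(original, multipliers, edges, original_weight):
    """Same convention as the quoted source: multipliers * original_weight."""
    return [w * multipliers[bin_index(v, edges)]
            for v, w in zip(original, original_weight)]

edges = [0.0, 0.5, 1.0]
mc_D = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]   # variable D in the MC sample
w = [2.0, 2.0, 2.0, 1.0, 1.0, 1.0]      # pre-existing weights w_i (correct Dp)
data_D = [0.1, 0.6, 0.7, 0.8]           # variable D in the target sample
data_w = [1.0] * len(data_D)

# Train with the existing weights, then predict with the same weights.
multipliers = fit_multipliers(mc_D, data_D, edges, w, data_w)
nominal = toy_predict_weights(mc_D, multipliers, edges, w)

# The bare multipliers can be recovered by dividing the w_i back out.
bare = [n / wi for n, wi in zip(nominal, w)]
```

Here `nominal` plays the role of the x_i in the question: it already contains w_i, so multiplying by w_i again would apply the first correction twice. Dividing x_i by w_i gives the bare multipliers, i.e. the part that corrects D on top of the existing weights.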

Also note that the second correction step may break the corrections of the first step unless you require the reweighter to correct Dp too. In many practical situations you may not care about this, as long as D and Dp are fairly independent.
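One way to check this in practice is to compare the weighted Dp distribution before and after the second step. The sketch below is a hand-built toy in pure Python, not hep_ml output: the multiplier function, the binning and the four-event samples are all assumptions chosen so the arithmetic is easy to follow. It shows that multipliers depending only on D leave the weighted Dp distribution unchanged when D and Dp are independent, and shift it when they are correlated.

```python
def weighted_fractions(values, weights, edges):
    """Weighted histogram of values over bins [edges[k], edges[k+1]), unit sum."""
    totals = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        for k in range(len(totals)):
            if edges[k] <= v < edges[k + 1]:
                totals[k] += w
                break
    norm = sum(totals)
    return [t / norm for t in totals]

def multiplier(d):
    """Hypothetical second-step multiplier that depends on D only."""
    return 0.5 if d < 0.5 else 1.5

edges = [0.0, 0.5, 1.0]

# Case 1: D and Dp independent (every D value appears with every Dp value).
d_indep = [0.25, 0.75, 0.25, 0.75]
dp_indep = [0.25, 0.25, 0.75, 0.75]
w_indep = [1.0, 1.0, 3.0, 3.0]          # first-step weights, depend on Dp only
new_indep = [w * multiplier(d) for w, d in zip(w_indep, d_indep)]
dp_before_indep = weighted_fractions(dp_indep, w_indep, edges)
dp_after_indep = weighted_fractions(dp_indep, new_indep, edges)

# Case 2: D and Dp fully correlated (low D always comes with low Dp).
d_corr = [0.25, 0.25, 0.75, 0.75]
dp_corr = [0.25, 0.25, 0.75, 0.75]
w_corr = [1.0, 1.0, 1.0, 1.0]
new_corr = [w * multiplier(d) for w, d in zip(w_corr, d_corr)]
dp_before_corr = weighted_fractions(dp_corr, w_corr, edges)
dp_after_corr = weighted_fractions(dp_corr, new_corr, edges)

print("independent:", dp_before_indep, "->", dp_after_indep)
print("correlated: ", dp_before_corr, "->", dp_after_corr)
```

If the Dp distribution does shift in a real sample, the remedy mentioned above is to include Dp among the variables the reweighter is trained on, so that both D and Dp are matched to the target at once.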