Benchmark with independent classification model

jalvear2dxc commented 3 years ago

Hello @arogozhnikov ,

In order to check the quality of the reweighting process, I used an independent classifier, based on the UGradientBoostingClassifier class, on the same dataset, following the steps below:

Before Reweighting: Training (using prior weights as sample weights) and scoring
Reweighting
After Reweighting: Training using new weights as sample weights and scoring

When comparing the results with those of the reweighter's classifier (rw.gb), I find that the decrease in weighted AUC is much greater for rw.gb than for the independent classifier:

Results before reweighting: classifier AUC = 0.99, rw.gb AUC = 0.99
Results after reweighting: classifier AUC = 0.95, rw.gb AUC = 0.55

Could you help me identify a possible cause of this difference in behavior?

arogozhnikov commented 3 years ago

Hi @jalvear2dxc, I'm not completely following which classifiers you are comparing, but the large difference you report is possible.

Naturally, reweighting removes the discrepancies that are picked up by models whose tree configuration (e.g. depth) is similar to that of the reweighter's trees. If you use a uniforming loss, this may become an additional hint to the classifier (though this is hard to predict without understanding the data).

Also, check that you use the correct weights in every training and in every AUC scoring. Just in case.
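
To illustrate why the weights passed to the scoring step matter, here is a minimal sketch in plain Python. The function name `weighted_auc` and the toy numbers are hypothetical and are not taken from hep_ml or from the data in this issue. It computes a weighted ROC AUC as the weighted fraction of (target, original) pairs ranked correctly, and scores the same fixed predictions under two different weight sets.

```python
def weighted_auc(labels, scores, weights):
    """Weighted ROC AUC: weighted fraction of (positive, negative) pairs in
    which the positive sample has the higher score; ties count as half."""
    if not (len(labels) == len(scores) == len(weights)):
        raise ValueError("labels, scores and weights must have the same length")
    pos = [(s, w) for y, s, w in zip(labels, scores, weights) if y == 1]
    neg = [(s, w) for y, s, w in zip(labels, scores, weights) if y == 0]
    total = sum(w for _, w in pos) * sum(w for _, w in neg)
    if total <= 0:
        raise ValueError("both classes need positive total weight")
    concordant = 0.0
    for s_pos, w_pos in pos:
        for s_neg, w_neg in neg:
            if s_pos > s_neg:
                concordant += w_pos * w_neg
            elif s_pos == s_neg:
                concordant += 0.5 * w_pos * w_neg
    return concordant / total


# Toy data (made up): label 0 = original sample, label 1 = target sample.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]  # fixed classifier outputs

# Prior weights: every event counts equally.
prior_weights = [1.0] * 8
# Hypothetical reweighted case: only the original events get new weights.
new_weights = [0.2, 0.3, 0.5, 3.0, 1.0, 1.0, 1.0, 1.0]

auc_prior = weighted_auc(labels, scores, prior_weights)
auc_new = weighted_auc(labels, scores, new_weights)
print(auc_prior, auc_new)
```

The predictions are identical in both calls; only the weights differ, and the AUC changes with them. Scoring one model with the prior weights and another with the new weights would therefore not be a like-for-like comparison. The pairwise loop is quadratic in the number of events, so it is only meant for small checks like this one.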

jalvear2dxc commented 3 years ago

Thanks Mr. @arogozhnikov.

I've improved the results dramatically by not training a new classifier after the reweighting, but instead correcting the predictions of the first model with the predicted weights. Does that make sense? I think this is in line with what you said in your answer.

arogozhnikov commented 3 years ago

@jalvear2dxc yes, that seems to match what I suggested.

arogozhnikov / hep_ml

Benchmark with independent classification model #68