A common problem for data scientists when training models is class imbalance. Sometimes positive instances make up only 2-5% of the whole population. Rebalancing the data may improve the model's results and produce a better score distribution.
There are usually three ways to rebalance the data: 1. duplicate the records of the smaller class; 2. increase the weight of each record in the smaller class; 3. down-sample the larger class.
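The three options can be illustrated with a minimal Python sketch. This is not Shifu's implementation or API: the function names, the `(label, weight)` record layout, and the toy dataset are all assumptions made for illustration only.

```python
import random

# Each record is a hypothetical (label, weight) pair; label 1 is the rare
# positive class and label 0 is the common negative class.

def duplicate_minority(records, times):
    """Option 1: emit each positive record `times` times in total."""
    out = []
    for label, weight in records:
        copies = times if label == 1 else 1
        out.extend([(label, weight)] * copies)
    return out

def upweight_minority(records, factor):
    """Option 2: multiply the weight of each positive record by `factor`."""
    return [(label, weight * factor if label == 1 else weight)
            for label, weight in records]

def downsample_majority(records, keep_rate, seed=0):
    """Option 3: keep each negative record with probability `keep_rate`."""
    rng = random.Random(seed)
    return [(label, weight) for label, weight in records
            if label == 1 or rng.random() < keep_rate]

# Toy dataset: 5 positives and 95 negatives (5% positive), all with weight 1.0.
data = [(1, 1.0)] * 5 + [(0, 1.0)] * 95

duplicated = duplicate_minority(data, times=19)
reweighted = upweight_minority(data, factor=19.0)
downsampled = downsample_majority(data, keep_rate=0.1)
```

Options 1 and 2 leave the negative records untouched and only change how much the positives count, while option 3 shrinks the dataset, which can also shorten training at the cost of discarding some negative examples.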
In Shifu, rebalancing can be applied when shuffling the normalized dataset.