chengsoonong opened 7 years ago
One option would be a knob between 0 and 1:
- 0 = ignore class imbalance and train as usual. All predictors should support this.
- 1 = use the class proportions to set the weights (`balanced` in sklearn).
- In between = interpolate between the two. Only some predictors may support this.
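As a minimal sketch of the interpolation idea (function name and signature are hypothetical, not part of any existing API): at knob=0 every class gets weight 1, at knob=1 each class gets the sklearn-style `balanced` weight `n_samples / (n_classes * n_class_samples)`, and intermediate knob values linearly blend the two.

```python
from collections import Counter

def interpolated_class_weights(y, knob):
    """Hypothetical helper: blend uniform weights (knob=0) with
    sklearn-style 'balanced' weights (knob=1)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {
        c: (1 - knob) * 1.0 + knob * n / (k * nc)
        for c, nc in counts.items()
    }
```

For example, with labels `[0, 0, 0, 1]`, knob=0 gives both classes weight 1.0, knob=1 gives the rare class weight 2.0, and knob=0.5 gives it 1.5.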
Sounds good to me.
For my own curiosity: In what situation would a value of, say, 0.5 be useful?
If the astronomer wants to take care of class imbalance (say, they are interested in rare classes) but does not trust that the class proportions observed in the current labelled set are the true class proportions.
This kind of reasoning is typical in machine learning: we assume we know how to adjust if we know the true population value, but we don't actually know it, so we estimate it from data. And because we don't fully trust the estimate, we hedge.
Do you mean how to handle it? The simplest way is to tack a keyword argument onto the Predictor, à la sklearn. Hypothetically a class balancer could be part of a pipeline, but in my opinion this is a problem for the Predictor to deal with.
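A sketch of what the keyword-argument approach might look like (the `Predictor` class and `class_balance` parameter name are assumptions for illustration, mirroring sklearn's `class_weight` convention):

```python
class Predictor:
    """Hypothetical predictor exposing a class-balance knob.

    class_balance=0 ignores imbalance; class_balance=1 fully
    reweights by class proportions; values in between interpolate.
    """

    def __init__(self, class_balance=0.0):
        if not 0.0 <= class_balance <= 1.0:
            raise ValueError("class_balance must be in [0, 1]")
        self.class_balance = class_balance
```

Keeping the knob on the Predictor (rather than as a pipeline step) means each predictor can decide internally how, or whether, it supports intermediate values.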