chengsoonong / acton

Active Learning: Predictors, Recommenders and Labellers
BSD 3-Clause "New" or "Revised" License

Class imbalance #21

Open chengsoonong opened 7 years ago

MatthewJA commented 7 years ago

Do you mean how to handle it? The simplest way is to add a keyword argument to the Predictor, à la sklearn. Hypothetically, a class balancer could be part of a pipeline, but in my opinion this is a problem for the Predictor to deal with.

chengsoonong commented 7 years ago

One option would be to have a knob between 0 and 1. 0 = ignore class imbalance and train as usual; all predictors should support this. 1 = weight each class inversely to its proportion in the labelled set (`balanced` in sklearn). Some predictors may also support values in between, which interpolate between 0 and 1.
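A minimal sketch of how such a knob could be turned into per-class weights. The function name `interpolated_class_weights` and the `balance` parameter are hypothetical, not part of acton, and the linear interpolation is an assumption, since the thread does not fix a particular scheme. The value at `balance=1` follows sklearn's `balanced` heuristic, `n_samples / (n_classes * count)`.

```python
import numpy as np


def interpolated_class_weights(y, balance=1.0):
    """Map each class label to a weight (hypothetical helper, not acton API).

    balance=0 gives every class weight 1 (imbalance ignored).
    balance=1 gives n_samples / (n_classes * class_count), as in
    sklearn's 'balanced' heuristic.
    Values in between interpolate linearly (an assumed scheme).
    """
    if not 0.0 <= balance <= 1.0:
        raise ValueError("balance must be in [0, 1]")
    classes, counts = np.unique(y, return_counts=True)
    # Inverse-frequency weights: rarer classes get larger weights.
    balanced = len(y) / (len(classes) * counts)
    # Blend uniform weights (all ones) with the balanced weights.
    weights = (1.0 - balance) * np.ones_like(balanced) + balance * balanced
    return dict(zip(classes.tolist(), weights.tolist()))


labels = [0, 0, 0, 1]  # toy labelled set with a 3:1 imbalance
half = interpolated_class_weights(labels, balance=0.5)
```

The resulting dict could then be passed as the `class_weight` argument of sklearn estimators that accept one, such as `LogisticRegression`.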

MatthewJA commented 7 years ago

Sounds good to me.

For my own curiosity: In what situation would a value of, say, 0.5 be useful?

chengsoonong commented 7 years ago

If the astronomer wants to take care of class imbalance (say, because they are interested in rare classes), but does not trust that the class proportions observed in the current labelled set are the true class proportions.

This kind of reasoning is typical in machine learning. We assume that we know how to adjust if we know the true population value. But we really don't know, so we estimate the value based on data. But we don't trust the estimate, so we hedge.