bcbi / ClassImbalance.jl

Sampling-based methods for correcting for class imbalance in two-category classification problems

Unpredictable Under/Over Pct Results #56

Open · aeisman opened this issue 4 years ago


Describe the bug

It appears that pct_under/pct_over still produce unexpected results. With the class ratios of the example problem the output has the expected size, but when the ratios change, the resulting class counts are difficult to interpret.

To Reproduce

Steps to reproduce the behavior. First, a helper that computes pct_under from pct_over and the target class ratio:

    function calculate_smote_pct_under(; pct_over::Real = 0, minority_to_majority_ratio::Real = 0)
        # pct_over is used as a divisor below, so it must be strictly positive
        if pct_over <= 0
            error("pct_over must be > 0")
        end
        if minority_to_majority_ratio <= 0
            error("minority_to_majority_ratio must be > 0")
        end
        result = 100 * minority_to_majority_ratio * (100 + pct_over) / pct_over
        return result
    end
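The helper's formula follows from simple bookkeeping, assuming smote adds exactly pct_over/100 synthetic minority cases per original minority case and keeps pct_under/100 majority cases per synthetic case. Writing $m$ for the number of minority cases, $p$ for pct_over, $u$ for pct_under and $r$ for minority_to_majority_ratio:

$$
\begin{aligned}
s &= \frac{m\,p}{100} && \text{synthetic minority cases} \\
n_{\text{min}} &= m + s = \frac{m\,(100 + p)}{100} && \text{final minority count} \\
n_{\text{maj}} &= \frac{u}{100}\, s && \text{majority cases kept} \\
u &= \frac{100\, r\,(100 + p)}{p} \;\Rightarrow\; n_{\text{maj}} = \frac{r\,(100+p)}{p}\cdot\frac{m\,p}{100} = r\, n_{\text{min}}
\end{aligned}
$$

Under these assumptions $r$ fixes the ratio of the final majority count to the final minority count, so with $r = 1$, as in both examples below, the two classes should come out the same size.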

Testing SMOTE

Example 1 (works: y2 ends up with 360 observations in total):

    using ClassImbalance  # provides smote

    over_ratio = (180 - 20) / 20 * 100   # pct_over = 800
    y = vcat(ones(20), zeros(180))       # 0 = majority, 1 = minority
    X = hcat(rand(200, 10), y)
    under_ratio = calculate_smote_pct_under(pct_over = over_ratio, minority_to_majority_ratio = 1.0)
    x2, y2 = smote(X, y, k = 5, pct_under = under_ratio, pct_over = over_ratio)
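To make the expected total of 360 easy to check, here is a minimal sketch of the counts that calculate_smote_pct_under implicitly assumes smote will produce: pct_over/100 synthetic minority cases per original minority case, and pct_under/100 majority cases kept per synthetic case. The function expected_counts is hypothetical and not part of ClassImbalance.jl; it only does the arithmetic and never calls smote.

```julia
# Hypothetical helper (NOT part of ClassImbalance.jl): class counts under the
# idealized model assumed by calculate_smote_pct_under (no truncation of
# pct_over or pct_under inside smote; counts rounded to the nearest integer).
function expected_counts(n_minority::Integer, pct_over::Real, pct_under::Real)
    # pct_over/100 synthetic cases per original minority case
    n_synthetic = round(Int, n_minority * pct_over / 100)
    # pct_under/100 majority cases kept per synthetic case
    n_majority_kept = round(Int, n_synthetic * pct_under / 100)
    n_minority_total = n_minority + n_synthetic
    return (minority = n_minority_total,
            majority = n_majority_kept,
            total = n_minority_total + n_majority_kept)
end

# Example 1: 20 minority cases, 180 majority cases, target ratio 1.0
pct_over = (180 - 20) / 20 * 100                     # 800.0
pct_under = 100 * 1.0 * (100 + pct_over) / pct_over  # 112.5
counts = expected_counts(20, pct_over, pct_under)
println(counts)  # (minority = 180, majority = 180, total = 360)
```

Here 160 synthetic cases bring the minority class to 180, and keeping 1.125 majority cases per synthetic case gives 180 majority cases, matching the observed 360.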

Example 2 (breaks: y2 does not end up with 1226 observations):

    over_ratio = (613 - 268) / 268 * 100  # pct_over ≈ 128.73
    y = vcat(ones(268), zeros(613))       # 0 = majority, 1 = minority
    X = hcat(rand(881, 10), y)
    under_ratio = calculate_smote_pct_under(pct_over = over_ratio, minority_to_majority_ratio = 1.0)
    x2, y2 = smote(X, y, k = 5, pct_under = under_ratio, pct_over = over_ratio)
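One possible explanation, offered only as an unverified hypothesis: the helper assumes smote creates exactly pct_over/100 synthetic cases per minority case, but the SMOTE function in the R package DMwR rounds a pct_over of 100 or more down to a whole multiple of 100 and truncates the number of majority cases to an integer. If smote behaves the same way (an assumption about its internals, not confirmed here), example 2 would come out smaller than 1226. The sketch compares both accountings; idealized_total and truncated_total are hypothetical names, not package functions.

```julia
# Hypothetical comparison (NOT part of ClassImbalance.jl).
# idealized_total: the accounting assumed by calculate_smote_pct_under.
function idealized_total(n_minority, pct_over, pct_under)
    n_synthetic = round(Int, n_minority * pct_over / 100)
    n_majority = round(Int, n_synthetic * pct_under / 100)
    return n_minority + n_synthetic + n_majority
end

# truncated_total: ASSUMED DMwR-style behavior, valid for pct_over >= 100:
# only whole multiples of 100% are generated per minority case, and the
# number of majority cases kept is truncated to an integer.
function truncated_total(n_minority, pct_over, pct_under)
    n_synthetic = n_minority * floor(Int, pct_over / 100)
    n_majority = floor(Int, n_synthetic * pct_under / 100)
    return n_minority + n_synthetic + n_majority
end

# Example 2: 268 minority cases, 613 majority cases, target ratio 1.0
pct_over = (613 - 268) / 268 * 100                   # ≈ 128.73
pct_under = 100 * 1.0 * (100 + pct_over) / pct_over  # ≈ 177.68

ideal = idealized_total(268, pct_over, pct_under)      # 268 + 345 + 613 = 1226
truncated = truncated_total(268, pct_over, pct_under)  # 268 + 268 + 476 = 1012
```

With example 1's pct_over = 800, a whole multiple of 100, both accountings agree (160 synthetic cases, 180 majority cases kept), which would be consistent with example 1 working while example 2 does not.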

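Expected behavior

See the notes on example 1 and example 2 above. With minority_to_majority_ratio = 1.0, both classes should end up the same size as the original majority class, so y2 should contain 360 observations in example 1 and 1226 in example 2.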