Closed CarloLucibello closed 2 years ago
here b/c I am confused about how oversample works.
# 6 observations with 3 features each
X = rand(3, 6)
# 2 classes, severely imbalanced
Y = ["a", "b", "b", "b", "b", "a"]
# oversample the class "a" to match "b"
X_bal, Y_bal = oversample(X, Y)
# this results in a bigger dataset with repeated data
@assert size(X_bal) == (3,8)
@assert length(Y_bal) == 8
# now both "a", and "b" have 4 observations each
@assert sum(Y_bal .== "a") == 4
@assert sum(Y_bal .== "b") == 4
does not hold as advertised...
Fix #113 by having the implementation adhere to the docs instead of changing the docs. The resampled classes are now always returned.
Also, made the under/oversample calls deterministic when shuffle=false. Since the change is breaking with respect to previous behavior (but non-breaking with respect to the behavior declared in the docs), I'm also updating the minor version.
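To illustrate what "deterministic when shuffle=false" can mean in practice, here is a minimal sketch of class-balancing oversampling. It is not the package's implementation: `oversample_sketch` and its `rng` keyword are hypothetical names chosen for this example, and repeating each minority class's indices cyclically is just one way to get a reproducible result.

```julia
using Random

# Hypothetical helper for illustration only; not the library's implementation.
function oversample_sketch(X::AbstractMatrix, Y::AbstractVector;
                           shuffle::Bool=false, rng=Random.default_rng())
    # Group observation indices by label, keeping labels in first-seen order.
    labels = unique(Y)
    groups = [findall(==(l), Y) for l in labels]
    target = maximum(length, groups)  # size of the largest class

    inds = Int[]
    for g in groups
        append!(inds, g)
        # Repeat this class's indices cyclically until it reaches `target`.
        # No randomness is involved, so the output is reproducible.
        for k in 1:(target - length(g))
            push!(inds, g[mod1(k, length(g))])
        end
    end

    # Randomness enters only when the caller asks for shuffling.
    shuffle && shuffle!(rng, inds)
    return X[:, inds], Y[inds]
end

X = rand(3, 6)
Y = ["a", "b", "b", "b", "b", "a"]
X_bal, Y_bal = oversample_sketch(X, Y)

@assert size(X_bal) == (3, 8)
@assert count(==("a"), Y_bal) == 4
@assert count(==("b"), Y_bal) == 4
```

With shuffle=false, two calls on the same input return identical outputs in this sketch, and both the original and the resampled labels are always returned, matching the behavior the docs describe.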