JuliaAI / Imbalance.jl

A Julia toolbox with resampling methods to correct for class imbalance.
https://juliaai.github.io/Imbalance.jl/dev/
MIT License

SMOTENC document string #79

Closed by ablaom 8 months ago

ablaom commented 8 months ago

From the doc-string for MLJ.SMOTENC:

 Transform Inputs
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡

    •  X: A matrix or table of floats where each row is an observation from the dataset

    •  y: An abstract vector of labels (e.g., strings) that correspond to the
       observations in X

while we have

julia> input_scitype(Imbalance.MLJ.SMOTENC)
Tuple{Table{<:AbstractVector{<:Union{Infinite, Finite}}}, AbstractVector}

suggesting:

  1. X cannot be a matrix, and
  2. X may have columns with element scitype <:Finite (not just floats),

    both contrary to the doc-string.
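A minimal sketch of the mismatch, assuming MLJ and Imbalance.jl are installed and that the `scitype` of a tuple is the tuple of its components' scitypes (as in ScientificTypes.jl). The toy data below is hypothetical. It checks a table with a categorical column, and then a plain float matrix, against the declared `input_scitype`. Going by the type printed above, the first check should pass and the second should fail, which is the opposite of what the doc-string suggests.

```julia
using MLJ        # re-exports scitype, input_scitype and categorical
using Imbalance  # provides the MLJ model types under Imbalance.MLJ

# Hypothetical toy data: one continuous and one categorical (Finite) column
X = (x1 = [1.0, 2.0, 3.0, 4.0],
     x2 = categorical(["a", "b", "a", "b"]))
y = categorical(["yes", "no", "no", "no"])

# Declared input scitype: Tuple{Table{...}, AbstractVector}
T = input_scitype(Imbalance.MLJ.SMOTENC)

# Table containing a categorical column, compared against the declared type
@show scitype((X, y)) <: T

# Plain float matrix (not a Table), compared against the declared type
@show scitype((rand(4, 2), y)) <: T
```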

EssamWisam commented 8 months ago

Yes, the claim about floats is incorrect, and so is the claim that a matrix is accepted; the MLJ doc-string is out of sync with the docstring in Imbalance.jl. I will fix that, and I will also check the MLJ-specific docs of the other methods for similar type misinformation.

Thank you for bringing this up.

EssamWisam commented 8 months ago

As a side note, RandomWalkOversampler and SMOTENC are the only two methods that don't support matrix input in the MLJ interface, because they require categorical columns to be flagged, which the MLJ interface doesn't account for. I think adding that support shouldn't be hard, and it would make the package more consistent.
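Until matrix input is supported, one possible workaround (an assumption, not something proposed in this thread) is to convert the matrix into a column table yourself and mark the categorical columns explicitly, so that they get a `Finite` scitype. The data and the column names `x1`, `x2` below are hypothetical; this is only a sketch of what "flagging" a column could look like, assuming MLJ is installed.

```julia
using MLJ  # re-exports categorical and scitype

# Hypothetical data: column 2 holds category codes stored as floats
Xmat = [1.0 0.0;
        2.0 1.0;
        3.0 0.0;
        4.0 1.0]

# Build a named-tuple column table; wrapping column 2 in `categorical`
# flags it as categorical, so its scitype becomes Multiclass (a Finite type)
Xt = (x1 = Xmat[:, 1],
      x2 = categorical(Xmat[:, 2]))

@show scitype(Xt.x1)  # continuous column
@show scitype(Xt.x2)  # categorical column
```

A table like `Xt` has the `Table{...}` form declared by `input_scitype`, whereas the raw matrix carries no information about which columns are categorical.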