bcbi / ClassImbalance.jl

Sampling-based methods for correcting for class imbalance in two-category classification problems

Update requirements, without Manifest.toml #87

Closed BuddhiLW closed 2 years ago

DilumAluthge commented 2 years ago

bors try

BuddhiLW commented 2 years ago

I was unable to run the tests locally because I could not use the R binary packages. But using the dev version, which I needed so that I could have DataFrames.jl v1.2.2, I ran into the following issue:

julia> smote(df[!,[:Age, :CreditScore],], df.Exited)
ERROR: ArgumentError: Cannot assign to non-existent column: 1
Stacktrace:
 [1] insert_single_column!(df::DataFrame, v::Vector{Float64}, col_ind::Int64)
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:611
 [2] setindex!
   @ ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:628 [inlined]
 [3] setindex!(df::DataFrame, v::Vector{Float64}, row_inds::Colon, col_ind::Int64)
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:679
 [4] matrix_to_dataframe(X_new::Matrix{Float64}, dat::DataFrame, factor_indcs::Vector{Int64})
   @ ClassImbalance ~/PP/MonitoriaEstatistica/Regression/dev/ClassImbalance/src/smote_exs.jl:28
 [5] smote_obs(dat::DataFrame, pct::Int64, k::Int64, column_names::Vector{String})
   @ ClassImbalance ~/PP/MonitoriaEstatistica/Regression/dev/ClassImbalance/src/smote_exs.jl:123
 [6] _smote(X::DataFrame, y::Vector{Int64}, k::Int64, pct_over::Int64, pct_under::Int64)
   @ ClassImbalance ~/PP/MonitoriaEstatistica/Regression/dev/ClassImbalance/src/ub_smote.jl:38
 [7] #smote#10
   @ ~/PP/MonitoriaEstatistica/Regression/dev/ClassImbalance/src/ub_smote.jl:89 [inlined]
 [8] smote(X::DataFrame, y::Vector{Int64})
   @ ClassImbalance ~/PP/MonitoriaEstatistica/Regression/dev/ClassImbalance/src/ub_smote.jl:89
 [9] top-level scope
   @ REPL[78]:1

I'm following the guide to logistic regression: https://www.machinelearningplus.com/julia/logistic-regression-in-julia-practical-guide-with-examples/

I couldn't reproduce the smote command; the rest is working.

DilumAluthge commented 2 years ago

Yeah, I suspect that ClassImbalance won't work in newer versions of DataFrames.

To be honest, the whole package needs a rewrite from the ground up. I just haven't had the time yet.

BuddhiLW commented 2 years ago

Do you know of any alternative, in the meantime?

I see that you are part of innumerable projects. I can't even imagine what your life must be like. I participate in three projects and I barely have a social life.

Take your time... Anyhow, very few people know about class imbalance and are willing to deal with it lol (although I'm interested).

BuddhiLW commented 2 years ago

I found the error:

In this function,

function matrix_to_dataframe(X_new::Array{Float64, 2}, dat::DataFrames.DataFrame, factor_indcs::Array{Int, 1})
    X_synth = DataFrames.DataFrame()
    p = size(X_new, 2)
    for j = 1:p
        if j ∈ factor_indcs
            X_synth[:, j] = float_to_factor(X_new[:, j],
                                            DataFrames.levels(dat[:, j]))
        else
            X_synth[:, j] = X_new[:, j]
        end
    end
    X_synth
end

We have (dat has been defined as dat = DataFrames.DataFrame()):

julia> float_to_factor(X_new[:,1],DataFrames.levels(dat[:,1]))
ERROR: BoundsError: attempt to access data frame with 0 columns at index [1]
Stacktrace:
 [1] getindex
   @ ~/.julia/packages/DataFrames/vuMM8/src/other/index.jl:183 [inlined]
 [2] getindex(df::DataFrame, row_inds::Colon, col_ind::Int64)
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:499

A further observation, though not relevant for now: a different error shows up when calling,

matrix_to_dataframe(ones(3,3), DataFrames.DataFrame(), [1;2;3])
ERROR: TypeError: non-boolean (Int64) used in boolean context
Stacktrace:
 [1] matrix_to_dataframe(X_new::Matrix{Float64}, dat::DataFrame, factor_indcs::Vector{Int64})
   @ Main ./REPL[58]:5
 [2] top-level scope
   @ REPL[93]:1

This function is, in turn, called by smote_obs, which is called by _smote.
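
For illustration, here is a minimal sketch of one possible workaround, assuming DataFrames.jl 1.x. The function name `matrix_to_dataframe_sketch` and the sample values are hypothetical, and this is not necessarily the change made in this pull request. It handles numeric columns only and leaves out the factor branch, which relies on the package's internal `float_to_factor`. The idea is to add columns to the empty data frame by name rather than by integer position, since the stack trace above shows that assigning to a non-existent column by integer index throws an ArgumentError.

```julia
using DataFrames

# Hypothetical stand-in for matrix_to_dataframe (numeric columns only).
# Columns are created by name, because assigning to a non-existent column
# by integer index fails on an empty DataFrame (see the ArgumentError above).
function matrix_to_dataframe_sketch(X_new::Matrix{Float64}, dat::DataFrame)
    X_synth = DataFrame()
    col_names = names(dat)              # reuse the original column names
    for j in 1:size(X_new, 2)
        X_synth[!, col_names[j]] = X_new[:, j]
    end
    return X_synth
end

# Made-up data, only to exercise the sketch.
dat = DataFrame(Age = [42.0, 41.0], CreditScore = [619.0, 608.0])
X_new = [40.0 600.0; 43.0 615.0]
X_synth = matrix_to_dataframe_sketch(X_new, dat)
```

Reusing `names(dat)` keeps the synthetic columns aligned with the original ones, which matters if the result is later concatenated with the original data.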

BuddhiLW commented 2 years ago

With further changes, I was able to use smote locally. Can you try to run the tests?

Thanks for your attention, @DilumAluthge