JuliaData / DataFrames.jl

In-memory tabular data in Julia
https://dataframes.juliadata.org/stable/

Updating ClassImbalance.jl; Needed help debugging #2886

Closed BuddhiLW closed 3 years ago

BuddhiLW commented 3 years ago

Hello, I'm working on ClassImbalance.jl.

While updating it from DataFrames v0.20 to v1.2.2, I ran into the following problem:

[src/smote_exs.jl] The function matrix_to_dataframe is defined as:

function matrix_to_dataframe(X_new::Array{Float64, 2}, dat::DataFrames.DataFrame, factor_indcs::Array{Int, 1})
    X_synth = DataFrames.DataFrame()
    p = size(X_new, 2)
    for j = 1:p
        if j ∈ factor_indcs
            X_synth[:, j] = float_to_factor(X_new[:, j],
                                            DataFrames.levels(dat[:, j]))
        else
            X_synth[:, j] = X_new[:, j]
        end
    end
    X_synth
end

[src/utils.jl] In turn, it calls float_to_factor:

function float_to_factor(v::T, levels::S) where T <: AbstractArray where S <: AbstractVector
    sort!(levels)
    str_vect = map(x -> levels[convert(Int, x)], v)
    result = Array(str_vect)
    return result
end

In the REPL, the following call:

X2, y2 = smote(df[!, [:age, :iq, :eyes, :degree]],
               df__.blue,
               k=5,
               pct_under=150,
               pct_over=200)

fails with:

ERROR: ArgumentError: Cannot assign to non-existent column: 1
Stacktrace:
 [1] insert_single_column!(df::DataFrame, v::Vector{Float64}, col_ind::Int64)
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:611
 [2] setindex!
   @ ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:628 [inlined]
 [3] setindex!(df::DataFrame, v::Vector{Float64}, row_inds::Colon, col_ind::Int64)
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:679
 [4] matrix_to_dataframe(X_new::Matrix{Float64}, dat::DataFrame, factor_indcs::Vector{Int64})
   @ ClassImbalance ~/PP/Julia/Package-updates/ClassImbalance.jl/src/smote_exs.jl:28

I was testing matrix_to_dataframe with:

matrix_to_dataframe(ones(3,3), DataFrames.DataFrame(), [1;2;3])

Stepping through matrix_to_dataframe line by line, I ran into:

julia> float_to_factor(X_new[:,1],DataFrames.levels(dat[:,1]))
ERROR: BoundsError: attempt to access data frame with 0 columns at index [1]
Stacktrace:
 [1] getindex
   @ ~/.julia/packages/DataFrames/vuMM8/src/other/index.jl:183 [inlined]
 [2] getindex(df::DataFrame, row_inds::Colon, col_ind::Int64)
   @ DataFrames ~/.julia/packages/DataFrames/vuMM8/src/dataframe/dataframe.jl:499
 [3] top-level scope
   @ REPL[102]:1

julia> dat
0×0 DataFrame

Could you lend me a hand with this one? My guess is that I need a guard: if dat is a 0×0 DataFrame, then <do something that can actually be accessed at position 1>.

bkamins commented 3 years ago

The problem is with:

X_synth[:, j] = ...

as j is an integer.

Note the following:

julia> using DataFrames

julia> df = DataFrame()
0×0 DataFrame

julia> df[:, 1] = [1, 2, 3]
ERROR: ArgumentError: Cannot assign to non-existent column: 1

julia> df[:, "x1"] = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> df
3×1 DataFrame
 Row │ x1
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3

You cannot use an integer index to refer to a column that does not exist. However, you can create it if you give it a name. Which name to use is up to you, but the simplest would probably be, e.g.:

X_synth[:, Symbol(:x, j)] = ...
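Applied to matrix_to_dataframe from the question, that suggestion might look like the sketch below. It is only an illustration under a few assumptions: the column names :x1, :x2, ... are an arbitrary choice, float_to_factor is condensed from the question (with a non-mutating sort), and the small dat and X_new at the end are made-up data.

```julia
using DataFrames

# Condensed from the question (src/utils.jl); sort() is used instead of
# sort!() so the caller's levels vector is left untouched.
function float_to_factor(v::AbstractArray, levels::AbstractVector)
    sorted = sort(levels)
    return [sorted[convert(Int, x)] for x in v]
end

function matrix_to_dataframe(X_new::Matrix{Float64}, dat::DataFrame,
                             factor_indcs::Vector{Int})
    X_synth = DataFrame()
    for j in 1:size(X_new, 2)
        col = Symbol(:x, j)  # assumed naming scheme: :x1, :x2, ...
        if j in factor_indcs
            # Creating a column by name works even when it does not exist yet.
            X_synth[:, col] = float_to_factor(X_new[:, j],
                                              DataFrames.levels(dat[:, j]))
        else
            X_synth[:, col] = X_new[:, j]
        end
    end
    return X_synth
end

# Made-up data: column 2 of dat is a factor with levels "a" and "b".
dat = DataFrame(age = [1.0, 2.0], eyes = ["b", "a"])
X_new = [10.0 1.0; 20.0 2.0]
X_synth = matrix_to_dataframe(X_new, dat, [2])
```

Note that dat still has to contain the factor columns: calling this with an empty DataFrame() and a non-empty factor_indcs would still hit the BoundsError on dat[:, j] shown in the question.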

In general, it is better to ask such questions either on StackOverflow with the julia and dataframe tags, or on Slack in the #data channel. That way more people are likely to see your question, so you are likely to get a response faster.

We usually keep GitHub issues for reporting feature requests or bugs to the package. Thank you!

BuddhiLW commented 3 years ago

I am the one who should thank you, @bkamins.

I was able to make it work. There was one place where I had to change names! to rename!, and everything worked out smoothly from there.
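For readers doing the same migration, here is a minimal sketch of that replacement, assuming the old code set all column names at once. The data frame and column names are made up for illustration.

```julia
using DataFrames

# Made-up data frame with placeholder column names.
df = DataFrame(x1 = [1, 2], x2 = [3.0, 4.0])

# Old style (DataFrames v0.20 era): names!(df, [:age, :iq])
rename!(df, [:age, :iq])     # replace all column names at once
rename!(df, :iq => :score)   # or rename a single column with a Pair
```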

I will now make a pull request to ClassImbalance.jl. @DilumAluthge has been pretty fast and attentive whenever I went there.

Julia has a great community behind all its functionality. It is great to be a tiny part of it and see it happening.

bkamins commented 3 years ago

Great. If you have any further questions please do not hesitate to ask.