dmlc / XGBoost.jl

XGBoost Julia Package

confusing error messages when passed tables with invalid types #154

Open bobaronoff opened 1 year ago

bobaronoff commented 1 year ago

I received an unusual error message when trying to convert a DataFrame to a DMatrix. I have converted other DataFrame objects without issue, both in the past and currently, and I'm not sure what is different about this one. Any clues from the error message, or suggestions on how to troubleshoot, would be helpful.

Here is the error:

julia> typeof(s)
DataFrame

julia> DMatrix(s)
ERROR: ArgumentError: DMatrix requires either an AbstractMatrix or table satisfying the Tables.jl interface
Stacktrace:
 [1] DMatrix(tbl::Matrix{Any}; feature_names::Vector{String}, kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ XGBoost ~/.julia/packages/XGBoost/Fyff4/src/dmatrix.jl:249
 [2] DMatrix(tbl::DataFrame; feature_names::Vector{String}, kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ XGBoost ~/.julia/packages/XGBoost/Fyff4/src/dmatrix.jl:251
 [3] DMatrix(tbl::DataFrame)
   @ XGBoost ~/.julia/packages/XGBoost/Fyff4/src/dmatrix.jl:244
 [4] top-level scope
   @ REPL[63]:1
bobaronoff commented 1 year ago

Additional introspection on the DataFrame object:

julia> Tables.istable(s)
true

julia> Tables.columnnames(s)
35-element Vector{Symbol}:
ExpandingMan commented 1 year ago

This is expected behavior, but the error message is bad. Converting the table to a matrix produces something with eltype Any, where a Real eltype is expected.
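A minimal Base-only illustration of how a single unparsed String column turns the whole matrix into `Matrix{Any}`. It assumes, without quoting the Tables.jl source, that flattening a table promotes the column eltypes much as `promote_type` and `hcat` do; the column values are made up.

```julia
# Two made-up columns: one numeric, one that was never parsed from text.
numeric = [1.0, 2.5, 3.0]        # Vector{Float64}
labels  = ["1.0", "2.5", "x"]    # Vector{String}

# Float64 and String have no common type narrower than Any.
T = promote_type(eltype(numeric), eltype(labels))
println(T)                       # Any

# Concatenating the columns gives a Matrix{Any}, which is not an AbstractMatrix{<:Real}.
m = hcat(numeric, labels)
println(typeof(m))               # Matrix{Any}
println(m isa AbstractMatrix{<:Real})   # false
```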

We should probably standardize the cases in which Any elements get converted. Failing in some cases is certainly reasonable, but it wouldn't surprise me if it currently fails in some cases where it shouldn't.

Could you list all types present in your input tables? I think in your case it would be

Set(Iterators.map(typeof, Tables.matrix(s)))
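Listing element types over the whole matrix shows that something non-Real is present, but not which column it came from. The sketch below checks column eltypes instead. `nonreal_columns` is a hypothetical helper, and a NamedTuple of vectors stands in for the DataFrame so the example runs with Base alone; with DataFrames.jl, passing `pairs(eachcol(s))` should work the same way, since it yields column-name => column pairs.

```julia
# Hypothetical helper: return name => eltype for every column whose eltype is not <: Real.
# `colpairs` is any iterable of name => column pairs.
# Note: a Union{Missing, Float64} column is also flagged, since that Union is not <: Real.
function nonreal_columns(colpairs)
    bad = Pair{Symbol,Type}[]
    for (name, col) in colpairs
        T = eltype(col)
        T <: Real || push!(bad, Symbol(name) => T)
    end
    return bad
end

# Made-up column table: `smoker` was never parsed from strings.
s = (age = [34, 51, 27], bmi = [22.1, 30.4, 25.0], smoker = ["0", "1", "0"])
nonreal_columns(pairs(s))   # flags :smoker => String
```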
ExpandingMan commented 1 year ago

On second thought, something else fishy is happening here. This specific error should only happen when !Tables.istable(s).

Also, could you please try this on the latest version of XGBoost.jl? From your stack trace, it looks like your version is at least a few commits old.

bobaronoff commented 1 year ago

Thank you for the prompt response! I believe you found the problem: a few of the columns in the DataFrame contain String values. I need to track down the source of this on my end (the object is the end result of multiple string-to-number conversions, and these columns seem to have been missed).

I'll follow up if this doesn't resolve the issue.
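For the fix described in this comment, one way to convert leftover String columns is to dispatch on the column type and parse only those. This is a sketch that assumes every string holds a valid number (`parse` throws otherwise, while `tryparse` returns `nothing`); `to_numeric` is a hypothetical name. With DataFrames.jl, `mapcols(to_numeric, s)` should apply it column by column in the same way.

```julia
# Hypothetical helper: parse string columns to Float64, pass other columns through unchanged.
# The AbstractString method is more specific, so it wins for Vector{String}.
to_numeric(col::AbstractVector{<:AbstractString}) = parse.(Float64, col)  # throws on non-numeric text
to_numeric(col::AbstractVector) = col

# Made-up column table with one unparsed column.
s = (age = [34, 51, 27], smoker = ["0", "1", "0"])
fixed = map(to_numeric, s)   # map over a NamedTuple keeps the column names
```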

ExpandingMan commented 1 year ago

A few of the columns in DataFrame object contain String type

In that case this was definitely supposed to throw an error, but the error message was pretty terrible and confused even me. So this is an actionable issue: we need a better error message here. I'd be happy if it hit a MethodError, but this is just downright confusing.
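One possible shape for a clearer message, sketched with a hypothetical `toy_dmatrix` rather than XGBoost.jl's real `DMatrix` constructors: check column eltypes before flattening, and name the offending columns. Because the only matrix method accepts `AbstractMatrix{<:Real}`, a `Matrix{Any}` passed directly gets a MethodError instead of being routed into a table path, which is the behavior the comment above calls acceptable.

```julia
# Hypothetical sketch, not XGBoost.jl code. The matrix method only accepts Real eltypes,
# so a Matrix{Any} passed here fails with a MethodError rather than a misleading message.
toy_dmatrix(m::AbstractMatrix{<:Real}) = size(m)   # stand-in for actually building a DMatrix

# Column-table method (a NamedTuple of vectors stands in for any Tables.jl table).
function toy_dmatrix(cols::NamedTuple)
    bad = [name for (name, col) in pairs(cols) if !(eltype(col) <: Real)]
    isempty(bad) || throw(ArgumentError(
        "DMatrix requires Real column element types; non-Real columns: " * join(bad, ", ")))
    return toy_dmatrix(hcat(values(cols)...))   # every column is Real at this point
end

toy_dmatrix((age = [34, 51], bmi = [22.1, 30.4]))   # returns (2, 2)
```

The check runs at the column level because, once the columns are promoted into a single `Matrix{Any}`, the information about which column caused the problem is gone.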

bobaronoff commented 1 year ago

The string elements were the issue here. I'll leave it to your discretion whether any changes to the error message are needed.

Thanks for helping me sort out the issue.

ExpandingMan commented 1 year ago

Yes, in my opinion this message is confusing enough to warrant an open issue, so let's keep it open, though fixing it isn't a high priority.