dmlc / XGBoost.jl

XGBoost Julia Package

DMatrix from DataFrame example in docs fails #123

Closed SebastianCallh closed 1 year ago

SebastianCallh commented 1 year ago

Running the example

using DataFrames, XGBoost
df = DataFrame(randn(10,3), [:a, :b, :c])
dm = DMatrix(df)
XGBoost.getfeaturenames(dm) == ["a", "b", "c"]

from the documentation here gives the error

MethodError: no method matching DMatrix(::DataFrame)
Closest candidates are:
  DMatrix(::Matrix{<:Real}) at ~/.julia/packages/XGBoost/D30Xd/src/xgboost_lib.jl:45
  DMatrix(::Matrix{<:Real}, ::Bool) at ~/.julia/packages/XGBoost/D30Xd/src/xgboost_lib.jl:45
  DMatrix(::Matrix{<:Real}, ::Bool, ::Any; kwargs...) at ~/.julia/packages/XGBoost/D30Xd/src/xgboost_lib.jl:45
  ...

Unfortunate! Has support for DataFrame -> DMatrix been removed? I'm using DataFrames v1.4.1 and XGBoost v1.5.2.

tylerjthomas9 commented 1 year ago

The current version in the registry does not support constructing a DMatrix from a DataFrame. You will have to install from GitHub to use it.

julia> using Pkg
julia> Pkg.add(url="https://github.com/dmlc/XGBoost.jl")

SebastianCallh commented 1 year ago

I see, thanks! I realize now I read the docs in the repo (i.e. on main) since the link in the readme 404s. Do you know if a new release with the aforementioned feature is planned soon?

scheidan commented 1 year ago

On version 2.0.0 the example from the docs still fails:

using DataFrames
using XGBoost

df = DataFrame(randn(100,3), [:a, :b, :y])

# fails
bst = xgboost((df[!, [:a, :b]], df.y)) 

# workaround
data = DMatrix(df[!, [:a, :b]], df.y)
bst = xgboost(data) # :)

I believe the reason is that the constructor for DMatrix is overly restrictive:

DMatrix(Xy::DataTuple; kw...) = DMatrix(Xy[1], Xy[2]; kw...)

Relaxing it to

DMatrix(Xy::Tuple; kw...) = DMatrix(Xy[1], Xy[2]; kw...)

solves the issue.
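
To illustrate the dispatch behaviour described above, here is a minimal, self-contained sketch. It does not use XGBoost.jl at all. The definition of DataTuple is not shown in this thread, so `NarrowTuple`, `describe`, `describe_narrow` and `describe_relaxed` are hypothetical stand-ins chosen only to show how a method restricted to a specific tuple type can reject an input that a method on plain `Tuple` accepts.

```julia
# Hypothetical alias standing in for a restrictive tuple type:
# it only matches a (Matrix{Float64}, Vector{Float64}) pair.
const NarrowTuple = Tuple{Matrix{Float64}, Vector{Float64}}

# Two-argument method doing the "real" work (here it only reports sizes).
describe(X, y) = (size(X, 1), size(X, 2), length(y))

# Restrictive one-argument method: dispatches only on NarrowTuple.
describe_narrow(Xy::NarrowTuple) = describe(Xy[1], Xy[2])

# Relaxed one-argument method: any tuple is unpacked and forwarded.
describe_relaxed(Xy::Tuple) = describe(Xy[1], Xy[2])

X = randn(5, 2)
y = randn(5)
Xv = view(X, :, :)  # a SubArray, not a Matrix, so (Xv, y) is not a NarrowTuple

# The relaxed method accepts the tuple and forwards its two elements.
println(describe_relaxed((Xv, y)))

# The restrictive method has no matching signature for this tuple.
println(applicable(describe_narrow, (Xv, y)))
```

The trade-off in the relaxed version is that type checking moves to the two-argument method, so an unsupported input fails there rather than at the tuple method.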

ExpandingMan commented 1 year ago

This has been fixed, thanks to @tylerjthomas9.