dmlc / XGBoost.jl

XGBoost Julia Package

Is there a way to retrieve data from a DMatrix object? #150

Closed bobaronoff closed 1 year ago

bobaronoff commented 1 year ago

I am trying to look at the data matrix stored within a DMatrix object. Is it possible?

I tried:

size(myDMatrix)
myDMatrix.data
XGBoost.getdata(myDMatrix)

The first line returns (400, 15). The second line returns 'nothing' (the object). The third line returns a SparseMatricesCSR matrix and then throws "MethodError: no method matching".

Any thoughts greatly appreciated. Thank you.

ExpandingMan commented 1 year ago

I can't reproduce an error on XGBoost.getdata. Could you please include the full stack trace?

As explained here, a DMatrix is now an ordinary AbstractMatrix. The complication you are running into is that (unfortunately, due to how it's implemented in libxgboost) accessing data from it requires re-allocating the entire matrix, so I have taken steps to ensure that this doesn't happen until the user explicitly requests it. You can request it however you would normally access data from an AbstractMatrix: for example, x[1,1] will retrieve the first element and, as a side effect, copy the entire matrix into a Julia object.
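
For illustration, a minimal sketch of the access pattern described above. The random X and y are made-up stand-ins, and the DMatrix(X, y) constructor is assumed to be available in the installed XGBoost.jl version.

```julia
using XGBoost

# Hypothetical data, only for illustration
X = rand(Float32, 400, 15)
y = rand(Float32, 400)

dm = DMatrix(X, y)

# Size query on the DMatrix
dims = size(dm)

# Ordinary AbstractMatrix indexing; per the comment above, this copies the
# whole matrix out of libxgboost as a side effect, so use it only for checks
first_element = dm[1, 1]
```

Since X is still in scope here, indexing X directly gives the same information without the copy.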

I recommend you only do this for verification purposes; you certainly should not use it for anything performance-critical. The best tactic is to keep around whatever object you created the DMatrix from and use that object, instead of the DMatrix itself, for whatever you want to do.

bobaronoff commented 1 year ago

Your advice is well taken. I created a function that creates the booster and performs analyses on the model. To allow weights to be set, I need to create the DMatrix and pass that to the function. The reason I am looking for the base data is to allow creation of partial dependence plots. I have a workaround: essentially re-using the feature data, as you point out. I will close out the issue. For general info, the error I get using XGBoost.getdata() is below; for the reasons you point out, it is best to avoid using this function:

101×13 SparseMatricesCSR.SparseMatrixCSR{0, Float32, UInt64} with 1313 stored entries:Error showing value of type SparseMatricesCSR.SparseMatrixCSR{0, Float32, UInt64}:
ERROR: MethodError: no method matching (::SparseMatricesCSR.var"#_format_line#4"{SparseMatricesCSR.SparseMatrixCSR{0, Float32, UInt64}, IOContext{Base.TTY}})(::Int64, ::UInt64, ::Int64, ::Int64)
Closest candidates are:
  (::SparseMatricesCSR.var"#_format_line#4")(::Any, ::Any, ::Any, ::Any, ::Any) at ~/.julia/packages/SparseMatricesCSR/gQcwh/src/SparseMatrixCSR.jl:338
Stacktrace:
  [1] _broadcast_getindex_evalf
    @ ./broadcast.jl:670 [inlined]
  [2] _broadcast_getindex
    @ ./broadcast.jl:643 [inlined]
  [3] getindex
    @ ./broadcast.jl:597 [inlined]
  [4] copy
    @ ./broadcast.jl:899 [inlined]
  [5] materialize
    @ ./broadcast.jl:860 [inlined]
  [6] show(io::IOContext{Base.TTY}, S::SparseMatricesCSR.SparseMatrixCSR{0, Float32, UInt64})
    @ SparseMatricesCSR ~/.julia/packages/SparseMatricesCSR/gQcwh/src/SparseMatrixCSR.jl:374
  [7] show(io::IOContext{Base.TTY}, #unused#::MIME{Symbol("text/plain")}, S::SparseMatricesCSR.SparseMatrixCSR{0, Float32, UInt64})
    @ SparseMatricesCSR ~/.julia/packages/SparseMatricesCSR/gQcwh/src/SparseMatrixCSR.jl:329
  [8] (::REPL.var"#43#44"{REPL.REPLDisplay{REPL.LineEditREPL}, MIME{Symbol("text/plain")}, Base.RefValue{Any}})(io::Any)
    @ REPL /Applications/Julia-1.8.2.app/Contents/Resources/julia/share/julia/stdlib/v1.8/REPL/src/REPL.jl:267
  [9] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL /Applications/Julia-1.8.2.app/Contents/Resources/julia/share/julia/stdlib/v1.8/REPL/src/REPL.jl:521
 [10] display(d::REPL.REPLDisplay, mime::MIME{Symbol("text/plain")}, x::Any)
    @ REPL /Applications/Julia-1.8.2.app/Contents/Resources/julia/share/julia/stdlib/v1.8/REPL/src/REPL.jl:260
 [11] display(d::REPL.REPLDisplay, x::Any)
    @ REPL /Applications/Julia-1.8.2.app/Contents/Resources/julia/share/julia/stdlib/v1.8/REPL/src/REPL.jl:272
 [12] display(x::Any)
    @ Base.Multimedia ./multimedia.jl:328
 [13] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [14] invokelatest
    @ ./essentials.jl:726 [inlined]
 [15] print_response(errio::IO, response::Any, show_value::Bool, have_color::Bool, specialdisplay::Union{Nothing, AbstractDisplay})
    @ REPL /Applications/Julia-1.8.2.app/Contents/Resources/julia/share/julia/stdlib/v1.8/REPL/src/REPL.jl:296
 [16] (::REPL.var"#45#46"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
    @ REPL /Applications/Julia-1.8.2.app/Contents/Resources/julia/share/julia/stdlib/v1.8/REPL/src/REPL.jl:278
 [17] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL /Applications/Julia-1.8.2.app/Contents/Resources/julia/share/julia/stdlib/v1.8/REPL/src/REPL.jl:521
 [18] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
    @ REPL /Applications/Julia-1.8.2.app/Contents/Resources/julia/share/julia/stdlib/v1.8/REPL/src/REPL.jl:276
 [19] (::REPL.var"#do_respond#66"{Bool, Bool, REPL.var"#77#87"{REPL.LineEditREPL, REPL.REPLHistoryProvider}, REPL.LineEditREPL, REPL.LineEdit.Prompt})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
    @ REPL /Applications/Julia-1.8.2.app/Contents/Resources/julia/share/julia/stdlib/v1.8/REPL/src/REPL.jl:857
 [20] (::VSCodeServer.var"#98#101"{REPL.var"#do_respond#66"{Bool, Bool, REPL.var"#77#87"{REPL.LineEditREPL, REPL.REPLHistoryProvider}, REPL.LineEditREPL, REPL.LineEdit.Prompt}})(mi::REPL.LineEdit.MIState, buf::IOBuffer, ok::Bool)
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.38.2/scripts/packages/VSCodeServer/src/repl.jl:122
 [21] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [22] invokelatest
    @ ./essentials.jl:726 [inlined]
 [23] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
    @ REPL.LineEdit /Applications/Julia-1.8.2.app/Contents/Resources/julia/share/julia/stdlib/v1.8/REPL/src/LineEdit.jl:2510
 [24] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
    @ REPL /Applications/Julia-1.8.2.app/Contents/Resources/julia/share/julia/stdlib/v1.8/REPL/src/REPL.jl:1248
 [25] (::REPL.var"#49#54"{REPL.LineEditREPL, REPL.REPLBackendRef})()
    @ REPL ./task.jl:484
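
The workaround mentioned above (re-using the feature data rather than reading it back from the DMatrix) could look roughly like this for a partial dependence computation. This is a sketch, not code from the thread: partial_dependence and toy_predict are hypothetical names, and toy_predict stands in for the real model, which with XGBoost.jl would presumably be something like X -> predict(bst, X).

```julia
using Statistics

# Partial dependence of a model on one feature, computed from the retained
# feature matrix X instead of from the DMatrix.
function partial_dependence(predict_fn, X::AbstractMatrix, feature::Integer,
                            grid::AbstractVector)
    Xwork = copy(X)                       # leave the caller's matrix untouched
    pd = Vector{Float64}(undef, length(grid))
    for (k, v) in enumerate(grid)
        Xwork[:, feature] .= v            # set the feature to v in every row
        pd[k] = mean(predict_fn(Xwork))   # average prediction over all rows
    end
    return pd
end

# Stand-in model (assumption, for illustration only)
toy_predict(X) = 2 .* X[:, 1] .+ X[:, 2]

X = Float32[1 10; 2 20; 3 30]
grid = [0.0, 1.0]
pd = partial_dependence(toy_predict, X, 1, grid)
```

Passing the prediction function as an argument keeps the helper independent of the DMatrix, so the weights can still be set on the DMatrix used for training.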

ExpandingMan commented 1 year ago

Oh, this definitely looks like a bug, but it looks like a bug in SparseMatricesCSR.jl: it's not an error in getdata at all, it's in show for the resulting matrix. It's not one I've seen, so I don't know exactly what is causing it. You should probably open an issue there, though you might need to create an MWE for it to be useful.