ExpandingMan opened this issue 7 years ago
See https://github.com/dmlc/xgboost/blob/master/doc/parameter.md and the multi:softprob objective for how a vector output would be handled (as a flattened matrix).
However, a deeper question is what you expect to happen in a gradient boosting regression model with a vector output that would differ from running a separate model for each dimension. If you can clarify what you want to be different (other than just easier coding), it will be easier to see whether XGBoost supports that.
On Fri, Feb 3, 2017 at 8:16 AM ExpandingMan notifications@github.com wrote:
Hello all. I haven't dug too far into the source code yet, but I'm wondering if it's possible to do regressions where the "label" (target value) consists of multi-dimensional data points. (i.e. the label argument of the xgboost function would be an Array{T<:Number,2}.) This seems like a pretty important feature, but I can't find any literature about it in the xgboost documentation for any language.
It seems to me that even if it's not explicitly supported this should be possible by setting a custom loss function, however I get the following error any time I try to pass a matrix-valued "label":
ERROR: LoadError: MethodError: no method matching (::XGBoost.#_setinfo#8)(::Ptr{Void}, ::String, ::Array{Float64,2})
Closest candidates are:
  _setinfo{T<:Number}(::Ptr{Void}, ::String, ::Array{T<:Number,1}) at /home/user/.julia/v0.5/XGBoost/src/xgboost_lib.jl:10
 in (::XGBoost.##call#7#11)(::Array{Any,1}, ::Type{T}, ::Array{Float64,2}, ::Bool, ::Float32) at /home/user/.julia/v0.5/XGBoost/src/xgboost_lib.jl:59
 in (::Core.#kw#Type)(::Array{Any,1}, ::Type{XGBoost.DMatrix}, ::Array{Float64,2}, ::Bool, ::Float32) at ./ :0
 in makeDMatrix(::Array{Float64,2}, ::Array{Float64,2}) at /home/user/.julia/v0.5/XGBoost/src/xgboost_lib.jl:137
 in #xgboost#20(::Array{Float64,2}, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Type{T}, ::Type{T}, ::Array{Any,1}, ::Array{Any,1}, ::XGBoost.#xgboost, ::Array{Float64,2}, ::Int64) at /home/user/.julia/v0.5/XGBoost/src/xgboost_lib.jl:147
 in (::XGBoost.#kw##xgboost)(::Array{Any,1}, ::XGBoost.#xgboost, ::Array{Float64,2}, ::Int64) at ./ :0
 in include_from_node1(::String) at ./loading.jl:488
while loading /home/user/RatingsPrediction/xgboost0.jl, in expression starting on line 43
Taking a look at the source code I get the impression it is not designed to pass labels that aren't Vectors into the C code. Certainly the above error seems to indicate that it is impossible to set a "label" that cannot be converted to a Vector.
Is there any way around this? Does the Python API support this? Thanks.
Thanks for your prompt response.
I don't see any significant problem with using multiple models (as far as I can think, in the case of gradient boosted trees this should be exactly equivalent to "one" multi-dimensional model). Of course, one usually doesn't have to resort to this (from an API standpoint), hence the issue. Apart from convenience, I'd be a bit concerned about performance issues if I were fitting in a high-dimensional space, but perhaps that's unwarranted.
Deep learning APIs often allow vector output because they share parameters during such multi-task learning. My guess is that since GBMs don't typically do this, running separate models is the most explicit way of doing it without implying that any parameter sharing is happening. I think Tianqi wrote a paper with Carlos a while back on accounting for certain types of dependence among the output features, so you might also check that out if you want.
I have a related confusion. According to some out-of-date documentation, e.g. https://xgboost.readthedocs.io/en/release_0.72/python/python_api.html:

label ([list] or numpy 1-D array, optional) – Label of the training data.

it seems only a 1-D array is accepted as the label when constructing a DMatrix. But in the newer version, https://xgboost.readthedocs.io/en/stable/python/python_api.html#module-xgboost.training:

label (array_like) – Label of the training data.

the form of the label is unrestricted, and we can indeed pass a 2-D array as the label. But then a strange thing happens: when we use dmatrix.get_label() to look at this 2-D array, the underlying process seems to have flattened it and kept only the first "sample length" elements, like this:
import numpy as np
import pandas as pd
import xgboost as xgb

X = pd.DataFrame(data=[[1, 0], [2, 2], [0, 3], [4, 4]])
y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
dsoft_fake = xgb.DMatrix(X.values, label=y)
dsoft_fake.get_label()
output:
array([1., 2., 3., 4.], dtype=float32)
So my question is: why does get_label() return only the first four elements rather than the full 2-D array? Is a 2-D label actually supported?
Thanks for the explanation in advance.
Matrix input for labels is a recent addition (1.6) for multi-output and multi-label tasks; the getter can't return the matrix yet.