dmlc / XGBoost.jl

XGBoost Julia Package

Changes to predict() that allow specification of prediction 'type' #171

Closed bobaronoff closed 1 year ago

bobaronoff commented 1 year ago

These changes would give users access to a libxgboost feature that reports feature contributions and/or interactions at the record level. This can be useful for Shapley-type analyses. The additional data is obtained by specifying the 'type' parameter in Lib.XGBoosterPredictFromDMatrix().

A 'type' parameter is added to predict(); valid input values are 0 through 6. The meaning of each value is added to the docstring. The parameter is optional and defaults to 0, which gives the normal prediction output.
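For readers who do not have the docstring at hand, here is a minimal sketch of the value-to-meaning mapping, taken from the libxgboost C API documentation for the prediction 'type' field. The names `PREDICTION_TYPES` and `describe_prediction_type` are hypothetical helpers for illustration only; they are not part of XGBoost.jl or of this PR.

```julia
# Hypothetical lookup table (not part of XGBoost.jl). The meanings follow the
# libxgboost C API documentation for the prediction "type" field.
const PREDICTION_TYPES = Dict(
    0 => "normal prediction",
    1 => "output margin",
    2 => "feature contributions",
    3 => "approximated feature contributions",
    4 => "feature interactions",
    5 => "approximated feature interactions",
    6 => "leaf indices",
)

# Hypothetical helper: reject values outside 0:6 before they reach the C library.
function describe_prediction_type(pred_type::Integer)
    haskey(PREDICTION_TYPES, pred_type) ||
        throw(ArgumentError("prediction type must be in 0:6, got $pred_type"))
    return PREDICTION_TYPES[pred_type]
end
```

Under this mapping, 'type' 1 covers what the old 'margin' parameter requested.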

I was not certain what to do with the old 'margin' parameter. I removed it because it is redundant with the 'type' specification, although this might cause issues for others who use it in their code.

The dimensions of the returned data vary significantly with 'type' and with the booster objective (multi-class objectives return an extra dimension). 'type' 2 and 3 return a 2-dimensional array; 'type' 4 and 5 return a 3-dimensional array. transpose() fails on a 3-dimensional array, so it is replaced with permutedims(). This creates a trade-off: permutedims() allocates new memory for the array, although the resulting Matrix type is more robust than the Transpose type. For normal prediction (i.e. 'type'=0, where the return is a vector), there is no additional allocation, so this should not impact operations that call predict() many times (e.g. creating learning curves or running cross-validation).
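The transpose()/permutedims() trade-off can be seen with plain Julia arrays. This is only a sketch: the arrays below are stand-ins with made-up shapes, not actual libxgboost output, and the dimension order used in the PR itself may differ.

```julia
# Stand-in for a 3-dimensional prediction result; the shape is illustrative only.
A = reshape(collect(1.0:24.0), 4, 3, 2)

# transpose() is defined only for vectors and matrices, so it throws on A.
transpose_fails = try
    transpose(A)
    false
catch
    true
end

# permutedims() works for any rank; here it reverses the dimension order
# and returns a newly allocated Array.
B = permutedims(A, (3, 2, 1))   # size (2, 3, 4)

# For a matrix, transpose() is a lazy wrapper that shares memory with its parent,
# while permutedims() materializes an independent Matrix.
M = reshape(collect(1.0:6.0), 3, 2)
T = transpose(M)
P = permutedims(M)

M[1, 1] = 99.0   # visible through T, not through P
```

After the last assignment, `T[1, 1]` reflects the change while `P[1, 1]` does not, which is the extra allocation being traded for a plain Matrix.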

Bob

bobaronoff commented 1 year ago

I apologize for the messy PR. On my end the 6 changes were cumulative, not separate. This is my first ever PR. I tried resending but got the same result. If the changes are too difficult to follow, I am happy to resubmit if there is some 'trick' to rolling the incremental changes into one final version for comparison.

Bob