Closed by edwardwliu.
@edwardwliu Would it be useful if Treelite implemented a function predict_bytree that returns the individual prediction of each tree? Its implementation would be similar to that of predict_leaf.
Yes, returning individual predictions by tree would work great.
Background
Treelite currently returns a single value per row, averaged across all trees in a forest. In some cases, however, such as out-of-bag (OOB) predictions, only a subset of the trees should be used for averaging. OOB predictions are extremely useful for evaluating models and for further model diagnostics. Although Treelite itself does not track which observations were out-of-bag for each tree, many libraries do provide this information (e.g. _generate_unsampled_indices() in scikit-learn). If Treelite exposed the individual predictions per tree, users could then calculate OOB results by hand.
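To make "calculate OOB results by hand" concrete, here is a minimal pure-Python sketch. It assumes you already have a per-tree prediction matrix for the training rows and an in-bag mask per tree (in scikit-learn the mask would be derived from each tree's bootstrap sample, e.g. via the private helper mentioned above, whose signature has changed between versions). The function names `oob_predictions` and `oob_mse` and the toy data are hypothetical and not part of Treelite or scikit-learn.

```python
from typing import List, Optional, Sequence


def oob_predictions(
    per_tree: Sequence[Sequence[float]],
    in_bag: Sequence[Sequence[bool]],
) -> List[Optional[float]]:
    """Average each row's per-tree predictions over the trees that did NOT train on it.

    per_tree[i][t]: prediction of tree t for training row i (hypothetical input).
    in_bag[i][t]:   True if row i was in tree t's bootstrap sample (hypothetical input).
    Rows that were in-bag for every tree get None, since no OOB estimate exists.
    """
    if len(per_tree) != len(in_bag):
        raise ValueError("per_tree and in_bag must have the same number of rows")
    result: List[Optional[float]] = []
    for preds, bag in zip(per_tree, in_bag):
        if len(preds) != len(bag):
            raise ValueError("each row must list the same number of trees in both inputs")
        # Keep only the predictions of trees for which this row was out-of-bag.
        oob = [p for p, used in zip(preds, bag) if not used]
        result.append(sum(oob) / len(oob) if oob else None)
    return result


def oob_mse(oob_preds: Sequence[Optional[float]], y: Sequence[float]) -> float:
    """Mean squared error over the rows that have an OOB estimate."""
    errors = [(p - t) ** 2 for p, t in zip(oob_preds, y) if p is not None]
    if not errors:
        raise ValueError("no row has an OOB estimate")
    return sum(errors) / len(errors)


# Toy data: 3 training rows, 3 trees.
per_tree = [[1.0, 4.0, 1.0], [3.0, 2.0, 4.0], [2.0, 2.0, 2.0]]
in_bag = [[True, False, False], [False, True, True], [True, True, True]]
y = [2.0, 3.5, 1.0]

preds = oob_predictions(per_tree, in_bag)
print(preds)              # [2.5, 3.0, None]
print(oob_mse(preds, y))  # 0.25
```

Rows that every tree trained on are reported as None rather than silently dropped, so the caller can decide how to treat them when computing an OOB score.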
Potential Implementation
Treelite could return an array with one prediction per tree for each row. See this notebook for a detailed exploration.
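As a rough illustration of the proposed output shape, the sketch below models trees as plain Python callables (an assumption for illustration, not Treelite's model format) and assumes a random-forest regressor whose output is the plain mean of its trees. `predict_bytree` is the name proposed in this thread; the function here is a hypothetical stand-in, not a Treelite API.

```python
from statistics import fmean
from typing import Callable, List, Sequence

# Toy stand-in for a tree: any callable mapping one feature row to a float.
# This is an illustrative assumption, not Treelite's internal model format.
Tree = Callable[[Sequence[float]], float]


def predict_bytree(trees: Sequence[Tree], rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical per-tree prediction: result[i][t] is tree t's output for row i."""
    return [[tree(row) for tree in trees] for row in rows]


def predict_mean(trees: Sequence[Tree], rows: Sequence[Sequence[float]]) -> List[float]:
    """Forest-style prediction: the plain mean of the per-tree outputs for each row."""
    return [fmean(per_tree) for per_tree in predict_bytree(trees, rows)]


# Three depth-1 "trees" (stumps) splitting on feature 0 or feature 1.
trees: List[Tree] = [
    lambda x: 1.0 if x[0] < 0.5 else 3.0,
    lambda x: 2.0 if x[1] < 0.5 else 4.0,
    lambda x: 1.0 if x[0] < 0.5 else 4.0,
]
rows = [[0.2, 0.8], [0.9, 0.1]]

print(predict_bytree(trees, rows))  # [[1.0, 4.0, 1.0], [3.0, 2.0, 4.0]]
print(predict_mean(trees, rows))    # [2.0, 3.0]
```

The averaged output can always be recovered from the per-tree array, but not the other way around, which is why exposing per-tree values is the more general interface.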
Alternatively, a user could specify up front which trees should be used for averaging. The existing feature request for predict_leaf(), if implemented, could also be used to derive OOB predictions, but would require a much slower multi-pass approach.
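The "specify which trees up front" alternative could look roughly like the sketch below, where the caller passes, for each row, the indices of the trees to average (for OOB, the trees that did not train on that row). The function `predict_tree_subsets` and the callable-tree model are hypothetical illustrations, not a Treelite API, and the multi-pass predict_leaf() route is not shown.

```python
from typing import Callable, List, Sequence

# Toy stand-in for a tree (illustrative assumption, not Treelite's model format).
Tree = Callable[[Sequence[float]], float]


def predict_tree_subsets(
    trees: Sequence[Tree],
    rows: Sequence[Sequence[float]],
    tree_ids: Sequence[Sequence[int]],
) -> List[float]:
    """Hypothetical API: for row i, average only the trees listed in tree_ids[i].

    Trees not listed for a row are never evaluated for that row.
    """
    if len(rows) != len(tree_ids):
        raise ValueError("need one list of tree indices per row")
    out: List[float] = []
    for row, ids in zip(rows, tree_ids):
        if not ids:
            raise ValueError("each row needs at least one tree to average")
        out.append(sum(trees[t](row) for t in ids) / len(ids))
    return out


trees: List[Tree] = [
    lambda x: 1.0 if x[0] < 0.5 else 3.0,
    lambda x: 2.0 if x[1] < 0.5 else 4.0,
    lambda x: 1.0 if x[0] < 0.5 else 4.0,
]
rows = [[0.2, 0.8], [0.9, 0.1]]
# Row 0 averages trees 1 and 2 (e.g. its out-of-bag trees); row 1 uses tree 0 only.
print(predict_tree_subsets(trees, rows, [[1, 2], [0]]))  # [2.5, 3.0]
```

This design skips trees whose output would be discarded, at the cost of a more complex input argument; returning all per-tree predictions keeps the API simpler and leaves the subset logic to the user.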