Closed · LudvicLaberge closed this 1 year ago
Hi, if you are using the Python `Booster` interface, you can use the slice operator (https://xgboost.readthedocs.io/en/stable/python/model.html) along with `base_margin` in `DMatrix` to achieve what you want. I can work on an example later if necessary.
Thanks for the quick reply! I'll give it a shot.
These tests might be helpful: https://github.com/dmlc/xgboost/blob/e49e0998c0fd9f144f113d3eba2c43b7b951335a/tests/python/test_basic_models.py#L519
So I gave it a shot, @trivialfis:
print(str(trees_to_remove))
[15, 21, 25, 27, 31, 34, 37, 39, 45, 48, 50, 54, 59, 61, 69, 73, 77, 83, 100, 109, 112, 114]
print(str(trees_to_keep))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 22, 23, 24, 26, 28, 29, 30, 32, 33, 35, 36, 38, 40, 41, 42, 43, 44, 46, 47, 49, 51, 52, 53, 55, 56, 57, 58, 60, 62, 63, 64, 65, 66, 67, 68, 70, 71, 72, 74, 75, 76, 78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 101, 102, 103, 104, 105, 106, 107, 108, 110, 111, 113, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198]
booster[trees_to_keep]
TypeError: Expecting <class 'int'> or <class 'slice'>. Got <class 'list'>
Turns out I can't really single out trees with slicing, as they are in no particular order...
So I read the linked doc and saw that we can do this:
trees = [booster[e] for e in trees_to_keep]
trees
[
<xgboost.core.Booster at 0x7efd8ec349d0>,
<xgboost.core.Booster at 0x7efd8ec34070>,
<xgboost.core.Booster at 0x7efd8ec34640>,
<xgboost.core.Booster at 0x7efd8ec34790>,
...
]
But I was wondering if there is then a way to recombine these trees into a single `Booster` object?
Combining models is not supported. After slicing the booster, one has to sum the predictions oneself with the help of `base_margin`. For regression objectives like `squarederror`, where the leaf value is the output prediction, the tests referenced in https://github.com/dmlc/xgboost/issues/8699#issuecomment-1397497819 should provide an example of how to merge the predictions.
@trivialfis is there any way this could become a feature request, i.e. to also be able to index the booster with a list? I am worried that predicting one tree at a time and passing the result as base margin to the next would slow down inference in production.
Feel free to open a new issue. We considered having a `concat` method but dropped the idea, as most use cases involve users feeding predictions from other models (like a linear model or an NN) into XGBoost rather than joining multiple boosters.
I'm not sure about the inference performance; you can try `inplace_predict` and see if it's efficient enough. Since you are deploying the model for inference, there's no need to cache the prediction.
@trivialfis we can close this one as I've opened this one like you requested: https://github.com/dmlc/xgboost/issues/8709
Looking at the predict documentation I saw the `iteration_range` functionality. I was wondering if it is possible to pass a list of indices to use in prediction instead of a range. Say, for example, I don't want trees [3, 5, 15, …] to affect the prediction, but I do want [1, 2, 4, …] in it…
More details on my use case: there are features I want in training but not at inference, so I've set interaction constraints so that these features don't interact with the ones I do want at inference. Using `trees_to_dataframe` I'm able to extract a list of indices of the trees that used the features I don't want affecting predictions at inference. Now I'd like a way to predict without these tree indices…
I tried editing the JSON of the booster to manually remove them, but I can't seem to make it work. Is there any other way?
https://discuss.xgboost.ai/t/specify-list-of-iteration-tree-indices-to-use-in-prediction/3033