dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Interaction constraint example inconsistent with its actual behavior. #8905

Open ZebinYang opened 1 year ago

ZebinYang commented 1 year ago

Hi,

The example in https://xgboost.readthedocs.io/en/latest/tutorials/feature_interaction_constraint.html#advanced-topic says that, given the interaction constraints [0, 1], [1, 3, 4], XGBoost may generate leaf nodes whose path contains splits on features like [0, 1, 3]. However, after running some experiments, I find that this may not be true. See the Colab link below for the reproducible experiments: https://colab.research.google.com/drive/1sFBqpM3wMNEcrlAfTF2bQCvSUIrsS_eu?usp=sharing

The source code in https://github.com/dmlc/xgboost/blob/3689695d16c3fc3b160d6917a55cb899932f91d4/src/tree/constraints.cc#L80-L101 also prohibits interactions like [0, 1, 3].
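
To make the observed behavior concrete, here is a minimal pure-Python sketch of the rule as I understand it from the experiments. It is not the code in `constraints.cc`. The function name `allowed_features` is hypothetical, and it rests on two assumptions: the features allowed at a node are the union of the constraint sets that contain every feature already used on the path to that node, and any feature is allowed at the root.

```python
from typing import Iterable, List, Set


def allowed_features(
    constraints: List[List[int]],
    path_features: Iterable[int],
    n_features: int,
) -> Set[int]:
    """Return the features that may be split on below the given path.

    Illustrative only (hypothetical helper, not XGBoost's API).
    Assumption: a constraint set stays active only if it contains
    every feature already used on the path from the root.
    """
    used = set(path_features)
    if not used:
        # Assumption: at the root, any feature may be chosen.
        return set(range(n_features))

    # Features already on the path may always be reused.
    allowed = set(used)
    for group in constraints:
        if used <= set(group):  # group contains all features used so far
            allowed |= set(group)
    return allowed


constraints = [[0, 1], [1, 3, 4]]

# After a root split on feature 1, both sets contain {1}.
print(sorted(allowed_features(constraints, [1], 5)))     # [0, 1, 3, 4]

# After splitting on 1 and then 0, only [0, 1] contains {0, 1}.
print(sorted(allowed_features(constraints, [1, 0], 5)))  # [0, 1]
```

Under these assumptions, feature 3 is no longer available once features 0 and 1 are both on the path, so a path with splits on [0, 1, 3] cannot occur, which matches what the experiments show.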

I think the current behavior is preferable, so it would be better to revise the documentation at https://xgboost.readthedocs.io/en/latest/tutorials/feature_interaction_constraint.html#advanced-topic to match it.

hcho3 commented 1 year ago

@trivialfis Can you respond to this question?

trivialfis commented 1 year ago

I will look into interaction constraints (ic) as a whole after finishing up the existing PRs.