dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0
26.18k stars 8.71k forks

[Feature Proposal]: Flexibility to set custom Min Child Weight Values for each feature #10670

Open kiran-vj opened 2 months ago

kiran-vj commented 2 months ago

In practical modelling scenarios, some very important variables are often very sparsely populated. This forces modelers to set lower min_child_weight values to ensure these variables are incorporated into the model, which can in turn lead to overfitting on other variables.

To avoid such scenarios, we propose adding the flexibility to set a different min_child_weight value for each feature.

trivialfis commented 2 months ago

Out of curiosity, how do you tune such models? I imagine the HPO search space would be extremely large?

kiran-vj commented 1 month ago

You're right, the HPO search space would be large. But we can approach it as a 2-step process: in the first step, the HPO search is limited to a uniform MCW across all features; in the second step, we tune the MCW parameter only for the problematic features identified by the modelers.
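The 2-step process above can be sketched in plain Python. This is only an illustration of how the search space shrinks, not real XGBoost code: XGBoost has no per-feature min_child_weight today, so the per-feature MCW vector and the `evaluate` callback (a stand-in for cross-validated training that returns a loss) are hypothetical.

```python
from itertools import product

def two_step_search(n_features, candidates, flagged_features, evaluate):
    """Two-step HPO over a hypothetical per-feature min_child_weight (MCW).

    Step 1: search a single uniform MCW shared by all features.
    Step 2: keep the best uniform value and tune MCW only for the
    features the modeler flagged as problematic.
    `evaluate` takes a per-feature MCW vector and returns a loss
    (lower is better); in practice it would run cross-validation.
    """
    # Step 1: uniform MCW for every feature.
    best_uniform = min(candidates, key=lambda v: evaluate([v] * n_features))

    # Step 2: override MCW only for the flagged features, keeping the
    # uniform value everywhere else.
    best_vec = [best_uniform] * n_features
    best_loss = evaluate(best_vec)
    for combo in product(candidates, repeat=len(flagged_features)):
        vec = [best_uniform] * n_features
        for f, v in zip(flagged_features, combo):
            vec[f] = v
        loss = evaluate(vec)
        if loss < best_loss:
            best_loss, best_vec = loss, vec
    return best_vec
```

With `k` candidate values, `n` features, and `m` flagged features, this evaluates `k + k**m` configurations instead of the `k**n` a full per-feature search would need.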

@trivialfis We would like to contribute this feature by helping with the development. What would the process look like for getting it approved and merged into XGBoost?

kiran-vj commented 1 month ago

Any thoughts on this?

trivialfis commented 1 month ago

Apologies for the slow reply. It's not a trivial change.

You can find the parameter definition here:
https://github.com/dmlc/xgboost/blob/cb54374550002efa7e4f2279c8941b4c7c196188/src/tree/param.h#L25

If you turn it into a vector, it can be parsed as JSON, similar to:
https://github.com/dmlc/xgboost/blob/cb54374550002efa7e4f2279c8941b4c7c196188/src/tree/param.cc#L85

By searching for the parameter name, you can find where it's used to prevent a split; the split candidate carries the index of the feature it splits on. I'm not entirely sure about the GPU implementation yet.
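A minimal C++ sketch of the change described above, under stated assumptions: the struct and member names below are illustrative only (the real scalar parameter lives in `TrainParam` in `src/tree/param.h`), and JSON parsing of the vector is omitted. The sketch shows the core idea: store one threshold per feature and gate the split check on the candidate's split feature index.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the actual XGBoost API: min_child_weight
// becomes a per-feature vector instead of a single float.
struct PerFeatureTrainParam {
  std::vector<float> min_child_weight;  // one threshold per feature
  float fallback{1.0f};                 // used when the vector is empty

  // Mirrors the scalar check that rejects a split when either child's
  // hessian sum falls below the threshold, but looks the threshold up
  // by the candidate's split feature index.
  bool SplitAllowed(std::size_t split_feature_index,
                    float left_sum_hess, float right_sum_hess) const {
    float mcw = min_child_weight.empty()
                    ? fallback
                    : min_child_weight.at(split_feature_index);
    return left_sum_hess >= mcw && right_sum_hess >= mcw;
  }
};
```

The fallback preserves the current single-value behaviour when no per-feature vector is supplied; as noted above, the GPU code path would need a separate audit.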