dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Consistency of min_child_weight parameter #5444

Open RAMitchell opened 4 years ago

RAMitchell commented 4 years ago

The min_child_weight parameter (default value 1.0) has different effects depending on how the objective function's Hessian is scaled. I noticed this when developing a new objective function that had a small Hessian: the tree was not able to grow with default parameters. Objectives like squared error and logistic loss are consequently regularised very differently. For example, with logistic loss the Hessian values can be much smaller than 1, so a split can require many more training instances. In #2483 it is noted that the Hessian for logistic loss is proportional to the variance, however this is not true of other objectives in general.
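A minimal sketch (not XGBoost internals) of the scaling issue: with squared error the per-instance Hessian is 1, while for logistic loss it is p*(1-p), at most 0.25, so the same min_child_weight threshold corresponds to very different minimum instance counts.

```python
import numpy as np

n = 1000
margin = np.zeros(n)  # raw prediction at the first boosting iteration

# Squared error: the Hessian is a constant 1 per instance.
hess_squared = np.ones(n)

# Logistic loss: the Hessian is p * (1 - p), at most 0.25 and often far smaller.
p = 1.0 / (1.0 + np.exp(-margin))
hess_logistic = p * (1.0 - p)

min_child_weight = 1.0
print("instances needed (squared error):", np.ceil(min_child_weight / hess_squared.mean()))
print("instances needed (logistic):     ", np.ceil(min_child_weight / hess_logistic.mean()))
```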

This is relevant to the task of finding good default parameters across a range of objectives (#4986).

One obvious solution is normalising all objective functions in some consistent way.
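One possible reading of "normalising", shown only as a hedged sketch: rescale each custom objective's gradient/Hessian pairs so the average Hessian per instance is 1, which makes min_child_weight behave roughly like an instance count regardless of objective (ignoring the changed interaction with lambda).

```python
import numpy as np

def normalize_objective(grad: np.ndarray, hess: np.ndarray):
    # Rescale so the mean Hessian per instance is 1; the ratio G/H used for
    # leaf values is unchanged (interaction with lambda aside).
    scale = hess.mean()
    return grad / scale, hess / scale
```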

Another solution is to deprecate min_child_weight and move to a parameter like min_child_instances, regularising based on the number of training instances irrespective of the objective function.
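A rough sketch of the two candidate split guards; min_child_instances is the hypothetical parameter named above, not an existing XGBoost option.

```python
import numpy as np

def can_split_hessian(hess_in_child: np.ndarray, min_child_weight: float = 1.0) -> bool:
    # Current behaviour: the Hessian sum in each child must reach the threshold,
    # so the effective constraint depends on the objective's Hessian scale.
    return hess_in_child.sum() >= min_child_weight

def can_split_instances(n_in_child: int, min_child_instances: int = 20) -> bool:
    # Proposed alternative: require a minimum number of training instances,
    # independent of the objective function.
    return n_in_child >= min_child_instances
```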

@trivialfis has also proposed implementing multiclass objective functions via vector leaves; if we do this, the Hessian will be a vector and it is not obvious how to correctly apply min_child_weight.
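With vector leaves the Hessian sum in a child is a vector (one entry per class), so applying a scalar threshold needs some reduction. A hedged sketch of possible choices, none of which is what XGBoost currently does:

```python
import numpy as np

hess_in_child = np.abs(np.random.randn(50, 3))  # toy data: 50 instances, 3 classes

sum_per_class = hess_in_child.sum(axis=0)
candidates = {
    "min over classes": sum_per_class.min(),    # most conservative
    "mean over classes": sum_per_class.mean(),
    "sum over classes": sum_per_class.sum(),    # most permissive
}
min_child_weight = 1.0
for name, value in candidates.items():
    print(name, value >= min_child_weight)
```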

thvasilo commented 4 years ago

This is a good point. For comparison, LightGBM uses a default of 1e-3 for its Hessian-sum threshold, together with a minimum of 20 data points per leaf.

QuantHao commented 4 years ago

I think one problem with replacing min_child_weight with min_child_instances is how to deal with sample weights. A good question on this was raised and answered in a LightGBM issue.
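To illustrate the sample-weight point with a small sketch: min_child_weight implicitly respects sample weights because the Hessians are multiplied by them, whereas a raw instance count ignores them; counting weighted instances would be one possible compromise.

```python
import numpy as np

sample_weight = np.array([0.1, 0.1, 5.0, 2.0])
hess = np.full(4, 0.25)  # e.g. logistic loss at p = 0.5

weighted_hess_sum = (hess * sample_weight).sum()  # what min_child_weight sees
raw_count = len(sample_weight)                    # ignores weights entirely
weighted_count = sample_weight.sum()              # a possible compromise

print(weighted_hess_sum, raw_count, weighted_count)
```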