cerlymarco / linear-tree

A Python library to build Model Trees with Linear Models at the leaves.
MIT License

Rationale for rounding during _parallel_binning_fit and _grow #32

Closed · session-id closed this issue 1 year ago

session-id commented 1 year ago

I noticed that the implementations of _parallel_binning_fit and _grow internally round loss values to 5 decimal places. This makes the regression results dependent on the scale of the labels: when the natural loss values are small, many different candidate splits end up with the same rounded loss. Is there a reason why this is the case?

This behavior can be observed by fitting a LinearTreeRegressor with the default loss function after scaling the labels by a small factor (like 1e-9). The regressor then no longer learns any splits.
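A minimal sketch of the collapsing effect in plain Python (the loss values below are made up for illustration, not taken from the library):

```python
# Hypothetical loss values for two candidate splits at a tiny label scale.
loss_a = 3.2e-9
loss_b = 1.1e-9

# Rounded to 5 decimal places, both losses collapse to 0.0, so the two
# splits become indistinguishable and no split ever looks like an improvement.
assert round(loss_a, 5) == round(loss_b, 5) == 0.0

# At a larger label scale (losses multiplied by 1e9) the rounded losses
# differ again and the better split can be identified.
assert round(loss_a * 1e9, 5) != round(loss_b * 1e9, 5)
```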

cerlymarco commented 1 year ago

Hi, I found a lot of numerical instability in splitting evaluations for extremely small numbers; that's the reason for this choice. If you have a better idea of how to handle the problem, please let me know, or don't hesitate to contribute.

All the best

session-id commented 1 year ago

One potential source of instability is that most operations are performed in float32 after the inputs are coerced to that dtype. That seems acceptable for applying the model with a small number of features, but the different loss functions also operate in float32, so roundoff errors accumulate as the summation runs across the sample dimension. Training mostly in float32 probably has speed advantages, but computing the losses in float64 might help with that numerical instability.
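The accumulation effect can be shown without the library itself by simulating a float32 accumulator in pure Python via `struct` (the helper names `to_f32` and `sum_f32` are illustrative):

```python
import struct

def to_f32(x):
    # Round a Python float (double precision) to the nearest float32 value.
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_f32(values):
    # Simulate summation with a float32 accumulator: the running total is
    # rounded back to float32 after every addition, as it would be if the
    # whole reduction were carried out in single precision.
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc

n = 1_000_000
values = [0.1] * n  # exact sum is 100000

err_f32 = abs(sum_f32(values) - 100000.0)  # float32 accumulation
err_f64 = abs(sum(values) - 100000.0)      # native double accumulation

# The float32 accumulator drifts far more than the float64 one.
assert err_f32 > 0.01
assert err_f64 < 0.001
```

Accumulating the loss in float64 (or rescaling losses before comparison) keeps the error on the float64 side of this gap without changing how the rest of training stores its data.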

Alternatively, what would you think about letting the user specify the precision used to train the model, as well as whether rounding is performed at all?
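One hypothetical shape such an option could take (the function and parameter names below are invented for illustration, not part of linear-tree's API): a split-comparison helper where the rounding precision is user-configurable and opt-in.

```python
def better_split(loss_candidate, loss_best, round_decimals=None):
    # Hypothetical helper: decide whether a candidate split improves on the
    # current best loss. round_decimals=None disables rounding entirely;
    # an integer reproduces fixed-precision behavior like the current
    # 5-decimal rounding.
    if round_decimals is not None:
        loss_candidate = round(loss_candidate, round_decimals)
        loss_best = round(loss_best, round_decimals)
    return loss_candidate < loss_best

# With 5-decimal rounding, a tiny improvement is invisible:
assert not better_split(1.0e-9, 3.0e-9, round_decimals=5)

# With rounding disabled, the same improvement is detected:
assert better_split(1.0e-9, 3.0e-9, round_decimals=None)
```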