mayer79 opened 4 years ago
Maybe max_delta_step has something to do with the convergence. I'm not sure whether there is a convergence argument analogous to the one for the softmax objective.
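To make the role of max_delta_step concrete: the xgboost documentation notes that for Poisson regression it defaults to 0.7 and caps the leaf update to safeguard the optimization. The sketch below is a hedged illustration of that clipping, not xgboost's actual internals; the function name and the regularization handling are assumptions.

```python
# Hypothetical sketch of a clipped Newton leaf update, as suggested by
# the xgboost docs for count:poisson (default max_delta_step = 0.7).
# This is an illustration only, not the library's real implementation.
def clipped_leaf_value(grad_sum, hess_sum, reg_lambda=1.0, max_delta_step=0.7):
    w = -grad_sum / (hess_sum + reg_lambda)  # unconstrained Newton step
    if max_delta_step > 0:
        # cap the step so exp(w) cannot explode for count targets
        w = max(-max_delta_step, min(max_delta_step, w))
    return w

print(clipped_leaf_value(-8.0, 4.0))  # unconstrained step would be 1.6, clipped to 0.7
print(clipped_leaf_value(-1.0, 4.0))  # 0.2, within the cap, left unchanged
```

A too-small cap can therefore leave the leaf values far from their optimum after the single round that random forest mode performs.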
In another example with different data and parameters, the result is not too different from LightGBM's rf mode, but ironically worse than ranger with MSE loss. Poisson deviance should be low; r_squared is the proportion of deviance explained and should be high. These values are calculated on an independent validation data set. So in this particular second example there does not seem to be a problem. But what could be the reason for the results in the first example?
# pred_xgb pred_lgb pred_ranger
# deviance 1.05753011 1.04356883 1.02037717
# r_squared 0.04407729 0.05669717 0.07766058
The code is in https://github.com/mayer79/random_forest_benchmark/blob/master/r/poisson.R
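For readers who don't want to open the linked R script, the parameter combination that puts xgboost into random forest mode looks roughly like the following. This is a hedged sketch, not mayer79's exact script (which is in R); the parameter names follow the xgboost documentation, while the concrete values are illustrative.

```python
# Illustrative xgboost random-forest-mode parameters (values are assumptions,
# not copied from the linked poisson.R script).
params = {
    "objective": "count:poisson",
    "learning_rate": 1.0,       # no shrinkage: trees are averaged, not boosted
    "num_parallel_tree": 500,   # grow the whole forest inside one round
    "subsample": 0.63,          # row subsampling per tree, bagging-style
    "colsample_bynode": 0.33,   # feature subsampling per split, rf-style
    "max_delta_step": 0.7,      # cap on the Poisson leaf update
}
num_boost_round = 1             # a single boosting round = one forest
```

With num_boost_round fixed at 1, every leaf value comes from exactly one Newton step, which is relevant to the discussion below.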
With a slightly different data structure but otherwise quite similar parameters, I again get bad results (an R-squared of -165% on the validation data). Training stops very fast compared to LightGBM (3 seconds instead of 10). This is with the third commit in the link above.
# pred_xgb pred_lgb pred_ranger
# deviance 3.303889 1.17156879 1.13913418
# r_squared -1.655372 0.05839744 0.08446549
Hmm.
In random forest mode we can only perform one Newton step to minimize the objective function. In the case of squared error a single step is sufficient, but for other losses it is not; this might be the source of the bias. I have thought about how to fix this, maybe by refreshing the trees in subsequent boosting steps.
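The single-Newton-step point can be checked with a toy computation. For a leaf predicting a constant f on the log scale, the Poisson objective is sum(exp(f) - y*f), whose optimum is f* = log(mean(y)). Starting from f = 0, one Newton step lands at mean(y) - 1 instead, which can be far off. This is a minimal sketch of that arithmetic, independent of xgboost itself:

```python
import math

# Toy Poisson leaf fit: find the constant log-scale prediction f
# minimizing sum(exp(f) - y*f). Closed-form optimum: f* = log(mean(y)).
y = [1, 2, 3, 6]          # hypothetical counts
n, s = len(y), sum(y)
opt = math.log(s / n)     # log(3) ~ 1.0986

def newton_step(f):
    g = n * math.exp(f) - s   # gradient of the Poisson objective
    h = n * math.exp(f)       # Hessian
    return f - g / h

f_one = newton_step(0.0)      # the single step random forest mode gets
f = 0.0
for _ in range(50):           # iterated Newton converges to the optimum
    f = newton_step(f)

print(f_one, opt, f)          # one step gives 2.0, optimum is ~1.0986
```

For squared error the objective is exactly quadratic, so g/h lands on the optimum in one step; for Poisson (and gamma) it does not, which matches the observation that only the non-Gaussian losses are biased.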
This might indeed be the reason why performance is not great. But I guess LightGBM's rf mode would suffer from the same issue, yet its performance was not negative in any of the cases I have tested. I will try more max_delta_step values to see whether it is just poor parametrization on my part.
With MSE loss, the random forest mode seems to work well. However, when switching to the "count:poisson" (and also "reg:gamma") loss, the model is completely off: the distribution of the predictions is heavily biased. In this example, the R-squared (with respect to MSE as well as Poisson deviance) drops from 70-80% to 0%.
max_delta_step has a large impact.