StatMixedML / LightGBMLSS

An extension of LightGBM to probabilistic modelling
https://statmixedml.github.io/LightGBMLSS/
Apache License 2.0

Models with init_score #14

Closed neverfox closed 1 year ago

neverfox commented 1 year ago

The current code uses init_score to inject starting values, but what about modeling problems that need init_score to represent an offset? In insurance, for example, a Gamma severity or Poisson frequency model would use log(exposure) as the init_score. Could the code call get_init_score on the dataset and incorporate that offset when setting the starting values? Or would you be forced to use exposure as weights and change the response variable to y/exposure?

neverfox commented 1 year ago

I think the desired offset would take the place of the np.ones it uses now, but you'd have to know how it relates to the distribution parameters through the distribution's mean function, right? So for Gamma, you could multiply the desired init_score (offset) by the concentration starting value, but you'd have to invert it for the rate. For Gaussian, you'd multiply it by the starting mean but not the sigma, and so on. That would require being able to determine the correct operation for each parameter of each distribution. Does that make sense?

StatMixedML commented 1 year ago

Thanks for your interest in the project.

The initial scores initialize the boosting model, i.e., the model uses them as the starting values to boost from in the first iteration. Since we model all parameters of a distribution, LightGBMLSS currently estimates the unconditional parameter values from the data by minimizing the NLL with the L-BFGS optimizer. Hence, there is no need to multiply them by another value, since they already are reasonable starting values.

So for a Poisson-distributed variable, the rate parameter is initialized with its unconditional estimate, and in the remaining iterations the rate parameter changes as a function of x.
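A minimal sketch of the idea of an unconditional starting value, assuming an intercept-only Poisson model on the log scale. It is not LightGBMLSS code: the function names are hypothetical, and Newton steps stand in for L-BFGS so that the example needs only the standard library.

```python
import math

def poisson_nll(log_rate, y):
    """Negative log-likelihood of Poisson data that share one rate."""
    rate = math.exp(log_rate)
    return sum(rate - yi * log_rate + math.lgamma(yi + 1) for yi in y)

def fit_unconditional_log_rate(y, tol=1e-12, max_iter=1000):
    """Minimise the NLL over log(rate) with Newton steps (stand-in for L-BFGS)."""
    n, total = len(y), sum(y)
    log_rate = 0.0
    for _ in range(max_iter):
        rate = math.exp(log_rate)
        # Gradient is n*rate - total, Hessian is n*rate (both w.r.t. log_rate).
        step = (n * rate - total) / (n * rate)
        log_rate -= step
        if abs(step) < tol:
            break
    return log_rate

y = [0, 2, 1, 3, 0, 1, 4, 1]          # toy counts, made up for illustration
start = fit_unconditional_log_rate(y)

# Every row gets the same starting value; boosting then moves it as a function of x.
init_score = [start] * len(y)
```

For Poisson the minimiser is simply the log of the sample mean, so the numerical fit is only there to mirror the general recipe that also applies to distributions without a closed form.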

If you want to model insurance-type data specifically, it is best to transform the response prior to modelling, i.e., y/exposure as you suggested.

The weights are used to weight the gradient/hessian which are then used to update the parameter estimates.

Does that answer your question?

neverfox commented 1 year ago

In a traditional context, one can use either weights plus a transformed response, or no weights and an offset. In a non-LSS approach, where predictions are in terms of the response value rather than distribution parameters, init_score serves as the place to bring in an offset. So, for example, I can predict loss severity by using losses as the response and log(exposure) as init_score. My prediction will then be severity / exposure. Alternatively, I can use losses/exposure as my response and exposure as weights. That should produce similar models (though not identical in terms of the path training might take).
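The equivalence between the two formulations can be checked on the simplest case. The sketch below assumes an intercept-only Poisson model with a log link and uses made-up numbers; with boosted trees the fitted models would only be similar, not identical.

```python
import math

losses   = [3.0, 0.0, 5.0, 2.0, 8.0]   # toy responses, made up
exposure = [1.0, 0.5, 2.0, 1.0, 4.0]   # toy exposures, made up

# Approach A: raw response, offset log(exposure), no weights.
# The model is mean_i = exposure_i * exp(b); the Poisson score equation
# sum(y_i) = exp(b) * sum(exposure_i) gives the per-unit rate directly.
rate_offset = sum(losses) / sum(exposure)

# Approach B: response y/exposure, weights = exposure, no offset.
# The weighted Poisson score equation gives the exposure-weighted mean.
ratios = [y / e for y, e in zip(losses, exposure)]
rate_weighted = sum(e * r for e, r in zip(exposure, ratios)) / sum(exposure)

print(rate_offset, rate_weighted)
```

Both approaches solve the same score equation here, which is why the per-unit rate comes out the same up to floating-point rounding.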

As I understand your answer, LSS needs to use init_score for distribution parameters, not means, and so I'd have to resort to the losses/exposure + exposure weights approach. The thought in my initial response was that if I were really committed to using my domain offset, I'd just need to relate it to the parameter init_scores in the proper way. For example, in Gamma the mean is alpha/beta. So if the init_scores were alpha*log(exposure) and beta/log(exposure), then I'd get parameter predictions for Gammas that were already in terms of a loss/exposure distribution rather than a raw loss distribution, without having to transform my response.
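One way to work through the Gamma case is via the scaling property of the distribution: if Y ~ Gamma(alpha, rate=beta), then c*Y ~ Gamma(alpha, rate=beta/c). The sketch below assumes both parameters are modelled on the log scale, which the thread does not confirm, and the helper names are hypothetical. Under that assumption, an exposure offset would enter additively on the rate's raw score only, leaving the concentration untouched.

```python
import math

def shift_gamma_scores(log_alpha, log_beta, log_exposure):
    """Move per-unit Gamma raw scores to the raw-loss scale.

    Assumes alpha = exp(log_alpha) and beta = exp(log_beta) (log scale for both).
    Scaling Y by exposure divides the rate by exposure, so only log_beta shifts.
    """
    return log_alpha, log_beta - log_exposure

def gamma_mean(log_alpha, log_beta):
    """Mean of Gamma(alpha, rate=beta) is alpha / beta."""
    return math.exp(log_alpha - log_beta)

# Toy values, made up: per-unit Gamma with alpha = 2, beta = 0.5, exposure = 3.
log_alpha, log_beta = math.log(2.0), math.log(0.5)
log_exposure = math.log(3.0)

per_unit_mean = gamma_mean(log_alpha, log_beta)
raw_scores = shift_gamma_scores(log_alpha, log_beta, log_exposure)
raw_loss_mean = gamma_mean(*raw_scores)   # per-unit mean scaled by exposure
```

The same bookkeeping would have to be worked out per parameter and per distribution, which is the difficulty raised in the comments above.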

Transforming the response and using weights isn't a big deal, but there are cases where you really would like to use an offset in a regression, like if I want to build a model off a prior model's predictions (even if they are distributional parameters) or if I want to start from (to use insurance again) my current rating plan's relativities.

So the thought behind the ticket is really this: how might one take any init_score planned for a traditional LGBM context and have it work in the same manner, transparently, with the LSS version of the model? After all, the response variable doesn't start out as a set of distribution parameters either.

StatMixedML commented 1 year ago

@neverfox Thanks for the detailed explanation! I now better understand your problem/use-case.

I need to see how to incorporate an offset without transforming the response. Not sure if this is possible via init_score, since it is currently being used for the starting values of the distributional parameters.

StatMixedML commented 1 year ago

@neverfox I looked into it, but I don't see how I can change the start values to a transformation of the response. So for now, the only way to do it would be to transform the response accordingly.

StatMixedML commented 1 year ago

Can I close this?

neverfox commented 1 year ago

You can. I am still thinking the issue through but it might take me some time.

StatMixedML commented 1 year ago

Feel free to re-open if necessary. Thanks.