Closed hayato-n closed 5 months ago
Thanks a lot for using GPBoost and for reporting this issue!
There are two offsets: one for training and one for prediction. I.e., one that is added to the linear predictor during estimation and one that is added to the linear predictor for prediction. The current implementation for handling the offset
for training is such that this needs to be passed again when calling the predict
function (in addition to a potential offset_pred
for the prediction points). This is arguably not very user-friendly. I thought that "nobody" was going to use this feature anyway. Turns out I was wrong.
I have now fixed this that the offset
(that was passed during estimation) is saved internally. It is on GitHub but not yet on PyPI / CRAN. What you can do in the meantime is just passing the offset
again when you call the predict
function:
pred2 = model.predict(
X_pred=X, group_data_pred=np.ones(n, dtype=int),
offset=np.log(exposure), offset_pred=np.log(exposure)
)
I hope this helps.
Thank you for your quick response, @fabsig ! The specification you showed worked.
offset term is really useful for social observation data, such as economic data or official statistics. It is because the exposures toward risks are not uniform in usual. For example, when we use burglary count data, the number of houses are not uniform. Hence we want to control it with offset term. Thus, I was really grad to find that this package supports offset term.
Glad to hear that it worked. And thanks for the explanation about the use of the offset term!
Hi, I tried to estimate Poisson GLMM with an offset term, and I suspect that the offset term may not work in estimation.
I conducted an easy simulation as follows with gpboost=1.5.0.1:
I expected that
pred2
would become an appropriate prediction because its offset term is specified in the same manner asmodel.fit
. However,pred1
andpred2 / exposure
provided accurate predictions.Do I misunderstand how I specify the offset term?