Thanks a lot for your interest in GPBoost!
If you use grouped random effects (with the `group_data` argument), predictions for new groups / categories (weeks in your case) ignore the distance to existing ones. In this case, you can use a Gaussian process (GP) instead of grouped random effects. This takes the distance into account (like time series models; in fact, a GP with an exponential covariance function corresponds to an AR(1) model). You can do this by passing the week variable to the `gp_coords` argument instead of `group_data`. Note that if the number of data points is large, you need to use an approximation for faster computations such as `gp_approx = "vecchia"`.
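For concreteness, here is a minimal sketch of what this could look like with the Python package, loosely following the GPBoost demo scripts. Variable names like `X_train`, `y_train`, `week_train`, and `week_test`, as well as the boosting parameter values, are placeholders/assumptions, not part of the original thread:

```python
import numpy as np
import gpboost as gpb

# week_train / week_test are placeholder 1-D arrays of week indices,
# e.g. weeks 1..1000 for training and 1005..1060 for prediction.
# gp_coords expects a 2-D array: one row per sample, one column per coordinate.
coords_train = np.asarray(week_train).reshape(-1, 1)
coords_test = np.asarray(week_test).reshape(-1, 1)

# Gaussian process over the week axis instead of grouped random effects.
# With an exponential covariance function, the GP corresponds to an AR(1)
# model in time; gp_approx="vecchia" speeds up computations for large n.
gp_model = gpb.GPModel(gp_coords=coords_train,
                       cov_function="exponential",
                       gp_approx="vecchia",
                       likelihood="gaussian")

data_train = gpb.Dataset(X_train, y_train)
params = {"learning_rate": 0.05, "max_depth": 6, "verbose": 0}
bst = gpb.train(params=params, train_set=data_train,
                gp_model=gp_model, num_boost_round=100)

# Predictions for new weeks use the GP covariance, so weeks close to the
# training range get more informative random-effect predictions than
# weeks far away (unlike an ordinary categorical grouping).
pred = bst.predict(data=X_test, gp_coords_pred=coords_test,
                   pred_latent=True)
y_pred = pred["fixed_effect"] + pred["random_effect_mean"]
```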
This blog post might also be helpful for you.
Thanks a lot :)
Very interesting package. I have a use case for it, but I'm not sure if GPBoost is a good fit. So, here is a short description of my problem:

GPBoost takes a `group_data` parameter and initial results are close to XGBoost, but my question is: does it take the grouping into account? Let's say in the train set I have weeks 1, 2, 3, 4, ..., 1000 and in the test set weeks 1005, 1006, ..., 1060 (a small gap between train and test to avoid target leakage). If 'week' is treated as an ordinary categorical, then the values 1005, 1006, ..., 1060 are interpreted as unknown at prediction time, like in an ordinary GBDT. Is there a way to give GPBoost the 'inductive bias' that weeks 4 and 5 are much closer than, e.g., weeks 4 and 500?