Thanks a lot for using GPBoost!
Could you please provide a complete reproducible example (code + data, e.g., simulated data) in which the behavior you describe is observed?
Thanks for getting back.
It may not be possible to share the data, but I'll check again. If not, I'll see if I can replicate the behavior with a similar dataset in the same environment.
I performed another test on a different panel dataset, in the same environment. `GPBoost` and `Booster` worked as expected (and likely better than fixed effects `LightGBM`), with different hyperparameter configurations training for different numbers of rounds and strongly impacting performance. I think this shows there is no software bug, just a peculiar modeling outcome with the delivery duration dataset.
I'll share my code & the delivery duration data source below if you'd like to review my experiments. Please feel free to close the issue if you wish, as the package seems to be working as expected.
The "troubleshooting" branch of this repository has all the code & experiments I mentioned as Jupyter notebooks.
requirements.txt
.1_DataPrep
will repeat my data prep & feature engineering steps and save the .csv file for the modeling notebooks. You can just run the entire notebook once.GPBoost
models can take a while, but there's no need to do more than a few dozen trials to understand if the models are learning & improving. LightGBM
tuning takes a few minutes at most with the GPU option.The dataset is from an example take-home project on StrataScratch, available for download if you create a free account. I'd rather not share it directly as I'm not sure StrataScratch allows that.
Thanks again for your time.
OK, thanks for letting me know. I currently don't have time to reproduce / review this example myself. But good to know that everything is running as expected.
Hi, thanks for developing this interesting package & modeling approach.
I've recently been performing a regression modeling exercise in Python, where I compare the performance of standard tree boosting algorithms with various `GPBoost` configurations, and with mixed linear models fitted using `GPModel`.

I observed strange behavior in the performance & training of the `Booster` + random intercept models. To summarize, the fixed effect component always predicts the response variable mean, and the `Booster` doesn't train past a few rounds in any of the hyperparameter configurations I tried (and they all yield practically the same validation scores). I am wondering if the `Booster` fails to learn from the data, either due to a bug or a quirk in the modeling. The issue does not seem to exist with just a `Booster` model by itself, without random effects, which trains for 90+ rounds and outputs considerably better & more varied predictions.

I'll summarize my dataset & experiments below. I have checked my code extensively against the examples in the documentation, and I don't think I am making a mistake with the `GPBoost` syntax. I can still share the full code if necessary, which is in Jupyter notebook format along with the outputs, but I don't think I'm allowed to share the dataset, so it may not be possible to run & reproduce it.

**Dataset**
The data consists of 100k+ rows. Each row represents an order delivery, along with various information about it. The goal is to predict the delivery durations from various order attributes. After some feature engineering I have close to 30 predictors.

`store_id` is the grouping variable: it records the unique ID of the store that fulfilled the order. There are 5000+ stores in the dataset, and each one has anywhere from a single delivery to hundreds (one delivery = one observation). The main goal of my experiment is to model `store_id` first as a fixed effect predictor with target encoding, then with a random intercept for each store, and compare the performances of the various models.
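For concreteness, a minimal smoothed target-encoding sketch is below; the column name, target name, and smoothing constant are placeholders rather than the exact notebook code:

```python
import pandas as pd

def target_encode(train, test, col="store_id", target="duration", m=20.0):
    """Encode `col` by the smoothed per-level mean of `target`."""
    global_mean = train[target].mean()
    stats = train.groupby(col)[target].agg(["mean", "count"])
    # Shrink levels with few observations toward the global mean
    enc = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
    # Levels unseen in training fall back to the global mean
    return train[col].map(enc), test[col].map(enc).fillna(global_mean)
```

In practice the encoding should be fitted out-of-fold (or on the training split only) to avoid target leakage.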
**Models / experiments**

Below are the model configurations I tried, with some notes about their outputs.
- `LGBM`: Standard fixed effects LightGBM, with `store_id` as a target-encoded predictor. Trained directly with the `lightgbm` package. RMSE 909, MAPE 22.5%. `store_id` is the top predictor.
- `LM`: Fixed effects linear regression, with `store_id` as a target-encoded predictor. Trained with `scikit-learn`'s `LinearRegression`. RMSE 950, MAPE 24.5%.
- `GPB1`: `GPBoost` model with `Booster` + a random intercept for `store_id` (see the sketch after this list). RMSE 1068, MAPE 27.6%. A random-intercept-only model for `store_id` (using `GPModel`) yields virtually the same predictions & testing scores.
- `LMM`: Mixed effects linear regression with a random intercept for `store_id`, trained with `GPModel`. RMSE 933, MAPE 24%.
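For reference, here is a minimal sketch of what the `GPB1` setup looks like; the variable names and hyperparameters are placeholders, not the exact notebook code, and the prediction dict keys assume a recent `gpboost` version:

```python
import gpboost as gpb

# group_train / group_test: the store_id columns of the train / test sets (placeholders)
gp_model = gpb.GPModel(group_data=group_train, likelihood="gaussian")
train_set = gpb.Dataset(X_train, label=y_train)

params = {"learning_rate": 0.05, "max_depth": 6, "verbose": 0}
bst = gpb.train(params=params, train_set=train_set,
                gp_model=gp_model, num_boost_round=1000)

# Response-scale predictions combine the tree ensemble and the random intercepts
pred = bst.predict(data=X_test, group_data_pred=group_test)
y_pred = pred["response_mean"]
```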
I also tried the following experiments for troubleshooting:
- `LGBM` with `store_id` completely dropped from the model. The performance suffers slightly (RMSE 926, MAPE 23%), but the model still trains for 90+ rounds, outputs varying predictions, and the predictors make significant contributions according to SHAP values. This confirms there is still considerable signal to be captured without `store_id`.
- A `Booster`-only fixed effects model with GPBoost, with `store_id` completely dropped from the model. Very similar results to the previous experiment. This shows `Booster` works as expected by itself.
- A `Booster` + random effect model, but with a randomly generated grouping variable with 100 levels, and with `store_id` as a fixed effect predictor. The random effect predictions are close to zero as expected, but the fixed effect predictions are still constant at the response mean. All hyperparameter configurations train for the maximum of 5000 rounds with virtually no difference in validation scores (see the decomposition sketch after this list).
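One way to verify where the constant predictions come from is to look at the latent components separately; a minimal sketch, assuming the `pred_latent=True` return keys of recent `gpboost` versions and the placeholder names from the sketch above:

```python
import numpy as np
import gpboost as gpb

# Randomly generated grouping variable with 100 levels, as in the third experiment
rng = np.random.default_rng(42)
fake_group_train = rng.integers(0, 100, size=len(X_train))
fake_group_test = rng.integers(0, 100, size=len(X_test))

gp_model = gpb.GPModel(group_data=fake_group_train, likelihood="gaussian")
bst = gpb.train(params={"learning_rate": 0.05, "verbose": 0},
                train_set=gpb.Dataset(X_train, label=y_train),
                gp_model=gp_model, num_boost_round=100)

# pred_latent=True returns the tree-ensemble (fixed effect) and random effect parts separately
pred = bst.predict(data=X_test, group_data_pred=fake_group_test, pred_latent=True)
print(np.std(pred["fixed_effect"]))        # ~0 means a constant fixed effect prediction
print(np.std(pred["random_effect_mean"]))  # ~0 expected for a meaningless grouping
```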
All of this suggests to me that the `Booster` component of `GPB1` is somehow not working properly, or that the presence of a random effect component is somehow preventing the `Booster` from learning the fixed effects from the data. The random effect for `store_id` captures a lot of the variance, and the remaining fixed predictors are not very predictive, but they still make considerable contributions in other models, which all perform considerably better than `GPB1` even without `store_id`.

I am curious whether this is expected behavior, and whether there's a modeling-related reason for it. If it sounds like a genuine software bug, please let me know and I can provide technical details. Thanks in advance for your time.