Open lukezli opened 5 months ago
Hi, for refresh the number of boosting rounds needs to be set explicitly
so that it matches the number of boosted rounds in the existing model.
Thanks! Can you explain the effect of eta in the context of refresh?
My base model is trained on 1 year of data. I want to add 1 additional week of data (so-called incremental learning), but I noticed the following for my regression task:
Using updater: refresh with a matching num_boost_round makes my predictions go very close to zero as eta -> 0. I was under the impression that as eta -> 0 the refreshed model should more closely match the un-refreshed model; instead, it seems to behave as if the only data that matters is the new set of data. Is there any way to perform incremental learning such that the bulk of my original data is still kept in the model and I only make small updates to the leaf weights based on the new data?
I expect the above code to output a model with 100 trees in both cases. However, running this Python script instead gives me:
model size before refresh 100
model size after refresh 10
Which is unexpected. This is on xgboost 2.0.3.
Am I doing something wrong / misunderstanding how the refresh parameter should work?