Closed bencottier closed 3 years ago
Hi @bencottier Thanks for logging this issue!
Just to clarify, are you asking that the exact iteration which has the best validation metric be recorded, early-stopping or otherwise? Another issue this makes me realise is that if early stopping is on, we should probably be truncating the model, but currently we do not.
This is a fair request. I wouldn't want to include it in the estimator struct, because those are (mostly) hyperparameter fields, but I am sure we can come up with a solution for this request.
Hi @yalwan-iqvia - thanks for considering this!
> Just to clarify, are you asking that the exact iteration which has the best validation metric be recorded, early-stopping or otherwise?
Yep. The relevance of early stopping is just that it prints out the best iteration, if the early stopping condition is triggered. That's how I noticed the issue.
> Another issue this makes me realise is that if early stopping is on, we should probably be truncating the model, but currently we do not.
Yeah, I was surprised that it doesn't truncate the model. But I think it would be best as an option (perhaps truncating by default), rather than forcing it to be truncated.
Hi @bencottier,
A PR is coming soon to address this :)
I will let you know once the PR is submitted, so you can either test from my PR branch or wait for a release with the feature shortly.
Thanks for logging this issue and waiting!
Hi @bencottier, the PR has been merged to master :)
We will probably add some more features before making a new release, so if you want to try out the new feature, please go ahead and grab it from master.
Thanks!
As far as I can tell, when using early stopping there is no way to directly access the best iteration (according to validation set metrics).
If the early stopping condition is triggered, then `eval_metrics!()` logs the best iteration. I can then manually input that as `num_iterations` in `predict()` or `savemodel()`.
However, it would be useful to access this programmatically, whether or not the early stopping condition is triggered, e.g. when using a single script to train and then predict using the best model.
One way to do this might be to include `best_iter` in the `results` returned by `train!()`. Another could be to make `best_iter` a field of the `LGBMEstimator` structs.
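As a rough illustration of what "the best iteration" means here, the sketch below picks it from a vector of per-iteration validation scores. The helper `best_iteration`, its `higher_is_better` keyword, and the sample numbers are all hypothetical and not part of the package; it assumes the validation metric for each iteration is already available as a plain vector, which may not match how the package stores its results.

```julia
# Hypothetical helper (NOT part of the package): return the 1-based index of
# the iteration with the best validation score.
function best_iteration(scores::AbstractVector{<:Real}; higher_is_better::Bool=false)
    isempty(scores) && throw(ArgumentError("scores must not be empty"))
    # argmin/argmax return the index of the first extremum, so ties resolve
    # to the earliest iteration.
    return higher_is_better ? argmax(scores) : argmin(scores)
end

# Made-up validation losses, one entry per boosting iteration (lower is better).
val_loss = [0.90, 0.72, 0.65, 0.61, 0.63, 0.64]
best_iter = best_iteration(val_loss)

# The value could then be passed on as described above, e.g. as
# `num_iterations` in predict() or savemodel(). The exact call is left as a
# comment because its signature is assumed here, not verified:
# predict(estimator, X_test; num_iterations = best_iter)
```

Returning the earliest index on ties favours the smaller model, which seems the natural choice if the model is later truncated at that iteration.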