IQVIA-ML / LightGBM.jl

Julia FFI interface to Microsoft's LightGBM package

Access the best iteration from early stopping #92

Status: Closed (closed by bencottier 3 years ago)

bencottier commented 3 years ago

As far as I can tell, when using early stopping there is no way to directly access the best iteration (according to validation set metrics).

If the early stopping condition is triggered, eval_metrics!() logs the best iteration. I can then manually pass that value as num_iterations to predict() or savemodel().

However, it would be useful to access this programmatically, whether or not the early stopping condition is triggered: for example, when a single script trains the model and then predicts with the best iteration.

One way to do this might be to include best_iter in the results returned by train!(). Another could be to make best_iter a field of the LGBMEstimator structs.
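The bookkeeping being requested is small. The following is a language-neutral sketch in Python (not LightGBM.jl code), assuming the per-iteration validation scores are available as a list. It records the best iteration whether or not early stopping fires, and stops once `early_stopping_round` consecutive iterations fail to improve on the best score, which is roughly how LightGBM describes its early-stopping rule. The function name `track_best_iteration` and its parameters are hypothetical and are not part of LightGBM.jl's API.

```python
from typing import List, Optional, Tuple


def track_best_iteration(
    metrics: List[float],
    early_stopping_round: int,
    higher_is_better: bool = False,
) -> Tuple[int, int]:
    """Hypothetical helper: return (best_iter, last_iter), both 1-based.

    metrics[i] is the validation metric after boosting iteration i + 1.
    Iteration stops once `early_stopping_round` consecutive iterations
    fail to improve on the best score seen so far.
    """
    best_iter = 0
    best_score: Optional[float] = None
    last_iter = 0
    for i, score in enumerate(metrics, start=1):
        last_iter = i
        if best_score is None:
            improved = True
        elif higher_is_better:
            improved = score > best_score
        else:
            improved = score < best_score
        if improved:
            # New best: remember both the score and the iteration it came from.
            best_score = score
            best_iter = i
        elif i - best_iter >= early_stopping_round:
            # Early stopping triggered: no improvement for the full patience window.
            break
    return best_iter, last_iter


# Loss-like metric (lower is better): the best score is at iteration 3,
# and three non-improving iterations later the loop stops at iteration 6.
print(track_best_iteration([0.50, 0.40, 0.35, 0.36, 0.37, 0.38], 3))  # -> (3, 6)
```

The returned `best_iter` is the value that would otherwise be copied by hand from the log into num_iterations for predict() or savemodel(); comparing it with `last_iter` also shows how many trailing iterations a truncation step would drop.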

yalwan-iqvia commented 3 years ago

Hi @bencottier, thanks for logging this issue!

Just to clarify, are you asking that the exact iteration which has the best validation metric be recorded, early-stopping or otherwise? Another issue this makes me realise is that if early stopping is on, we should probably be truncating the model, but currently we do not.

This is a fair request. I wouldn't want to include it into the estimator struct because these are (mostly) hyperparameter fields, but I am sure we can come up with a solution for this request.

bencottier commented 3 years ago

Hi @yalwan-iqvia - thanks for considering this!

> Just to clarify, are you asking that the exact iteration which has the best validation metric be recorded, early-stopping or otherwise?

Yep. Early stopping is only relevant because it prints out the best iteration when the early stopping condition is triggered. That's how I noticed the issue.

> Another issue this makes me realise is that if early stopping is on, we should probably be truncating the model, but currently we do not.

Yeah, I was surprised that it doesn't truncate the model. But I think it would be best as an option (perhaps truncating by default) rather than always truncating.

chilledgeek commented 3 years ago

Hi @bencottier,

A PR is coming soon to address this :)

I will let you know once the PR is submitted, so you can either test it from my PR branch or wait for a release with the feature, which should follow shortly.

Thanks for logging this issue and waiting!

chilledgeek commented 3 years ago

Hi @bencottier, the PR has been merged to master :)

We will probably add some more features before making a new release, so if you want to try out the new feature, please go ahead and grab it from master.

Thanks!