ATOMScience-org / AMPL

The ATOM Modeling PipeLine (AMPL) is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery.
MIT License

Support early stopping in xgboost models #315

Open mcloughlin2 opened 5 months ago

mcloughlin2 commented 5 months ago

When we plot predicted vs. actual values for an xgboost model, the model often appears to be overfit to the training set. This could probably be alleviated by training for fewer iterations (boosting rounds), just as training for fewer epochs does for NN models. The current version of the xgboost package lets you specify a validation dataset, an evaluation metric, and an early_stopping_rounds parameter, so that training stops once the metric value for the validation set has not improved for early_stopping_rounds consecutive iterations.