The ATOM Modeling PipeLine (AMPL) is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery.
When we plot predicted vs. actual values for an xgboost model, the model often appears to be overfit to the training set. This could probably be alleviated by training for fewer boosting rounds, just as training for fewer epochs helps for NN models. The current version of the xgboost package lets you specify a validation dataset, an evaluation metric, and an early_stopping_rounds parameter, so that training stops when the metric on the validation set has not improved for early_stopping_rounds iterations.
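To make the stopping rule concrete, here is a minimal pure-Python sketch of the logic described above. It is not xgboost's own code and not AMPL's implementation. The function name `best_round_with_early_stopping` and the validation scores are hypothetical, chosen only for illustration. It assumes one validation score per boosting round, and that training stops once the score has failed to improve for `early_stopping_rounds` consecutive rounds.

```python
def best_round_with_early_stopping(val_scores, early_stopping_rounds, maximize=False):
    """Apply an early-stopping rule to a sequence of per-round validation scores.

    Returns (best_round, last_round_run), both 0-based.
    Hypothetical helper for illustration; not part of xgboost or AMPL.
    """
    best_score = None
    best_round = -1
    for rnd, score in enumerate(val_scores):
        if best_score is None:
            improved = True
        elif maximize:
            improved = score > best_score
        else:
            improved = score < best_score

        if improved:
            best_score, best_round = score, rnd
        elif rnd - best_round >= early_stopping_rounds:
            # No improvement for early_stopping_rounds rounds: stop here.
            return best_round, rnd
    # Ran through every round without triggering the stopping rule.
    return best_round, len(val_scores) - 1


# Made-up validation RMSE values, one per boosting round (lower is better).
val_rmse = [1.00, 0.80, 0.70, 0.65, 0.66, 0.67, 0.68, 0.64, 0.60]
best, last = best_round_with_early_stopping(val_rmse, early_stopping_rounds=3)
print(f"best round: {best}, stopped after round: {last}")
```

With these made-up scores the rule stops after round 6 and reports round 3 as the best, so the later improvements at rounds 7 and 8 are never reached. That is the usual trade-off when picking early_stopping_rounds: a small value saves training time but can stop on a temporary plateau.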