Satellite-Based-Crop-Classification / messis


Best Model #71

Closed · yvokeller closed this issue 2 months ago

yvokeller commented 3 months ago

Perplexity's answer:

  1. Average performance metrics: Calculate and report the mean and standard deviation of the final performance metrics across all folds[1][2]. This gives an overall picture of model performance while accounting for variation in training duration (see the first sketch after this list).

  2. Best epoch statistics: Report the average and standard deviation of the best epoch (where early stopping occurred) across folds[1]. This provides insight into when the model typically reaches peak performance.

  3. Learning curves: Plot the average learning curve (e.g., loss vs. epochs) across all folds, with error bands showing the standard deviation[2]. This visualizes the overall training trajectory and the variability between folds (see the plotting sketch after this list).

  4. Final model performance: After cross-validation, train a final model on the entire dataset using the average best number of epochs determined during cross-validation[1][2]. Report this model's performance on a held-out test set.

  5. Ensemble approach: Some researchers create an ensemble by combining the models from each fold, then evaluate this ensemble on a held-out test set[2]; the last sketch after this list covers this together with item 7.

  6. Detailed fold reporting: For a comprehensive view, report individual fold performances along with the number of epochs for each fold[4].

  7. Cross-validated metrics: Use the combined holdout predictions from all folds to compute overall cross-validated performance metrics[4].
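
As a rough illustration of items 1, 2, 4, and 6: a minimal sketch, assuming each fold's run already logged a final validation metric and the epoch where early stopping triggered. The `fold_results` values and metric names are made up for illustration and are not taken from this repository.

```python
import pandas as pd

# Illustrative per-fold results; in practice these come from the
# cross-validation loop (e.g. logged by each fold's trainer).
fold_results = [
    {"fold": 0, "val_f1": 0.81, "val_loss": 0.42, "best_epoch": 17},
    {"fold": 1, "val_f1": 0.78, "val_loss": 0.47, "best_epoch": 22},
    {"fold": 2, "val_f1": 0.83, "val_loss": 0.40, "best_epoch": 15},
    {"fold": 3, "val_f1": 0.80, "val_loss": 0.44, "best_epoch": 19},
    {"fold": 4, "val_f1": 0.79, "val_loss": 0.45, "best_epoch": 21},
]

df = pd.DataFrame(fold_results).set_index("fold")

# Item 6: detailed per-fold reporting (metrics and stopping epoch per fold).
print(df)

# Items 1 and 2: mean and standard deviation across folds for the final
# metrics and for the epoch at which early stopping occurred.
print(df.agg(["mean", "std"]))

# Item 4: the average best epoch can be reused as the fixed epoch budget
# when training the final model on the entire dataset.
final_epochs = int(round(df["best_epoch"].mean()))
print(f"Train the final model for ~{final_epochs} epochs")
```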
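
For item 3, a sketch of the averaged learning curve with a one-standard-deviation band, assuming each fold's per-epoch validation losses have been collected to a common length (with early stopping, shorter runs could be padded with NaN and aggregated via `np.nanmean`/`np.nanstd`). The numbers are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative per-epoch validation losses, one row per fold.
per_fold_val_loss = np.array([
    [0.90, 0.70, 0.58, 0.51, 0.47, 0.45],
    [0.95, 0.74, 0.60, 0.53, 0.50, 0.48],
    [0.88, 0.68, 0.56, 0.50, 0.46, 0.44],
])

epochs = np.arange(1, per_fold_val_loss.shape[1] + 1)
mean_loss = per_fold_val_loss.mean(axis=0)
std_loss = per_fold_val_loss.std(axis=0)

# Item 3: mean learning curve with an error band showing fold variability.
plt.plot(epochs, mean_loss, label="mean val loss")
plt.fill_between(epochs, mean_loss - std_loss, mean_loss + std_loss,
                 alpha=0.3, label="±1 std across folds")
plt.xlabel("epoch")
plt.ylabel("validation loss")
plt.legend()
plt.savefig("cv_learning_curve.png")
```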
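
Items 5 and 7 can both be sketched with plain NumPy: average the fold models' class probabilities on a shared held-out test set to form the ensemble, and pool each fold's holdout predictions to compute a single cross-validated metric. The randomly generated prediction arrays and the accuracy metric are placeholders for whatever the project actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)
n_folds, n_test, n_classes = 5, 100, 5

# Item 5: each fold model predicts class probabilities on the same held-out
# test set; averaging the probabilities gives the ensemble prediction.
fold_test_probs = rng.random((n_folds, n_test, n_classes))
fold_test_probs /= fold_test_probs.sum(axis=-1, keepdims=True)
ensemble_pred = fold_test_probs.mean(axis=0).argmax(axis=-1)
print("Ensemble predictions (first 10):", ensemble_pred[:10])

# Item 7: concatenate each fold's out-of-fold (holdout) predictions and
# labels, then compute one cross-validated metric on the pooled set.
oof_preds = [rng.integers(0, n_classes, 40) for _ in range(n_folds)]
oof_labels = [rng.integers(0, n_classes, 40) for _ in range(n_folds)]
cv_accuracy = (np.concatenate(oof_preds) == np.concatenate(oof_labels)).mean()
print(f"Cross-validated accuracy: {cv_accuracy:.3f}")
```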

When using these approaches, it's important to state clearly which reporting method was used, since different methods can lead to slightly different results. The choice often depends on the project's requirements and on the audience for the results.

It's worth noting that while early stopping with cross-validation can be powerful, it also adds complexity. Some researchers therefore prefer to use cross-validation for hyperparameter tuning and model selection, and apply early stopping only when training the final model on the full dataset (which still requires setting aside a small validation split for the early-stopping monitor)[3]. This approach simplifies the reporting process while still leveraging the benefits of both techniques; a minimal sketch of it follows.
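
A minimal sketch of that simpler workflow, using a plain PyTorch loop with a hand-rolled early-stopping check; the synthetic data, tiny model, and hyperparameters are placeholders, not taken from this repository, and a framework callback (e.g. PyTorch Lightning's `EarlyStopping`) would serve the same purpose.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for the real training data.
X = torch.randn(1000, 16)
y = (X.sum(dim=1) > 0).long()
train_set, val_set = random_split(TensorDataset(X, y), [800, 200])
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

best_val, patience, wait, best_state = float("inf"), 5, 0, None
for epoch in range(100):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = sum(loss_fn(model(xb), yb).item() for xb, yb in val_loader) / len(val_loader)

    # Early stopping: keep the best weights and stop after `patience`
    # epochs without improvement on the held-out validation split.
    if val_loss < best_val:
        best_val, wait = val_loss, 0
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
    else:
        wait += 1
        if wait >= patience:
            print(f"Early stopping at epoch {epoch}, best val loss {best_val:.4f}")
            break

# Restore the best checkpoint before the final evaluation.
model.load_state_dict(best_state)
```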

Citations:

- [1] https://datascience.stackexchange.com/questions/74351/what-is-the-proper-way-to-use-early-stopping-with-cross-validation
- [2] https://stackoverflow.com/questions/52788635/early-stopping-with-a-cross-validated-metric-in-keras
- [3] https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/
- [4] https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/gbm-faq/cross_validation.html
- [5] https://xgboosting.com/early-stopping/