deepset-ai / FARM

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
https://farm.deepset.ai
Apache License 2.0

Implement holdout evaluation in addition to cross validation #811

Closed johann-petrak closed 3 years ago

johann-petrak commented 3 years ago

Basically the same as what we already have for cross validation: implement `DataSiloForHoldout` with a class method `make(cls, datasilo, sets=["train", "dev", "test"], n_splits=5, shuffle=True, train_split=0.7, stratified=True)`.

The advantage of holdout estimation is that the training and test set sizes do not depend on the number of splits, so in some situations generalization estimates can be obtained more efficiently.
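The splitting idea behind this can be sketched with scikit-learn's `StratifiedShuffleSplit`, which draws several independent stratified train/test partitions with a fixed train fraction. This is only an illustration of the concept, not the actual FARM `DataSiloForHoldout` API; the toy arrays and parameter values below are assumptions:

```python
# Sketch of repeated stratified holdout splitting (illustration only;
# the real DataSiloForHoldout works on FARM data silos, not raw arrays).
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

X = np.arange(20).reshape(-1, 1)       # toy features
y = np.array([0] * 10 + [1] * 10)      # toy labels, two balanced classes

# n_splits independent holdout rounds, each with a 0.7 train fraction;
# unlike k-fold CV, the train/test sizes do not depend on n_splits.
splitter = StratifiedShuffleSplit(n_splits=5, train_size=0.7, random_state=42)
for train_idx, test_idx in splitter.split(X, y):
    # stratification keeps the class ratio identical in every round
    assert abs(y[train_idx].mean() - 0.5) < 1e-9
    print(len(train_idx), len(test_idx))  # 14 train / 6 test per round
```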

johann-petrak commented 3 years ago

Holdout estimation with the `doc_classification_holdout.py` example script (5 splits, stratified, train split 0.8):

```
HOLDOUT Accuracy:   mean 0.9260386190754827 stdev 0.02326096390235921
HOLDOUT F1 MICRO:   mean 0.9260386190754827 stdev 0.02326096390235921
HOLDOUT F1 MACRO:   mean 0.9186748352877108 stdev 0.0245897094179357
HOLDOUT F1 OFFENSE: mean 0.8942415341996025 stdev 0.030496627391813518
HOLDOUT F1 OTHER:   mean 0.943108136375819 stdev 0.018691519027049935
HOLDOUT MCC:        mean 0.8386780795230324 stdev 0.047554615774727346
```

This is quite a bit better than the results from the cross-validation example. Maybe because we also stratify the dev set here?

Timoeller commented 3 years ago

fixed by #825