xval example code is incorrect

deepset-ai / FARM

Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

https://farm.deepset.ai

Apache License 2.0


xval example code is incorrect #812

Closed. johann-petrak closed this issue 3 years ago.

johann-petrak commented 3 years ago

The current example program ./examples/doc_classification_crossvalidation.py calculates the overall XVAL estimates incorrectly. For each metric, it should calculate the mean and stdev of the per-fold values instead of calculating the metric once over the pooled data from all folds. In most cases the difference is probably tiny, but the current calculation is definitely wrong.
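To illustrate the difference, here is a minimal sketch in plain Python. It is not the FARM code: the fold data, the `f1_binary` helper, and the use of the sample standard deviation are all assumptions made for illustration only.

```python
from statistics import mean, stdev


def f1_binary(y_true, y_pred, positive=1):
    """F1 score for one class, computed from raw counts (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up (labels, predictions) for three folds; not taken from the example dataset.
folds = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 0, 0, 0], [1, 1, 0, 0]),
    ([1, 1, 1, 0], [1, 1, 1, 0]),
]

# Proposed approach: compute the metric per fold, then aggregate.
per_fold = [f1_binary(y_true, y_pred) for y_true, y_pred in folds]
xval_mean = mean(per_fold)
xval_stdev = stdev(per_fold)  # sample stdev; assumption, the population stdev is also possible

# Approach described as incorrect: pool all folds and compute the metric once.
pooled_true = [y for y_true, _ in folds for y in y_true]
pooled_pred = [y for _, y_pred in folds for y in y_pred]
pooled = f1_binary(pooled_true, pooled_pred)

print("per-fold F1:", per_fold)
print("mean:", xval_mean, "stdev:", xval_stdev)
print("pooled F1:", pooled)
```

With these made-up folds the pooled F1 and the mean of the per-fold F1 values differ, and only the per-fold approach yields a spread across folds.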

johann-petrak commented 3 years ago

Metrics logged for the current implementation (doc_classification_crossvalidation.py example, single run, seed 42):

XVAL acc:        0.802130898021309
XVAL F1 MICRO:   0.802130898021309
XVAL F1 MACRO:   0.7818362975029298
XVAL F1 OFFENSE: 0.7152964959568734
XVAL F1 OTHER:   0.8483760990489861
XVAL MCC:        0.5642905675219807

johann-petrak commented 3 years ago

The same metrics after the new implementation:

XVAL Accuracy:   mean 0.8021317093826182 stdev 0.010111693788346669
XVAL F1 MICRO:   mean 0.8021317093826182 stdev 0.010111693788346709
XVAL F1 MACRO:   mean 0.7816765438006879 stdev 0.008313385095785861
XVAL F1 OFFENSE: mean 0.7151650412974621 stdev 0.010665611398946019
XVAL F1 OTHER:   mean 0.8481880463039135 stdev 0.010688811556707749
XVAL MCC:        mean 0.5656135088223573 stdev 0.017285658033477026

Timoeller commented 3 years ago

Fixed by #825.