Opened by carschno 4 months ago
This may also include #4 and #3.

In the context of #23, results of an initial experiment run on Renate Analysis with SentenceTransformers:
| Metric              | BEGIN  | IN     | END    | OUT    |
|---------------------|--------|--------|--------|--------|
| MulticlassPrecision | 0.9091 | 1.0000 | 0.9524 | 0.8333 |
| MulticlassRecall    | 0.9524 | 0.9941 | 0.9524 | 1.0000 |
| MulticlassF1Score   | 0.9302 | 0.9970 | 0.9524 | 0.9091 |

MulticlassF1Score (micro average): 0.9896
With Gysbert-v2, the outputs seem random. Evaluation results:
| Metric              | BEGIN  | IN     | END    | OUT    |
|---------------------|--------|--------|--------|--------|
| MulticlassPrecision | 0.0909 | 0.0000 | 0.1098 | 0.8824 |
| MulticlassRecall    | 0.9048 | 0.0000 | 1.0000 | 0.7143 |
| MulticlassF1Score   | 0.1652 | 0.0000 | 0.1979 | 0.7895 |

MulticlassF1Score (micro average): 0.1328
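For reference, the per-class and micro-averaged scores in the tables above can be sketched in pure Python. This is a minimal illustration with hypothetical toy labels, not the project's actual evaluation code; the four label names are taken from the tables.

```python
from collections import Counter

LABELS = ["BEGIN", "IN", "END", "OUT"]

def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} and the micro-averaged F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but was t
            fn[t] += 1  # missed t
    scores = {}
    for label in LABELS:
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[label] = (prec, rec, f1)
    # Micro average pools TP/FP/FN over all classes; for single-label
    # multiclass classification this reduces to plain accuracy.
    micro_f1 = sum(tp.values()) / len(y_true)
    return scores, micro_f1

# Hypothetical toy sequence of page labels
y_true = ["BEGIN", "IN", "IN", "END", "OUT", "IN"]
y_pred = ["BEGIN", "IN", "OUT", "END", "OUT", "IN"]
scores, micro = per_class_scores(y_true, y_pred)
```

A micro-averaged F1 near chance level (as with Gysbert-v2 above) is exactly what this computation yields when predictions are uncorrelated with the true labels.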
Results can now be logged to WandB (#22).
This is a running issue collecting experiments that should be run.
- [ ] Model comparison: compare different BERT and SentenceTransformer models (depends on #23)
- [ ] Balance training set between three sources (RenateAnalysisInv, RenateAnalysis, GeneraleMissive) in terms of pages
- [ ] Different batch sizes