Opened by carschno 4 months ago
This may also include #4 and #3.

In the context of #23, results of an initial experiment run on Renate Analysis with SentenceTransformers:
| Metric              | BEGIN  | IN     | END    | OUT    |
|---------------------|--------|--------|--------|--------|
| MulticlassPrecision | 0.9091 | 1.0000 | 0.9524 | 0.8333 |
| MulticlassRecall    | 0.9524 | 0.9941 | 0.9524 | 1.0000 |
| MulticlassF1Score   | 0.9302 | 0.9970 | 0.9524 | 0.9091 |

MulticlassF1Score (micro average): 0.9896
With Gysbert-v2, the outputs seem random. Evaluation results:
| Metric              | BEGIN  | IN     | END    | OUT    |
|---------------------|--------|--------|--------|--------|
| MulticlassPrecision | 0.0909 | 0.0000 | 0.1098 | 0.8824 |
| MulticlassRecall    | 0.9048 | 0.0000 | 1.0000 | 0.7143 |
| MulticlassF1Score   | 0.1652 | 0.0000 | 0.1979 | 0.7895 |

MulticlassF1Score (micro average): 0.1328
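For reference, the per-class and micro-averaged scores in the tables above can be sketched in pure Python. This is a minimal illustration with hypothetical toy labels, not the project's actual evaluation code; the four label names are taken from the tables.

```python
from collections import Counter

LABELS = ["BEGIN", "IN", "END", "OUT"]

def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} and the micro-averaged F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but was t
            fn[t] += 1  # missed t
    scores = {}
    for label in LABELS:
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[label] = (prec, rec, f1)
    # Micro average pools TP/FP/FN over all classes; for single-label
    # multiclass classification this reduces to plain accuracy.
    micro_f1 = sum(tp.values()) / len(y_true)
    return scores, micro_f1

# Hypothetical toy sequence of page labels
y_true = ["BEGIN", "IN", "IN", "END", "OUT", "IN"]
y_pred = ["BEGIN", "IN", "OUT", "END", "OUT", "IN"]
scores, micro = per_class_scores(y_true, y_pred)
```

A micro-averaged F1 near chance level (as with Gysbert-v2 above) is exactly what this computation yields when predictions are uncorrelated with the true labels.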
Results can now be logged to WandB (#22).
This is a running issue collecting experiments that should be run.
- [ ] Model comparison: compare different BERT and SentenceTransformer models (depends on #23)
- [ ] Balance training set between three sources (RenateAnalysisInv, RenateAnalysis, GeneraleMissive) in terms of pages
- [ ] Different batch sizes