https://github.com/Janluke0/PoS-Tagging/commit/bbd0d53bd3e975f5ab68784292c2dc10e86633ee

This error resulted in the inclusion of the PAD token in the loss computation.
The accuracy computation was not affected, but a re-run of the experiments is required.
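For reference, a minimal sketch of this kind of fix, assuming a PyTorch setup (the actual change is in the commit linked above; `PAD_IDX` and the shapes here are illustrative):

```python
import torch
import torch.nn as nn

PAD_IDX = 0  # illustrative index of the PAD tag

# Buggy: PAD positions contribute to the loss
buggy = nn.CrossEntropyLoss()
# Fixed: PAD positions are excluded from the loss
fixed = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

logits = torch.randn(8, 17)           # (tokens, num_tags)
targets = torch.randint(1, 17, (8,))  # gold tag indices
targets[-3:] = PAD_IDX                # the last positions are padding

print(buggy(logits, targets), fixed(logits, targets))  # the values differ
```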
### After the bugfix

**Max accuracy**

name | GRU | LSTM64 | SA | LSTM128 | TRANS | SA2 |
---|---|---|---|---|---|---|
BPE | # | # | 83.38 | 92.82 | 68.67 | 87.90 |
WordPiece | # | # | 83.63 | 90.62 | 69.66 | 88.04 |
BERT_pretrained | # | # | 86.34 | 90.35 | 61.22 | 88.61 |
DBERT_pretrained | # | # | 84.34 | 90.62 | 60.04 | 86.81 |
ELECTRA_pretrained | # | # | 85.97 | 90.51 | 58.18 | 88.28 |
ROBERTA_pretrained | # | # | 87.11 | 91.37 | 59.01 | 89.19 |

**Epochs**

name | GRU | LSTM64 | SA | LSTM128 | TRANS | SA2 |
---|---|---|---|---|---|---|
BPE | # | # | 2000 | 2000 | ~1750 | ~1000 |
WordPiece | # | # | ~450 | ~450 | ~800 | ~1400 |
BERT_pretrained | # | # | ~500 | ~1750 | ~500 | ~500 |
DBERT_pretrained | # | # | ~500 | ~1250 | ~500 | ~120 |
ELECTRA_pretrained | # | # | ~600 | 2000 | ~500 | ~650 |
ROBERTA_pretrained | # | # | 2000 | 2000 | ~500 | ~1100 |
### Tokenizers

### Models

#### IO
- `x_n` is the n-th token of the sentence
- `y_n` is the n-th POS tag; it differs from the special *explicit pad* token only if `x_n` is the first token of the word

This last choice requires a pretokenization with the same criteria used to build the dataset, but allows an easy combination of different tokenizers (see the sketch below). The only-first-token approach didn't change the performance significantly in preliminary experiments.
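A minimal sketch of the scheme, with a toy subword tokenizer standing in for the real ones (`align_tags`, `PAD_TAG`, and `toy_tokenize` are hypothetical names, not the repo's code):

```python
PAD_TAG = "<pad>"  # illustrative name for the explicit pad tag

def align_tags(words, tags, tokenize):
    """tokenize(word) -> list of subword tokens for that word."""
    x, y = [], []
    for word, tag in zip(words, tags):
        for i, subtoken in enumerate(tokenize(word)):
            x.append(subtoken)
            # only the first subtoken of a word keeps the real tag
            y.append(tag if i == 0 else PAD_TAG)
    return x, y

# toy tokenizer: split long words into two pieces
toy_tokenize = lambda w: [w[:3], "##" + w[3:]] if len(w) > 3 else [w]

x, y = align_tags(["The", "jumping", "fox"], ["DET", "VERB", "NOUN"], toy_tokenize)
# x == ['The', 'jum', '##ping', 'fox']
# y == ['DET', 'VERB', '<pad>', 'NOUN']
```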
#### LSTM/GRU
The LSTM and GRU models share the same bidirectional 2-layer architecture.
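A minimal sketch of such a tagger, assuming PyTorch (all sizes are illustrative; the 64/128 suffixes in the tables presumably refer to the hidden size):

```python
import torch.nn as nn

class RNNTagger(nn.Module):
    """Bidirectional 2-layer LSTM/GRU tagger; rnn_cls selects the cell."""
    def __init__(self, vocab_size, num_tags, emb_dim=128, hidden=128,
                 rnn_cls=nn.LSTM, pad_idx=0):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=pad_idx)
        self.rnn = rnn_cls(emb_dim, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        # 2 * hidden: the two directions are concatenated
        self.out = nn.Linear(2 * hidden, num_tags)

    def forward(self, x):              # x: (batch, seq_len)
        h, _ = self.rnn(self.emb(x))   # h: (batch, seq_len, 2*hidden)
        return self.out(h)             # per-token tag logits
```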
#### Self Attention
The `TransformerEncoder` part only. Key masking is applied to mask out the pads.
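A minimal sketch with `torch.nn.TransformerEncoder`, where a boolean mask marks the PAD positions (dimensions are illustrative):

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=128, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

x = torch.randn(4, 20, 128)    # (batch, seq_len, d_model)
pad_mask = torch.zeros(4, 20, dtype=torch.bool)
pad_mask[:, 15:] = True        # True = PAD position, excluded from attention

h = encoder(x, src_key_padding_mask=pad_mask)  # (4, 20, 128)
```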
#### Transformer (Token2Tag)
The full transformer architecture, trained directly on the final task. In this case explicit padding wasn't used, since the model can produce an output of a different length. Pretokenization is still required, though, to link each word to its tag.
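A minimal sketch of the encoder-decoder setup with `torch.nn.Transformer`, assuming teacher forcing during training; since the decoder runs over the tag sequence, the output length can differ from the subword input length (all sizes are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=128, nhead=8, num_encoder_layers=2,
                       num_decoder_layers=2, batch_first=True)

src = torch.randn(4, 24, 128)  # embedded subword tokens
tgt = torch.randn(4, 18, 128)  # embedded tag sequence: one tag per word,
                               # so shorter than the subword input

# causal mask: each tag position attends only to earlier tags
tgt_mask = model.generate_square_subsequent_mask(18)
out = model(src, tgt, tgt_mask=tgt_mask)  # (4, 18, 128), then project to tags
```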
### Models on tokenizer performance

Accuracy is computed ignoring the BOS/EOS and padding labels. The first table above reports the max accuracy of each model/tokenizer pair; the second reports the number of training epochs. The maximum number of epochs is 2000; the early-stopping condition was on the validation accuracy, triggering for an improvement below 10^-7.
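For completeness, a sketch of such a masked accuracy, assuming integer tag tensors and illustrative indices for the special labels:

```python
import torch

IGNORED = (0, 1, 2)  # illustrative indices for the PAD, BOS, EOS labels

def masked_accuracy(pred, gold):
    """pred, gold: (batch, seq_len) integer tag tensors."""
    mask = torch.ones_like(gold, dtype=torch.bool)
    for idx in IGNORED:
        mask &= gold != idx            # drop the special labels
    correct = (pred == gold) & mask
    return correct.sum().item() / mask.sum().item()
```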