alexandres closed this issue 4 years ago
Hi @alexandres that is strange - your code looks good.
Could you try going back to Flair version 0.2 and running the experiment again with the instructions in the 0.2 EXPERIMENTS.md?
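Before re-running, it may help to confirm which Flair version is actually installed in the environment. This is a minimal sketch using only the standard library. The helper name `installed_version` is made up for illustration, and the check against a `0.2` prefix is an assumption about how that release line is numbered:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("flair")
    if found is None:
        print("flair is not installed in this environment")
    elif found == "0.2" or found.startswith("0.2."):
        # Assumption: releases in the 0.2 line are numbered 0.2 or 0.2.x.
        print(f"flair {found}: matches the 0.2 line")
    else:
        print(f"flair {found}: a different release than 0.2")
```

Running this in both environments makes it easy to record which release produced which score.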
Thanks a bunch @alanakbik . That did it!
On the latest release (pip install flair), scores at the end of training:
2019-02-27 23:49:38,087 loading file resources/taggers/pos-extvec/best-model.pt
2019-02-27 23:50:44,474 MICRO_AVG: acc 0.9358 - f1-score 0.9668
2019-02-27 23:50:44,475 MACRO_AVG: acc 0.876 - f1-score 0.9218586956521738
On v0.2.0, after a single epoch:
0 (11:45:56) 11.641517 0 0.100000 DEV 7082 0.9462540222208731 TEST 7095 0.9452774307001712
So 0.9358 after 150 epochs on the latest release vs 0.94625 after 1 epoch on v0.2.0.
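The comparison can be reproduced directly from the two log lines quoted above. This is only a sketch: the helper names are hypothetical, and reading the v0.2.0 line as "split name, integer count, score" is an assumption about that log format, not something the thread confirms.

```python
import re

# Log lines copied verbatim from the two runs quoted above.
LATEST_LINE = "2019-02-27 23:50:44,474 MICRO_AVG: acc 0.9358 - f1-score 0.9668"
V020_LINE = (
    "0 (11:45:56) 11.641517 0 0.100000 "
    "DEV 7082 0.9462540222208731 TEST 7095 0.9452774307001712"
)


def micro_avg_acc(line: str) -> float:
    """Extract the `acc` value from a MICRO_AVG summary line."""
    match = re.search(r"MICRO_AVG: acc ([0-9.]+)", line)
    if match is None:
        raise ValueError("no MICRO_AVG accuracy found")
    return float(match.group(1))


def split_score(line: str, split: str) -> float:
    """Extract the float that follows `<split> <integer>` in a v0.2.0 epoch line.

    Assumption: the integer is a count and the float after it is the score.
    """
    match = re.search(rf"{split} \d+ ([0-9.]+)", line)
    if match is None:
        raise ValueError(f"no {split} score found")
    return float(match.group(1))


latest_acc = micro_avg_acc(LATEST_LINE)
dev_v020 = split_score(V020_LINE, "DEV")
test_v020 = split_score(V020_LINE, "TEST")

print(f"latest, 150 epochs: {latest_acc:.4f}")
print(f"v0.2.0, 1 epoch:    DEV {dev_v020:.4f}, TEST {test_v020:.4f}")
print(f"gap vs DEV:  {(dev_v020 - latest_acc) * 100:.2f} points")
print(f"gap vs TEST: {(test_v020 - latest_acc) * 100:.2f} points")
```

Under that reading, the single v0.2.0 epoch is ahead of the 150-epoch run by roughly one accuracy point on either split.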
Thanks! You saved me a lot of time.
Cool, thanks for checking this out!
For us, this means we have to take a closer look at what changed between the versions. Generally, quality should get better with newer versions, not the other way around :) We'll take a look!
Hi,
First, thank you for the great work on this library! :)
I'm trying to replicate the "Classic Word Embedding + BiLSTM-CRF" result of 96.94 ± 0.02 accuracy on the PTB POS dataset, as reported in http://aclweb.org/anthology/C18-1139 .
I followed the instructions at https://github.com/zalandoresearch/flair/blob/master/resources/docs/EXPERIMENTS.md#penn-treebank-part-of-speech-tagging-english
My code and corpus statistics are available at https://gist.github.com/alexandres/a54506e31d038cce75f31d09c60c9df8
My corpus statistics exactly match those from https://nlp.stanford.edu/pubs/CICLing2011-manning-tagging.pdf , namely:
Unfortunately, my POS accuracy is around 94% with the "Classic Word Embedding + BiLSTM-CRF" setup using the Komninos embeddings.
Any idea what I'm doing wrong?
Note: I notice that the embeddings are not fine-tuned during training. There is no mention of this in the paper. Perhaps this is the cause?
Thanks!
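For readers unfamiliar with the fine-tuning question raised in the note above, here is a small generic PyTorch sketch of the difference between frozen and trainable pretrained embeddings. It does not show how Flair handles embeddings internally, which this thread does not establish. The random vectors and the toy loss are stand-ins chosen purely for illustration.

```python
import torch
import torch.nn as nn

# Stand-in for pretrained word vectors: 5 words, 4 dimensions (random, hypothetical).
torch.manual_seed(0)
pretrained = torch.randn(5, 4)

# freeze=True keeps the vectors fixed; freeze=False lets the optimizer update them.
frozen = nn.Embedding.from_pretrained(pretrained.clone(), freeze=True)
tunable = nn.Embedding.from_pretrained(pretrained.clone(), freeze=False)

token_ids = torch.tensor([0, 2, 4])
optimizer = torch.optim.SGD(tunable.parameters(), lr=0.1)

# One toy training step: the loss is just the sum of the looked-up vectors.
loss = tunable(token_ids).sum() + frozen(token_ids).sum()
loss.backward()
optimizer.step()

print("frozen requires_grad: ", frozen.weight.requires_grad)
print("tunable requires_grad:", tunable.weight.requires_grad)
print("frozen unchanged:     ", torch.equal(frozen.weight, pretrained))
print("tunable unchanged:    ", torch.equal(tunable.weight, pretrained))
```

Only the trainable table moves after the optimizer step. Whether this setting explains the accuracy gap in this issue is a separate question that the snippet does not answer.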