gchrupala / morfette

Supervised learning of morphology
BSD 2-Clause "Simplified" License

Results on unknown words much worse as compared with revision 26 #4

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Results on unknown words are much worse than with revision 26 (confirmed on
the French FTB).

r26:
./bin/morfette eval data/fr/ftb_1.pos.utf8.morpheteready
data/fr/ftb_3.pos.utf8.morpheteready
data/fr/ftb_3.pos.utf8.morpheteready.predict.2
Unseen word ratio: 5.82
Token accuracy all:
Lemma acc: 97.86% (35945 / 36732)
POS   acc: 97.85% (35944 / 36732)
Joint acc: 96.37% (35399 / 36732)

Token accuracy seen:
Lemma acc: 98.24% (33987 / 34595)
POS   acc: 98.22% (33980 / 34595)
Joint acc: 97.00% (33557 / 34595)

Token accuracy unseen:
Lemma acc: 91.62% (1958 / 2137)
POS   acc: 91.90% (1964 / 2137)
Joint acc: 86.20% (1842 / 2137)

Release 0.3:
./bin/morfette eval ftb_1.pos.utf8.morpheteready
ftb_2.pos.utf8.morpheteready ftb_2.pos.utf8.morpheteready.predict 
Unseen word ratio: 5.48
Token accuracy all:
Lemma acc: 97.21% (34083 / 35062)
POS   acc: 97.56% (34205 / 35062)
Joint acc: 95.73% (33564 / 35062)

Token accuracy seen:
Lemma acc: 98.17% (32532 / 33139)
POS   acc: 98.24% (32556 / 33139)
Joint acc: 96.99% (32143 / 33139)

Token accuracy unseen:
Lemma acc: 80.66% (1551 / 1923)
POS   acc: 85.75% (1649 / 1923)
Joint acc: 73.89% (1421 / 1923)

Original issue reported on code.google.com by pitekus on 31 Jan 2010 at 11:56
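
For readers unfamiliar with the report format, the all/seen/unseen breakdown can be sketched as below. This is a minimal Python illustration, not Morfette's own evaluation code: it assumes that a token counts as "unseen" when its word form never occurs in the training file, and the names `Token`, `accuracy_report` and `percent` are hypothetical.

```python
from collections import namedtuple

# Hypothetical token record: surface form, lemma, and POS tag.
Token = namedtuple("Token", ["form", "lemma", "pos"])


def accuracy_report(train_forms, gold, predicted):
    """Count correct lemmas, POS tags, and joint predictions.

    Returns a dict mapping "all", "seen", "unseen" to
    [lemma_correct, pos_correct, joint_correct, total].
    Assumption: a token is "seen" if its form occurs in train_forms.
    """
    known = set(train_forms)
    counts = {"all": [0, 0, 0, 0], "seen": [0, 0, 0, 0], "unseen": [0, 0, 0, 0]}
    for g, p in zip(gold, predicted):
        lemma_ok = g.lemma == p.lemma
        pos_ok = g.pos == p.pos
        group = "seen" if g.form in known else "unseen"
        # Every token contributes to "all" and to exactly one subgroup.
        for key in ("all", group):
            c = counts[key]
            c[0] += lemma_ok
            c[1] += pos_ok
            c[2] += lemma_ok and pos_ok
            c[3] += 1
    return counts


def percent(correct, total):
    """Percentage of correct predictions; 0.0 for an empty group."""
    return 100.0 * correct / total if total else 0.0


# Toy data: "chien" does not occur in the training forms, so it is unseen.
train = ["le", "chat", "dort"]
gold = [Token("le", "le", "D"), Token("chien", "chien", "N"), Token("dort", "dormir", "V")]
pred = [Token("le", "le", "D"), Token("chien", "chien", "V"), Token("dort", "dormir", "V")]

report = accuracy_report(train, gold, pred)
for group in ("all", "seen", "unseen"):
    lemma, pos, joint, total = report[group]
    print(f"Token accuracy {group}: lemma {percent(lemma, total):.2f}%, "
          f"POS {percent(pos, total):.2f}%, joint {percent(joint, total):.2f}%")
```

Under this reading, the "Unseen word ratio" line would be the unseen total divided by the overall total, which agrees with the figures above (1923 / 35062 is about 5.48%).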

GoogleCodeExporter commented 9 years ago
I pasted the r26 results for the wrong test file.
These are the correct ones:
./bin/morfette eval data/fr/ftb_1.pos.utf8.morpheteready
data/fr/ftb_2.pos.utf8.morpheteready 
data/fr/ftb_2.pos.utf8.morpheteready.predict.2
Token accuracy all:
Lemma acc: 97.57% (34209 / 35062)
POS   acc: 97.87% (34316 / 35062)
Joint acc: 96.18% (33722 / 35062)

Token accuracy seen:
Lemma acc: 97.92% (32449 / 33139)
POS   acc: 98.25% (32560 / 33139)
Joint acc: 96.76% (32066 / 33139)

Token accuracy unseen:
Lemma acc: 91.52% (1760 / 1923)
POS   acc: 91.32% (1756 / 1923)
Joint acc: 86.12% (1656 / 1923)

Original comment by pitekus on 1 Feb 2010 at 12:13

GoogleCodeExporter commented 9 years ago
Removing dictionary filtering seems to have fixed it:
gchrupala@cl6lx:~/experiments/morfette/tmp/morfette-0.3> ./bin/morfette eval
ftb_1.pos.utf8.morpheteready ftb_2.pos.utf8.morpheteready
ftb_2.pos.utf8.morpheteready.predict
Unseen word ratio: 5.48
Token accuracy all:
Lemma acc: 97.80% (34291 / 35062)
POS   acc: 97.96% (34345 / 35062)
Joint acc: 96.47% (33826 / 35062)

Token accuracy seen:
Lemma acc: 98.14% (32522 / 33139)
POS   acc: 98.31% (32580 / 33139)
Joint acc: 97.03% (32154 / 33139)

Token accuracy unseen:
Lemma acc: 91.99% (1769 / 1923)
POS   acc: 91.78% (1765 / 1923)
Joint acc: 86.95% (1672 / 1923)

Original comment by pitekus on 1 Feb 2010 at 12:37
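
To make the comparison across the three ftb_2 runs explicit, the unseen-token joint accuracies can be recomputed from the counts quoted in this thread. This is only a small arithmetic sketch in Python; the run labels and the `percent` helper are illustrative names, and the counts are copied from the reports above.

```python
# (correct, total) joint-accuracy counts on unseen tokens, copied from the thread.
unseen_joint = {
    "r26": (1656, 1923),
    "release 0.3": (1421, 1923),
    "0.3 without dictionary filtering": (1672, 1923),
}


def percent(correct, total):
    """Percentage of correct predictions."""
    return 100.0 * correct / total


baseline = percent(*unseen_joint["r26"])
for name, (correct, total) in unseen_joint.items():
    acc = percent(correct, total)
    # Difference is in percentage points relative to the r26 run.
    print(f"{name}: {acc:.2f}% ({acc - baseline:+.2f} points vs r26)")
```

On these counts, release 0.3 loses about 12 points of joint accuracy on unseen tokens relative to r26, and the run without dictionary filtering ends slightly above r26.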