DohaBasem / mate-tools

Automatically exported from code.google.com/p/mate-tools

mixing up LEMMA and PLEMMA columns #1

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Take a sentence, e.g. "Quick brown fox jumps over the lazy dog ."
2. Split, lemmatize, tag and parse it.
3. Look at the intermediate results.

Here's what I get after each of the four steps (split, lemmatize, tag, parse):

1  Quick  _  _  _  _  _  _  _  _  _  _  _  _  _
2  brown  _  _  _  _  _  _  _  _  _  _  _  _  _
3  fox    _  _  _  _  _  _  _  _  _  _  _  _  _
4  jumps  _  _  _  _  _  _  _  _  _  _  _  _  _
5  over   _  _  _  _  _  _  _  _  _  _  _  _  _
6  the    _  _  _  _  _  _  _  _  _  _  _  _  _
7  lazy   _  _  _  _  _  _  _  _  _  _  _  _  _
8  dog    _  _  _  _  _  _  _  _  _  _  _  _  _
9  .      _  _  _  _  _  _  _  _  _  _  _  _  _

1  Quick  _  quick  _  _  _  _  -1  _  _  _  _  _
2  brown  _  brown  _  _  _  _  -1  _  _  _  _  _
3  fox    _  fox    _  _  _  _  -1  _  _  _  _  _
4  jumps  _  jump   _  _  _  _  -1  _  _  _  _  _
5  over   _  over   _  _  _  _  -1  _  _  _  _  _
6  the    _  the    _  _  _  _  -1  _  _  _  _  _
7  lazy   _  lazy   _  _  _  _  -1  _  _  _  _  _
8  dog    _  dog    _  _  _  _  -1  _  _  _  _  _
9  .      _  .      _  _  _  _  -1  _  _  _  _  _

1  Quick  quick  _  _  JJ   _  _  -1  _  _  _  _  _
2  brown  brown  _  _  JJ   _  _  -1  _  _  _  _  _
3  fox    fox    _  _  NN   _  _  -1  _  _  _  _  _
4  jumps  jump   _  _  VBZ  _  _  -1  _  _  _  _  _
5  over   over   _  _  IN   _  _  -1  _  _  _  _  _
6  the    the    _  _  DT   _  _  -1  _  _  _  _  _
7  lazy   lazy   _  _  JJ   _  _  -1  _  _  _  _  _
8  dog    dog    _  _  NN   _  _  -1  _  _  _  _  _
9  .      .      _  _  .    _  _  -1  _  _  _  _  _

1  Quick  _  quick  _  JJ   _  _  3  3  NMOD  NMOD  _  _
2  brown  _  brown  _  JJ   _  _  3  3  NMOD  NMOD  _  _
3  fox    _  fox    _  NN   _  _  4  4  SBJ   SBJ   _  _
4  jumps  _  jump   _  VBZ  _  _  0  0  ROOT  ROOT  _  _
5  over   _  over   _  IN   _  _  4  4  ADV   ADV   _  _
6  the    _  the    _  DT   _  _  8  8  NMOD  NMOD  _  _
7  lazy   _  lazy   _  JJ   _  _  8  8  NMOD  NMOD  _  _
8  dog    _  dog    _  NN   _  _  5  5  PMOD  PMOD  _  _
9  .      _  .      _  .    _  _  4  4  P     P     _  _
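
To make it easier to see which column holds the lemma at each stage, the rows above can be mapped onto the CoNLL-2009 column names. The following is only a rough sketch and not part of mate-tools: the helper names `parse_row` and `lemma_location` are made up for illustration, and it splits on any whitespace because the rows pasted here have lost their tabs (real CoNLL-2009 files are tab-separated).

```python
# Names of the first 14 CoNLL-2009 columns (APRED columns may follow).
CONLL09_COLUMNS = [
    "ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS", "FEAT", "PFEAT",
    "HEAD", "PHEAD", "DEPREL", "PDEPREL", "FILLPRED", "PRED",
]


def parse_row(line):
    """Split one whitespace-separated row into a dict keyed by column name."""
    return dict(zip(CONLL09_COLUMNS, line.split()))


def lemma_location(line):
    """Return the lemma column(s) whose value is not the "_" placeholder."""
    row = parse_row(line)
    return [name for name in ("LEMMA", "PLEMMA") if row.get(name, "_") != "_"]


# Row 4 ("jumps") copied from the lemmatizer, tagger and parser outputs above.
after_lemmatizer = "4 jumps _ jump _ _ _ _ -1 _ _ _ _ _"
after_tagger = "4 jumps jump _ _ VBZ _ _ -1 _ _ _ _ _"
after_parser = "4 jumps _ jump _ VBZ _ _ 0 0 ROOT ROOT _ _"

for stage, line in [
    ("lemmatizer", after_lemmatizer),
    ("tagger", after_tagger),
    ("parser", after_parser),
]:
    print(stage, lemma_location(line))
```

For these three rows the lemma sits in PLEMMA after the lemmatizer, in LEMMA after the tagger, and in PLEMMA again after the parser.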

Note that the value the lemmatizer wrote to the PLEMMA column ends up in the
LEMMA column after tagging. I believe this is not supposed to happen.
The morphological tagger and the dependency parser also swap the predicted and
gold-standard lemma, so if one skips the morphological tagging step, the two
swaps cancel out and the end result is fine. Otherwise, the role labeler reads
the lemma value from the third column and we end up with "_" in place of the
lemma.
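
The cancellation can be illustrated with a small model. This is a sketch under the assumption, taken from the description above, that the tagger, the morphological tagger and the parser each exchange the two lemma columns exactly once; it does not reproduce the actual mate-tools code. `swap_lemmas` and `fill_both_lemmas` are hypothetical helpers, and the latter is just one possible stopgap for affected versions, not a fix endorsed by the maintainers.

```python
LEMMA, PLEMMA = 2, 3  # zero-based positions of the two lemma columns in CoNLL-2009


def swap_lemmas(fields):
    """Model of the reported behaviour: return a copy with LEMMA and PLEMMA exchanged."""
    out = list(fields)
    out[LEMMA], out[PLEMMA] = out[PLEMMA], out[LEMMA]
    return out


def fill_both_lemmas(fields):
    """Hypothetical stopgap: copy whichever lemma is present into both columns."""
    out = list(fields)
    value = out[PLEMMA] if out[PLEMMA] != "_" else out[LEMMA]
    out[LEMMA] = out[PLEMMA] = value
    return out


# Row 4 as written by the lemmatizer: the lemma is in PLEMMA.
row = "4 jumps _ jump _ _ _ _ -1 _ _ _ _ _".split()

# Tagger + parser: two swaps, the lemma is back where the lemmatizer put it.
two_swaps = swap_lemmas(swap_lemmas(row))

# Tagger + morphological tagger + parser: three swaps, the lemma stays swapped.
three_swaps = swap_lemmas(swap_lemmas(swap_lemmas(row)))

print(two_swaps[LEMMA], two_swaps[PLEMMA])
print(three_swaps[LEMMA], three_swaps[PLEMMA])
print(fill_both_lemmas(three_swaps)[LEMMA:PLEMMA + 1])
```

With both columns filled, a downstream component finds the lemma whichever of the two columns it reads, at the cost of no longer distinguishing gold-standard from predicted lemmas.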

Original issue reported on code.google.com by ambl...@gmail.com on 18 Jul 2011 at 3:14

GoogleCodeExporter commented 8 years ago
The new version will solve this issue. Thanks for the report. 

Original comment by bern...@googlemail.com on 15 Sep 2011 at 3:17

GoogleCodeExporter commented 8 years ago
This is solved in the new versions (anna-2 and anna-3.x).

Original comment by boh...@informatik.uni-stuttgart.de on 6 Nov 2012 at 12:50

GoogleCodeExporter commented 8 years ago

Original comment by boh...@informatik.uni-stuttgart.de on 13 Nov 2013 at 5:26