aminorex / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

Conll2006Writer - Bug in Coarse-Grained POS #568

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
According to http://ilk.uvt.nl/conll/, the 4th column contains the coarse-grained 
part-of-speech tag, while the 5th column contains the fine-grained part-of-speech tag.
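
For orientation, here is a minimal Java sketch of one output row in this layout. The ten-column order (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) follows the CoNLL-X format description. The class name, method name, and sample values are made up for illustration and are not taken from Conll2006Writer.java.

```java
import java.util.StringJoiner;

public class ConllXRowSketch {
    // Builds one tab-separated token row; fields not modeled here are written as "_".
    static String row(int id, String form, String lemma, String cpos, String pos,
                      int head, String deprel) {
        StringJoiner line = new StringJoiner("\t");
        line.add(Integer.toString(id));   // 1  ID
        line.add(form);                   // 2  FORM
        line.add(lemma);                  // 3  LEMMA
        line.add(cpos);                   // 4  CPOSTAG (coarse-grained)
        line.add(pos);                    // 5  POSTAG  (fine-grained)
        line.add("_");                    // 6  FEATS
        line.add(Integer.toString(head)); // 7  HEAD
        line.add(deprel);                 // 8  DEPREL
        line.add("_");                    // 9  PHEAD
        line.add("_");                    // 10 PDEPREL
        return line.toString();
    }

    public static void main(String[] args) {
        // Illustrative token: the coarse and fine tags differ (NN vs. NNS).
        System.out.println(row(2, "dogs", "dog", "NN", "NNS", 3, "nsubj"));
    }
}
```

With the bug described in this issue, columns 4 and 5 of such a row would carry the same value.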

However, in the current implementation of the Conll2006Writer.java, both 
columns are identical for all rows. 

The bug is in line 167:
if (!(posAnno instanceof POS)) {
    cpos = posAnno.getClass().getSimpleName();
} else {
    cpos = pos;
}

To fix the bug, the ! must be removed.

I will commit the bugfix in a few seconds.

Original issue reported on code.google.com by nils...@googlemail.com on 18 Dec 2014 at 3:38

GoogleCodeExporter commented 9 years ago
Actually, we need to check whether posAnno is a sub-class of POS. If it is 
exactly POS, then we'd prefer to use the fine-grained tag. We don't want the 
coarse-grained column to list "POS" as the coarse-grained type.

Original comment by richard.eckart on 18 Dec 2014 at 3:40

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r3241.

- Bugfix: coarse-grained POS name was ignored

Original comment by nils...@googlemail.com on 18 Dec 2014 at 3:42

GoogleCodeExporter commented 9 years ago
Another comment: I'm pretty sure this affects all other Conll* writers as well. 
They are largely copy-pasted.

Original comment by richard.eckart on 18 Dec 2014 at 3:43

GoogleCodeExporter commented 9 years ago
Okay, but the problem with the current implementation is that pos.NN, 
pos.V, etc. are also instances of POS, so the else branch will always be 
executed.

I will resubmit an updated version which checks that the POS tag is not exactly 
the POS class. 

Original comment by nils...@googlemail.com on 18 Dec 2014 at 3:44

GoogleCodeExporter commented 9 years ago
Sure, you're absolutely correct that there is a bug here :)

Original comment by richard.eckart on 18 Dec 2014 at 3:45

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r3242.

- Bugfix: Exclude parent class 'POS' from output

Original comment by nils...@googlemail.com on 18 Dec 2014 at 3:49

GoogleCodeExporter commented 9 years ago
I submitted another bugfix (r3242). This prevents 'POS' from being written if the 
POS annotation is NOT a sub-class of POS. In that case, the fine-grained POS 
tag is written to the output instead.
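
The behavior described here can be sketched roughly as follows. This is not the code from r3242. The POS, NN, and V classes below are simplified, hypothetical stand-ins for the DKPro Core types, and coarseTag is an illustrative helper name. The point is that instanceof is true for every sub-class, so the check has to compare against the exact class.

```java
public class CoarsePosSketch {
    // Hypothetical stand-in for the generic POS annotation type.
    static class POS {
        private final String posValue;
        POS(String posValue) { this.posValue = posValue; }
        String getPosValue() { return posValue; }
    }

    // Hypothetical stand-ins for coarse-grained sub-types.
    static class NN extends POS { NN(String v) { super(v); } }
    static class V extends POS { V(String v) { super(v); } }

    // Chooses the value for the coarse-grained column.
    static String coarseTag(POS posAnno) {
        if (posAnno.getClass() == POS.class) {
            // Plain POS carries no coarse type; fall back to the fine-grained tag.
            return posAnno.getPosValue();
        }
        // Sub-class: its simple name serves as the coarse-grained tag.
        return posAnno.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(coarseTag(new NN("NNS")));  // sub-class of POS
        System.out.println(coarseTag(new V("VBZ")));   // sub-class of POS
        System.out.println(coarseTag(new POS("XYZ"))); // exactly POS
    }
}
```

Under these assumptions, an NN annotation with fine-grained tag NNS yields NN in the coarse-grained column, while a plain POS annotation repeats its fine-grained tag rather than printing the literal string "POS".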

I also checked the other Conll* writers, but none of them use coarse-grained 
POS (as far as I could see).

Original comment by nils...@googlemail.com on 18 Dec 2014 at 3:53

GoogleCodeExporter commented 9 years ago
Let's keep the issue open until this is fixed for the other CONLL format 
writers as well. I also updated the title.

I've set an empty issue owner for now until somebody steps up to fix them.

Original comment by richard.eckart on 18 Dec 2014 at 3:56

GoogleCodeExporter commented 9 years ago
Only applies to Conll 2006 format.

Original comment by richard.eckart on 7 Jan 2015 at 8:00