codeaudit / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

Unknown Tag messages when reading Brown corpus #154

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I am using the "teaching" module of DKPro Core for working with the Brown 
corpus. The corpus is the XML-encoded version 1.0 of the Brown corpus as used 
in the ENLP course.

When I read the tagged tokens via Corpus.getTaggedTokens(), I get a long list of 
"Unknown tag" messages such as the following. Most of the tags, apart from the 
ones named in the messages, seem to be read correctly.

Unkwown tag NPG
Unkwown tag NPG
Unkwown tag NEG
Unkwown tag NEG
Unkwown tag NEG
Unkwown tag NNG
[...]
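
To see which tags are affected and how often, the console output can be tallied. The following is only a rough sketch that assumes the output has been captured as a list of lines; the class UnknownTagTally and its tally method are hypothetical helpers and not part of DKPro. The prefix is copied verbatim from the messages above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class UnknownTagTally {
    // Prefix copied verbatim from the console messages, including its spelling.
    private static final String PREFIX = "Unkwown tag ";

    // Counts how often each tag appears in lines that start with the prefix.
    // Lines that do not start with the prefix are ignored.
    static Map<String, Integer> tally(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            if (line.startsWith(PREFIX)) {
                String tag = line.substring(PREFIX.length()).trim();
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The six messages quoted in this report.
        List<String> log = Arrays.asList(
            "Unkwown tag NPG", "Unkwown tag NPG",
            "Unkwown tag NEG", "Unkwown tag NEG", "Unkwown tag NEG",
            "Unkwown tag NNG");
        System.out.println(tally(log)); // prints {NEG=3, NNG=1, NPG=2}
    }
}
```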

In my configuration I use the following dependency:
<dependency>
    <groupId>de.tudarmstadt.ukp.dkpro.teaching</groupId>
    <artifactId>de.tudarmstadt.ukp.dkpro.teaching.corpus</artifactId>
    <version>0.4.0</version>
</dependency>

Here is a minimal working example:

package de.tudarmstadt.ukp.teaching.enlp.tutorial.tut5.snippets;

import de.tudarmstadt.ukp.dkpro.teaching.corpus.BrownCorpus;
import de.tudarmstadt.ukp.dkpro.teaching.corpus.Corpus;

public class BrownCorpusMinimal
{
    public static void main(String[] args)
        throws Exception
    {
        /*
         * The Brown Corpus needs to reside in [DKPRO_HOME]/dkpro_teaching/corpora/brown_tei
         */
        Corpus corpus = new BrownCorpus("brown_tei");
        corpus.getTaggedTokens();
    }
}

Regards,
Roland

Original issue reported on code.google.com by roland.k...@googlemail.com on 7 Jun 2013 at 7:52

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 7 Jun 2013 at 7:59

GoogleCodeExporter commented 9 years ago
Yes, this is a known problem.
Some of the tags in the Brown corpus are unknown, i.e. they are not listed in 
the tagset documentation, e.g. here:
http://www.comp.leeds.ac.uk/ccalas/tagsets/brown.html

These tags are mapped to "O" (i.e. "OTHER"), and the warning is printed.

Should you find documentation on these tags somewhere, feel free to reopen the 
bug and augment the mapping.
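
As a rough illustration of the fallback described above and of what augmenting the mapping could look like, here is a minimal sketch. It is not the actual DKPro code: the class TagMappingSketch, its methods, the table entries, and the category names "NOUN" and "VERB" are all hypothetical. Only the fallback to "O" and the warning come from this thread. Unlike the behaviour reported above, the sketch warns only once per distinct tag to keep the output short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TagMappingSketch {
    // Hypothetical excerpt of a tag table; the real mapping is not shown in this thread.
    private static final Map<String, String> KNOWN = new HashMap<>();
    static {
        KNOWN.put("NN", "NOUN");
        KNOWN.put("NP", "NOUN");
        KNOWN.put("VB", "VERB");
    }

    // Unknown tags seen so far, so that each one is reported only once.
    private static final Set<String> UNKNOWN_SEEN = new TreeSet<>();

    // Returns the mapped category, or "O" (OTHER) for tags missing from the table.
    static String mapTag(String tag) {
        String mapped = KNOWN.get(tag);
        if (mapped != null) {
            return mapped;
        }
        if (UNKNOWN_SEEN.add(tag)) {
            System.err.println("Unknown tag " + tag);
        }
        return "O";
    }

    // Augments the table once documentation for a tag has been found.
    static void addMapping(String tag, String category) {
        KNOWN.put(tag, category);
    }

    // Returns a sorted copy of the unknown tags encountered so far.
    static Set<String> unknownTags() {
        return new TreeSet<>(UNKNOWN_SEEN);
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"NN", "NPG", "NPG", "NEG", "VB"}) {
            System.out.println(tag + " -> " + mapTag(tag));
        }
        System.out.println("Unknown tags seen: " + unknownTags());
    }
}
```

Collecting the unknown tags in one place makes it easier to check them against any documentation that turns up later. The category passed to addMapping for a tag such as NPG would have to come from that documentation; the sketch does not guess it.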

Original comment by torsten....@gmail.com on 9 Jun 2013 at 10:25