dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

Add possibility to ignore named entity annotations in BinaryCasReader #1149

Closed lucianojs closed 7 years ago

lucianojs commented 7 years ago

I need to evaluate the F-measure of my NER model, which was generated from annotations in BinaryCAS format, but BinaryCasReader gives me no way to leave out the NER tag annotations so that I can compare them against the model's output. I am looking for something like the readNamedEntity parameter of Conll2002Reader.

I tried converting BinaryCAS to CoNLL 2002, but realized after the conversion that this format does not support multiple annotations on the same token.

reckart commented 7 years ago

The loading process for binary CASes does not allow you to control what is loaded.

But you can easily remove the annotations yourself. Just create a new component:

import static org.apache.uima.fit.util.JCasUtil.select;

import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.jcas.JCas;

import de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity;

public class NamedEntityRemover
    extends JCasAnnotator_ImplBase
{
    @Override
    public void process(JCas aJCas) throws AnalysisEngineProcessException
    {
        select(aJCas, NamedEntity.class).forEach(aJCas::removeFsFromIndexes);
    }
}

The above code should work with UIMA 2.10.1, uimaFIT 2.3.0 and Java 8.

Then just add the new component to your pipeline.
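
As a rough sketch of what adding the component could look like with uimaFIT: the source location "path/to/bincas" and the pattern "*.bin" are placeholders, the class name RemoveNamedEntitiesPipeline is hypothetical, and NamedEntityRemover is assumed to be on the classpath in the same package. It needs uimaFIT and the DKPro Core bincas module as dependencies.

```java
import static org.apache.uima.fit.factory.AnalysisEngineFactory.createEngineDescription;
import static org.apache.uima.fit.factory.CollectionReaderFactory.createReaderDescription;

import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.collection.CollectionReaderDescription;
import org.apache.uima.fit.pipeline.SimplePipeline;

import de.tudarmstadt.ukp.dkpro.core.io.bincas.BinaryCasReader;

public class RemoveNamedEntitiesPipeline
{
    public static void main(String[] args) throws Exception
    {
        // Placeholder location and pattern: adjust to where your binary CASes live.
        CollectionReaderDescription reader = createReaderDescription(
                BinaryCasReader.class,
                BinaryCasReader.PARAM_SOURCE_LOCATION, "path/to/bincas",
                BinaryCasReader.PARAM_PATTERNS, "*.bin");

        // The component from the comment above; it strips all NamedEntity annotations.
        AnalysisEngineDescription remover = createEngineDescription(NamedEntityRemover.class);

        // Further components (e.g. your NER tagger) would be appended after the remover.
        SimplePipeline.runPipeline(reader, remover);
    }
}
```

Components run in the order they are passed to runPipeline, so the remover has to come before any component that writes new NamedEntity annotations.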

reckart commented 7 years ago

If you use the latest version of DKPro Core from this GitHub repo, you might also find our basic evaluation code useful. For example, check out OpenNlpNamedEntityRecognizerTrainerTest.java:

https://github.com/dkpro/dkpro-core/blob/7ecebc718c4f8238401ee6b54229fa842618a0b3/dkpro-core-opennlp-asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/opennlp/OpenNlpNamedEntityRecognizerTrainerTest.java#L56-L116
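
For orientation, the F-measure asked about in this issue can be sketched independently of UIMA. The following is not the DKPro Core evaluation code linked above, just a minimal illustration that assumes exact-match scoring (an entity counts as correct only if begin offset, end offset and type all agree). The class SpanFMeasure, its methods and the sample offsets are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SpanFMeasure
{
    /** Builds a comparison key; two entities match only if begin, end and type agree. */
    static String key(int begin, int end, String type)
    {
        return begin + ":" + end + ":" + type;
    }

    /** Returns {precision, recall, f1} for an exact-match comparison of two entity sets. */
    static double[] score(Set<String> gold, Set<String> predicted)
    {
        // True positives are the entities present in both sets.
        Set<String> truePositives = new HashSet<>(gold);
        truePositives.retainAll(predicted);
        int tp = truePositives.size();

        // Guard against division by zero when a set is empty.
        double precision = predicted.isEmpty() ? 0.0 : (double) tp / predicted.size();
        double recall = gold.isEmpty() ? 0.0 : (double) tp / gold.size();
        double f1 = (precision + recall) == 0.0 ? 0.0
                : 2 * precision * recall / (precision + recall);
        return new double[] { precision, recall, f1 };
    }

    public static void main(String[] args)
    {
        // Made-up sample data: three gold entities, two predictions, one exact match.
        Set<String> gold = new HashSet<>(Arrays.asList(
                key(0, 5, "PER"), key(10, 16, "LOC"), key(20, 27, "ORG")));
        Set<String> predicted = new HashSet<>(Arrays.asList(
                key(0, 5, "PER"), key(10, 16, "ORG")));

        double[] s = score(gold, predicted);
        System.out.printf("P=%.3f R=%.3f F1=%.3f%n", s[0], s[1], s[2]);
    }
}
```

In a real pipeline the two sets would be filled from the NamedEntity annotations of the gold CAS and of the CAS tagged by the model, which is why the gold annotations have to be kept apart from the predicted ones.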

lucianojs commented 7 years ago

It worked perfectly, thank you.