DARIAH-DE / DARIAH-DKPro-Wrapper

Wrapper for DKPro Core to extract linguistic information from books.
http://dariah-de.github.io/DARIAH-DKPro-Wrapper
Apache License 2.0

German output has no chunker information #30

Open fotisj opened 7 years ago

fotisj commented 7 years ago

At the moment the German pipeline does not support chunking, probably because OpenNLP has no chunking model for German. However, TreeTagger supports chunking for German and the DKPro-Wrapper supports TreeTagger, so it should be possible to integrate chunking. It would be great to have this included in the next update :-)

thvitt commented 7 years ago

Although the TreeTagger chunker is already part of the wrapper, we get this exception

2017-06-06T12:17:37,729 ERROR   org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl       Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:401) [ddw-0.4.7-SNAPSHOT.jar:?]
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:308) [ddw-0.4.7-SNAPSHOT.jar:?]
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:570) [ddw-0.4.7-SNAPSHOT.jar:?]
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:412) [ddw-0.4.7-SNAPSHOT.jar:?]
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344) [ddw-0.4.7-SNAPSHOT.jar:?]
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265) [ddw-0.4.7-SNAPSHOT.jar:?]
        at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:269) [ddw-0.4.7-SNAPSHOT.jar:?]
        at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:150) [ddw-0.4.7-SNAPSHOT.jar:?]
        at de.tudarmstadt.ukp.dariah.pipeline.RunPipeline.main(RunPipeline.java:645) [ddw-0.4.7-SNAPSHOT.jar:?]
Caused by: java.lang.NullPointerException
        at org.annolab.tt4j.TreeTaggerWrapper.removeProblematicTokens(TreeTaggerWrapper.java:707) ~[ddw-0.4.7-SNAPSHOT.jar:?]
        at org.annolab.tt4j.TreeTaggerWrapper.process(TreeTaggerWrapper.java:579) ~[ddw-0.4.7-SNAPSHOT.jar:?]
        at de.tudarmstadt.ukp.dkpro.core.treetagger.TreeTaggerChunker.process(TreeTaggerChunker.java:293) ~[ddw-0.4.7-SNAPSHOT.jar:?]
        at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48) ~[ddw-0.4.7-SNAPSHOT.jar:?]
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:385) ~[ddw-0.4.7-SNAPSHOT.jar:?]
        ... 8 more

for the following config:

useChunker = true
chunker = de.tudarmstadt.ukp.dkpro.core.treetagger.TreeTaggerChunker
chunkerArguments = executablePath,string,/opt/tree-tagger/bin/tree-tagger,\
        modelLocation,string,/opt/tree-tagger/lib/german-chunker.par,\
        modelEncoding,string,utf-8
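
The trace above does not say what was null, so it may be worth ruling out the cheap things first, for example whether the two paths in this config actually exist on the machine. Below is a minimal sketch of such a check in standalone Python. It is not part of the wrapper and does not claim to reproduce how the wrapper reads its config. It assumes that chunkerArguments is a flat list of name,type,value triples, as the config above suggests, and that values contain no commas. The helper names parse_chunker_arguments and missing_paths are made up for illustration.

```python
import os


def parse_chunker_arguments(raw):
    """Split a 'name,type,value,...' string into a {name: value} dict.

    Assumption: arguments come as name,type,value triples and values
    contain no commas. Backslash line continuations are joined first.
    """
    joined = "".join(line.strip().rstrip("\\") for line in raw.splitlines())
    fields = [field.strip() for field in joined.split(",") if field.strip()]
    if len(fields) % 3 != 0:
        raise ValueError(
            "expected name,type,value triples, got %d fields" % len(fields)
        )
    return {fields[i]: fields[i + 2] for i in range(0, len(fields), 3)}


def missing_paths(args):
    """Return the names of path-valued arguments whose file does not exist."""
    return [
        name
        for name in ("executablePath", "modelLocation")
        if name in args and not os.path.exists(args[name])
    ]


# The value of chunkerArguments from the config above.
raw = """executablePath,string,/opt/tree-tagger/bin/tree-tagger,\\
        modelLocation,string,/opt/tree-tagger/lib/german-chunker.par,\\
        modelEncoding,string,utf-8"""

args = parse_chunker_arguments(raw)
print("missing:", missing_paths(args))
```

An empty list only means that both files are present. It says nothing about whether the NullPointerException in removeProblematicTokens is related to the config at all.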