Stanford DCoref callback to parser does not work when run in DKPro Core

GoogleCodeExporter commented 9 years ago

In certain conditions, the Stanford DCoref module calls out to the
StanfordCoreNLP framework in order to invoke a parser to re-parse a mention.
Since we run DCoref standalone in DKPro Core, the StanfordCoreNLP framework is
never initialized. This causes an NPE. As a result, DCoref only works in a
few scenarios and throws an NPE in many others, basically rendering the
component useless at the moment.
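To make the failure mode concrete, here is a minimal Java sketch. It assumes a framework that keeps a static annotator pool, which is only populated when the full pipeline is constructed. `AnnotatorPool`, `CorefSketch` and their methods are hypothetical stand-ins, not the CoreNLP or DCoref API. Only the error message is copied from the log quoted later in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a framework-wide annotator pool. It is only
// populated when the full pipeline is constructed; a component that is run
// standalone never triggers that initialization.
class AnnotatorPool {
    private static Map<String, Object> pool; // stays null until init() is called

    static void init(Map<String, Object> annotators) {
        pool = new HashMap<>(annotators);
    }

    // Reports the problem, then hands back null instead of failing fast.
    static Object getExisting(String name) {
        if (pool == null) {
            System.err.println("ERROR: attempted to fetch annotator \"" + name
                    + "\" before the annotator pool was created!");
            return null;
        }
        return pool.get(name);
    }
}

// Hypothetical component that occasionally calls back into the pool.
class CorefSketch {
    static String reparse(String mention) {
        Object parser = AnnotatorPool.getExisting("parse");
        // No null check: if the pool was never created, this line throws
        // a NullPointerException.
        return parser.toString() + " re-parsed: " + mention;
    }

    public static void main(String[] args) {
        try {
            reparse("the Don");
        } catch (NullPointerException e) {
            System.out.println("NPE, as in the standalone DKPro Core setup");
        }
    }
}
```

Running `CorefSketch` ends in the caught NPE because `AnnotatorPool.init` is never called. This mirrors the standalone setup described above.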

Original issue reported on code.google.com by richard.eckart on 22 Sep 2013 at 10:40

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 29 Sep 2013 at 2:52

Added labels: Milestone-1.5.1, DKPro-GPL, Module-stanfordnlp

GoogleCodeExporter commented 9 years ago

On 03.10.2013, at 01:27, John Bauer <horatio@gmail.com> wrote:

I agree that there could be a much better way of handling this
situation.  However, it also seems that if you are supplying all the
necessary information, this case should not come up.  Are the labels
in your trees being created with BeginIndexAnnotation and
EndIndexAnnotation?

John

Original comment by richard.eckart on 3 Oct 2013 at 10:06
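John's question concerns whether each tree node carries the token span it covers. The following sketch shows how begin/end indices can be assigned to every constituent in a single left-to-right pass. It assumes half-open token spans [begin, end) and uses a hypothetical `SpanTree` class, which is not CoreNLP's `Tree`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal constituency tree (not CoreNLP's Tree class) whose
// nodes record the half-open token span [begin, end) they cover.
class SpanTree {
    final String label;
    final List<SpanTree> children;
    int begin = -1; // index of the first covered token
    int end = -1;   // index one past the last covered token

    SpanTree(String label, SpanTree... children) {
        this.label = label;
        this.children = new ArrayList<>(Arrays.asList(children));
    }

    boolean isLeaf() {
        return children.isEmpty();
    }

    // Assigns spans left to right, starting at token index `start`, and
    // returns the index of the first token after this subtree.
    int indexSpans(int start) {
        int next = start;
        if (isLeaf()) {
            next = start + 1; // a leaf is exactly one token
        } else {
            for (SpanTree child : children) {
                next = child.indexSpans(next);
            }
        }
        begin = start;
        end = next;
        return next;
    }

    public static void main(String[] args) {
        // (NP (DT the) (NNP Don) (POS '))
        SpanTree np = new SpanTree("NP",
                new SpanTree("DT", new SpanTree("the")),
                new SpanTree("NNP", new SpanTree("Don")),
                new SpanTree("POS", new SpanTree("'")));
        np.indexSpans(0);
        for (SpanTree child : np.children) {
            System.out.println(child.label + " [" + child.begin + ", " + child.end + ")");
        }
        System.out.println(np.label + " [" + np.begin + ", " + np.end + ")");
    }
}
```

For this fragment, the NP covers [0, 3) and the POS node covers [2, 3).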

GoogleCodeExporter commented 9 years ago

Thanks for the feedback. I did some debugging and it turned out that
DCoRef chokes because of the tokenizer I was using, although there
appear to be other ways to trigger the problem.

To give some context: I was trying to create a pipeline that uses a component
from a different vendor for every step of the analysis. While this may
not be sensible because of different annotation guidelines, etc., it should
demonstrate that technical interoperability works. So in my first try,
I was using the tokenizer from LanguageTool, the Berkeley Parser and the
Stanford DCoRef.

So eventually, I got a fragment like this:

(NP
  (NP (NP (DT the) (NNP Don) (POS ')) (NN t) (NN Ask))
  (, ,)
  (NP
    (NP (NP (NNP Don) (POS ')) (JJ t) (NNP Tell) (NNP Repeal) (NNP Act))
    (PP (IN of) (NP (CD 2010)))))

or more to the point, this one:

(NP (DT the) (NNP Don) (POS ')) -- where "Don '" is actually part of "Don't",
which should have been tokenized as "Do" "n't".
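The difference between the two tokenizations can be illustrated with two deliberately simplified Java functions. Both are hypothetical illustrations, not the LanguageTool or Stanford PTB tokenizer. One splits at the apostrophe, producing the "Don" "'" "t" tokens seen in the fragment above. The other applies only the PTB convention of splitting off the "n't" clitic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, heavily simplified splitting rules for a single word.
class ContractionSplit {

    // Naive approach: the apostrophe becomes a token of its own,
    // so "Don't" yields "Don" "'" "t".
    static List<String> splitAtApostrophe(String word) {
        List<String> tokens = new ArrayList<>();
        int i = word.indexOf('\'');
        if (i < 0) {
            tokens.add(word);
            return tokens;
        }
        if (i > 0) {
            tokens.add(word.substring(0, i));
        }
        tokens.add("'");
        if (i + 1 < word.length()) {
            tokens.add(word.substring(i + 1));
        }
        return tokens;
    }

    // PTB convention for the negation clitic only: "Don't" yields "Do" "n't".
    static List<String> splitNegationPtbStyle(String word) {
        List<String> tokens = new ArrayList<>();
        if (word.length() > 3 && word.toLowerCase(Locale.ROOT).endsWith("n't")) {
            int cut = word.length() - 3;
            tokens.add(word.substring(0, cut));
            tokens.add(word.substring(cut));
        } else {
            tokens.add(word);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitAtApostrophe("Don't"));     // [Don, ', t]
        System.out.println(splitNegationPtbStyle("Don't")); // [Do, n't]
    }
}
```

With the first rule, a parser sees a proper noun "Don" followed by a possessive-looking "'". This explains fragments like (NNP Don) (POS ').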

DCoRef contains special logic for cases where the last token of a mention is
a '. In that case, the end index is decremented by one. As a result, DCoRef
apparently searches for a matching NP but cannot find one, because the NP ends
in the '. At that point, the code seemingly decides that it would be better to
try re-parsing "the Don".
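The lookup described above can be sketched as follows. This is a hypothetical reconstruction based only on the description in this comment, not the actual DCoref code. Mention and constituent spans are assumed to be half-open token ranges, and a `null` result marks the point where DCoref would fall back to re-parsing.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical reconstruction of the lookup described in the issue; this is
// not the DCoref source. Spans are half-open token ranges [begin, end).
class MentionLookup {

    static final class Span {
        final String label;
        final int begin;
        final int end;

        Span(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }
    }

    // Returns the NP whose span exactly matches the mention, or null when no
    // such NP exists (the point where a re-parse would be attempted).
    static Span findNp(List<Span> constituents, List<String> tokens, int begin, int end) {
        // Special case described in the issue: a trailing ' is dropped
        // by decrementing the end index.
        if (end > begin && tokens.get(end - 1).equals("'")) {
            end--;
        }
        for (Span s : constituents) {
            if (s.label.equals("NP") && s.begin == begin && s.end == end) {
                return s;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "Don", "'");
        List<Span> constituents = Arrays.asList(
                new Span("NP", 0, 3), new Span("DT", 0, 1),
                new Span("NNP", 1, 2), new Span("POS", 2, 3));
        Span match = findNp(constituents, tokens, 0, 3);
        System.out.println(match == null ? "no NP match, would re-parse" : "found " + match.label);
    }
}
```

For the tokens "the Don '" with an NP covering [0, 3), the decrement shrinks the mention to [0, 2). No NP has exactly that span, so `findNp` returns `null`.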

This case was easily fixed by replacing the tokenizer with the Stanford PTB
Tokenizer. However, I believe it would still be better to handle such
situations robustly even when the CoreNLP framework has not been configured,
e.g. by simply skipping that part instead of failing with a
NullPointerException.
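Here is a minimal sketch of the more robust behaviour suggested here. It assumes the re-parser is an optional dependency that may be absent when CoreNLP has not been configured. `RobustHeadFinder` and its fallback heuristic (take the last non-apostrophe token as the head) are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: the re-parser is optional, and its absence leads to
// a fallback instead of a NullPointerException.
class RobustHeadFinder {

    // Maps the tokens of a mention to the index of its head token;
    // null when no parser is available (e.g. CoreNLP was never initialized).
    private final Function<List<String>, Integer> reparser;

    RobustHeadFinder(Function<List<String>, Integer> reparser) {
        this.reparser = reparser;
    }

    int findHead(List<String> mentionTokens) {
        if (reparser != null) {
            return reparser.apply(mentionTokens);
        }
        // Fallback heuristic chosen for this sketch: the last token that
        // is not an apostrophe.
        for (int i = mentionTokens.size() - 1; i >= 0; i--) {
            if (!mentionTokens.get(i).equals("'")) {
                return i;
            }
        }
        // Only apostrophes: use the last index (-1 for an empty mention).
        return mentionTokens.size() - 1;
    }

    public static void main(String[] args) {
        RobustHeadFinder withoutParser = new RobustHeadFinder(null);
        System.out.println(withoutParser.findHead(Arrays.asList("the", "Don", "'"))); // 1
    }
}
```

With `new RobustHeadFinder(null)`, `findHead` on "the Don '" returns 1 (the index of "Don") instead of throwing. The result may be less accurate than a re-parse, but the pipeline keeps running.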

This sentence triggers the re-parsing for me, even when using the Stanford PTB 
tokenizer and parser (Don is a river in Russia):

 'Let's go! I want to see the Don', he said.  --  (NX (NNP Don) (POS '))

while this one appears to be fine:

 'Let's go! I want to see the river', he said. -- (NP (DT the) (NN river) ('' '))

(In this case, the two parse fragments come from the parser's online demo.)

-- Richard

Original comment by richard.eckart on 3 Oct 2013 at 10:06

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 4 Oct 2013 at 1:28

Changed state: Started

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 19 Dec 2013 at 1:55

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 26 Mar 2014 at 10:51

Added labels: Milestone-1.6.0
Removed labels: Milestone-1.5.1

GoogleCodeExporter commented 9 years ago

Hi Richard,

I still get this error, when using the StanfordSegmenter, StanfordParser and 
StanfordNamedEntityRecognizer (all with default arguments) previously in the 
pipeline. Am I missing anything?

Could you possibly post a configuration of a full pipeline that makes dcoref
work?

Thanks a lot!

Best,
Anne

Original comment by annemari...@googlemail.com on 19 Sep 2014 at 1:28

GoogleCodeExporter commented 9 years ago

What version of DKPro Core are you using? (Note that all components you use
should have the same version.)

Original comment by richard.eckart on 19 Sep 2014 at 2:38

GoogleCodeExporter commented 9 years ago

I am using 1.5.0.
I just noticed that the error does not happen on all documents, for some it 
works fine.

This is the error I get. Parsing on its own seems to succeed, but dcoref
apparently can't find some syntactic information here. Any ideas how to debug
this further? Thanks!

ERROR: attempted to fetch annotator "parse" before the annotator pool was created!
Oct 22, 2014 11:37:55 AM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(410)
SEVERE: Exception occurred
...
Caused by: java.lang.NullPointerException
    at edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder.parse(RuleBasedCorefMentionFinder.java:338)
    at edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder.findSyntacticHead(RuleBasedCorefMentionFinder.java:273)
    at edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder.findHead(RuleBasedCorefMentionFinder.java:215)
    at edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder.extractPredictedMentions(RuleBasedCorefMentionFinder.java:88)
    at de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordCoreferenceResolver.process(StanfordCoreferenceResolver.java:257)
    at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:378)

Original comment by annemari...@googlemail.com on 22 Oct 2014 at 9:52

GoogleCodeExporter commented 9 years ago

This issue has been fixed in version 1.6.0 of DKPro Core. I suggest you upgrade 
to the latest version which is 1.6.2.

Original comment by richard.eckart on 22 Oct 2014 at 9:55

GoogleCodeExporter commented 9 years ago

Okay, I will switch to the Maven versions -- I was using the provided JARs so
far. Thanks.

Original comment by annemari...@googlemail.com on 22 Oct 2014 at 2:46

google-code-export / dkpro-core-asl

Stanford DCoref callback to parser does not work when run in DKPro Core #247