Closed GoogleCodeExporter closed 9 years ago
Original comment by richard.eckart
on 29 Sep 2013 at 2:52
On 03.10.2013, at 01:27, John Bauer <horatio@gmail.com> wrote:
I agree that there could be a much better way of handling this
situation. However, it also seems that if you are supplying all the
necessary information, this case should not come up. Are the labels
in your trees being created with BeginIndexAnnotation and
EndIndexAnnotation?
John
Original comment by richard.eckart
on 3 Oct 2013 at 10:06
Thanks for the feedback. I did some debugging and it turned out that
DCoRef chokes because of the tokenizer I was using - although there
appear to be other ways to force the problem.
To give some context: I was trying to create a pipeline with a component
from a different vendor for every step of the analysis. While this may
not be sensible because of different annotation guidelines, etc., it should
demonstrate that the technical interoperability works. So in my first try,
I was using the tokenizer from LanguageTool, the Berkeley Parser and the
Stanford DCoRef.
So eventually, I got a fragment like this:
(NP (NP (NP (DT the) (NNP Don) (POS ')) (NN t) (NN Ask)) (, ,) (NP (NP (NP (NNP
Don) (POS ')) (JJ t) (NNP Tell) (NNP Repeal) (NNP Act)) (PP (IN of) (NP (CD
2010)))))
or more to the point, this one:
(NP (DT the) (NNP Don) (POS ')) -- where "Don '" is actually a part of "Don't"
which should have been tokenized as "Do" "n't".
DCoRef has special logic for handling cases where the last token is a
'. If this is the case, the end-index is decremented by one. As a result, a
match is apparently searched for the NP, but it cannot be found because the
NP ends in the '. At that point, the code seemingly decides that it would be
better to try re-parsing "the Don".
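To illustrate the mechanism described above, here is a minimal Java sketch. It is not the actual DCoRef code: the class ApostropheSpanSketch, its methods, and the "begin:end" span map are hypothetical stand-ins, and the sketch assumes that the lookup requires an exact span match.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ApostropheSpanSketch {

    // Mimics the adjustment described above (an assumption, not DCoRef source):
    // if the last token of the mention is "'", the exclusive end index is
    // decremented by one.
    public static int adjustEnd(List<String> tokens, int begin, int end) {
        if (end - begin > 1 && "'".equals(tokens.get(end - 1))) {
            return end - 1;
        }
        return end;
    }

    // Looks up a constituent covering exactly [begin, end); null if there is none.
    public static String findConstituent(Map<String, String> spans, int begin, int end) {
        return spans.get(begin + ":" + end);
    }

    public static void main(String[] args) {
        // Tokens and constituent spans of (NP (DT the) (NNP Don) (POS '))
        List<String> tokens = Arrays.asList("the", "Don", "'");
        Map<String, String> spans = new HashMap<>();
        spans.put("0:1", "DT");
        spans.put("1:2", "NNP");
        spans.put("2:3", "POS");
        spans.put("0:3", "NP");

        int end = adjustEnd(tokens, 0, 3);              // 3 becomes 2
        String label = findConstituent(spans, 0, end);  // no constituent spans 0:2
        System.out.println("adjusted end = " + end + ", constituent = " + label);
    }
}
```

In this sketch the only constituent starting at token 0 that covers more than one token is the NP at 0:3, so the lookup for the adjusted span 0:2 comes back empty. That would be the point at which the re-parsing of "the Don" is attempted.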
This case was easily fixed by replacing the tokenizer with the Stanford PTB
Tokenizer. However, I believe it would still be better to be robust about such
situations even in cases where the CoreNLP framework has not been configured,
e.g. by simply ignoring that part instead of failing with a
NullPointerException.
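The kind of robustness I have in mind could look roughly like the following sketch. It only illustrates the idea and is not a patch for DCoRef: SafeReparseSketch and its reparse method are hypothetical, and the Function argument stands in for whatever parser the framework would normally supply.

```java
import java.util.Optional;
import java.util.function.Function;

public class SafeReparseSketch {

    // Returns the re-parsed tree for a mention, or an empty Optional when no
    // parser is available. The reparser argument is a hypothetical stand-in
    // and may be null if nothing has been configured.
    public static Optional<String> reparse(String mentionText,
                                           Function<String, String> reparser) {
        if (reparser == null) {
            // Nothing configured: skip this mention instead of dereferencing null.
            return Optional.empty();
        }
        return Optional.ofNullable(reparser.apply(mentionText));
    }

    public static void main(String[] args) {
        Optional<String> tree = reparse("the Don", null);
        System.out.println(tree.isPresent() ? tree.get() : "skipped: no parser configured");
    }
}
```

The caller can then decide what to do with an empty result, for example drop the mention or fall back to a simpler head-finding heuristic, rather than have the whole document fail.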
This sentence triggers the re-parsing for me, even when using the Stanford PTB
tokenizer and parser (Don is a river in Russia):
'Let's go! I want to see the Don', he said. -- (NX (NNP Don) (POS '))
while this one appears to be fine:
'Let's go! I want to see the river', he said. -- (NP (DT the) (NN river) ('' '))
(The two parse fragments come in this case from the online demo of the parser)
-- Richard
Original comment by richard.eckart
on 3 Oct 2013 at 10:06
Original comment by richard.eckart
on 4 Oct 2013 at 1:28
Original comment by richard.eckart
on 19 Dec 2013 at 1:55
Original comment by richard.eckart
on 26 Mar 2014 at 10:51
Hi Richard,
I still get this error when using the StanfordSegmenter, StanfordParser and
StanfordNamedEntityRecognizer (all with default arguments) earlier in the
pipeline. Am I missing anything?
Could you possibly post a configuration of a full pipeline that makes dcoref
work?
Thanks a lot!
Best,
Anne
Original comment by annemari...@googlemail.com
on 19 Sep 2014 at 1:28
What version of DKPro Core are you using? (Note that all components you use
should have the same version.)
Original comment by richard.eckart
on 19 Sep 2014 at 2:38
I am using 1.5.0.
I just noticed that the error does not happen on all documents, for some it
works fine.
This is the error I get. Only parsing seems successful, but apparently dcoref
can't find some syntactic information here. Any ideas how to further debug
this? Thanks!
ERROR: attempted to fetch annotator "parse" before the annotator pool was
created!
Oct 22, 2014 11:37:55 AM
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl
callAnalysisComponentProcess(410)
SEVERE: Exception occurred
...
Caused by: java.lang.NullPointerException
at edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder.parse(RuleBasedCorefMentionFinder.java:338)
at edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder.findSyntacticHead(RuleBasedCorefMentionFinder.java:273)
at edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder.findHead(RuleBasedCorefMentionFinder.java:215)
at edu.stanford.nlp.dcoref.RuleBasedCorefMentionFinder.extractPredictedMentions(RuleBasedCorefMentionFinder.java:88)
at de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordCoreferenceResolver.process(StanfordCoreferenceResolver.java:257)
at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:378)
Original comment by annemari...@googlemail.com
on 22 Oct 2014 at 9:52
This issue has been fixed in version 1.6.0 of DKPro Core. I suggest you upgrade
to the latest version, which is 1.6.2.
Original comment by richard.eckart
on 22 Oct 2014 at 9:55
Okay, I will switch to the Maven versions -- I was using the provided jars so
far. Thanks.
Original comment by annemari...@googlemail.com
on 22 Oct 2014 at 2:46
Original issue reported on code.google.com by
richard.eckart
on 22 Sep 2013 at 10:40