google-code-export / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

Question on StanfordCoreferenceResolver / grammatical dependencies #582

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

I just updated to dkpro 1.7.0, and I am running the following pipeline:
StanfordSegmenter
StanfordParser (with DependenciesMode.CC_PROPAGATED)
StanfordLemmatizer
StanfordNamedEntityRecognizer
StanfordCoreferenceResolver

I am now getting the following warnings from dcoref's Document.java:

Jan 22, 2015 10:50:56 AM edu.stanford.nlp.dcoref.Document findSpeaker
WARNING: Cannot find node in dependency for word said

As far as I understand the code, these dependencies should be created here in StanfordCoreferenceResolver.java:

// We currently do not copy over dependencies from the CAS. This is supposed to fill
// in the dependencies so we do not get NPEs.
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory(
        tlp.punctuationWordRejectFilter(), tlp.typedDependencyHeadFinder());
ParserAnnotatorUtils.fillInParseAnnotations(false, true, gsf, sentence, treeCopy);

Is there any way you or I could verify whether this happens?

Thanks a lot!

Original issue reported on code.google.com by annemari...@googlemail.com on 22 Jan 2015 at 9:53

GoogleCodeExporter commented 9 years ago
Can you provide the sentence where this problem occurs?

Original comment by richard.eckart on 22 Jan 2015 at 10:01

GoogleCodeExporter commented 9 years ago
It occurs many times in many of my documents (ACE 2005), whenever you have a 
verb indicating speech after a quote. Here are two examples:

``Mr. Campbell is sufficiently embarrassed and ashamed for what he
did, as well he should be,'' District Judge Reinette Cooper said
Monday.

"We cannot forgive this war," Miyako Fuji, 20, one of the rally's
organisers told Jiji news agency.

Stanford's dcoref tries to extract a feature indicating that the subject of the 
verb is the speaker of the quote. Apparently it cannot find the grammatical 
dependencies, and I am not sure why. I did not get these warnings in 1.6.2.
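
To illustrate the kind of lookup involved, here is a minimal sketch of finding the speaker as the subject of a reporting verb. It is a toy model, not dcoref's implementation: the Dep and SpeakerSketch classes are hypothetical, and the dependency edges for the first example sentence are written by hand rather than produced by the parser.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Toy model of a typed dependency edge; NOT a CoreNLP or dcoref class.
final class Dep {
    final String governor;
    final String relation;
    final String dependent;

    Dep(String governor, String relation, String dependent) {
        this.governor = governor;
        this.relation = relation;
        this.dependent = dependent;
    }
}

final class SpeakerSketch {
    // Returns the nsubj dependent of the given reporting verb, if there is one.
    static Optional<String> subjectOf(String verb, List<Dep> deps) {
        for (Dep d : deps) {
            if (d.governor.equals(verb) && d.relation.equals("nsubj")) {
                return Optional.of(d.dependent);
            }
        }
        return Optional.empty();
    }

    // Hand-written (assumed) edges for:
    // "...,'' District Judge Reinette Cooper said Monday.
    static List<Dep> exampleDeps() {
        return Arrays.asList(
                new Dep("said", "nsubj", "Cooper"),
                new Dep("said", "tmod", "Monday"),
                new Dep("Cooper", "nn", "Reinette"));
    }
}
```

If the verb node cannot be located in the graph at all, a lookup like this has nothing to start from, which matches the situation the warning describes.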

Original comment by annemari...@googlemail.com on 22 Jan 2015 at 10:30

GoogleCodeExporter commented 9 years ago
The StanfordParser should by default generate the dependencies in your pipeline.
I think, though, that DCoref needs collapsed dependencies of the type TREE. 
If I remember right, that's why I made that setting the default. I'm afraid I 
cannot say more without looking into this in more detail. Thanks for the 
sentence!

Original comment by richard.eckart on 22 Jan 2015 at 11:26

GoogleCodeExporter commented 9 years ago
Hi, I am using stanford-corenlp-3.2.1.jar (my POM uses 
de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl-1.7.0). I found the problem: 
Stanford CoreNLP seems to be changing IndexedWord. word() and value() both 
exist, but according to a code comment they should be unified at some point.

Dcoref's Document.java makes use of the function getNodeByWordPattern of 
SemanticGraph, which in turn uses w.word(). This does not seem to be set by

ParserAnnotatorUtils.fillInParseAnnotations(false, true, gsf, sentence, treeCopy);

value() is set, however, so I preliminarily fixed the problem by adding the 
following right after fillInParseAnnotations in StanfordCoreferenceResolver.

SemanticGraph deps = sentence.get(
        SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation.class);
for (IndexedWord vertex : deps.vertexSet()) {
    vertex.setWord(vertex.value());
}

The problem should be fixed in StanfordCoreNLP, however.
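
For anyone who wants to see the failure mode in isolation, the following self-contained sketch models it without any CoreNLP dependency. ToyVertex and WordLookupSketch are hypothetical stand-ins, not the real IndexedWord or SemanticGraph; the only assumption carried over from above is that the lookup matches on word() while only value() has been filled in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a graph vertex; NOT edu.stanford.nlp.ling.IndexedWord.
final class ToyVertex {
    private String word;          // stays null until setWord is called
    private final String value;   // set at construction, like the parser output here

    ToyVertex(String value) { this.value = value; }

    String word() { return word; }
    String value() { return value; }
    void setWord(String w) { this.word = w; }
}

final class WordLookupSketch {
    // Mimics a lookup that matches on word() only.
    static ToyVertex findByWord(List<ToyVertex> vertices, String pattern) {
        for (ToyVertex v : vertices) {
            if (v.word() != null && v.word().matches(pattern)) {
                return v;
            }
        }
        return null; // nothing found: the situation behind the warning
    }

    // Same idea as the workaround: copy value() into word() on every vertex.
    static void copyValueToWord(List<ToyVertex> vertices) {
        for (ToyVertex v : vertices) {
            v.setWord(v.value());
        }
    }

    static List<ToyVertex> exampleVertices() {
        List<ToyVertex> vertices = new ArrayList<>();
        vertices.add(new ToyVertex("Cooper"));
        vertices.add(new ToyVertex("said"));
        return vertices;
    }
}
```

Calling findByWord(vertices, "said") before copyValueToWord returns null in this model; after the copy it returns the vertex whose value is "said".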

Hope this helps anyone running into the same problem.

Original comment by annemari...@googlemail.com on 22 Jan 2015 at 3:46

GoogleCodeExporter commented 9 years ago
Thanks for looking into this! I'll add these lines as a workaround to the 
StanfordCoreferenceResolver until we have a fix from upstream. And thanks for 
reporting this upstream: https://github.com/stanfordnlp/CoreNLP/issues/49

Original comment by richard.eckart on 22 Jan 2015 at 5:22

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 22 Jan 2015 at 5:34

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 22 Jan 2015 at 5:36

GoogleCodeExporter commented 9 years ago

Original comment by richard.eckart on 22 Jan 2015 at 5:47