dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core
Other

Question on StanfordCoreferenceResolver / grammatical dependencies #582

Closed reckart closed 9 years ago

reckart commented 9 years ago
Hi,

I just updated to dkpro 1.7.0, and I am running the following pipeline:
StanfordSegmenter
StanfordParser (with DependenciesMode.CC_PROPAGATED)
StanfordLemmatizer
StanfordNamedEntityRecognizer
StanfordCoreferenceResolver

I am now getting the following warnings from dcoref's Document.java:

Jan 22, 2015 10:50:56 AM edu.stanford.nlp.dcoref.Document findSpeaker
WARNING: Cannot find node in dependency for word said

As far as I understand the code, these dependencies should be created here in StanfordCoreferenceResolver.java:

// We currently do not copy over dependencies from the CAS. This is supposed to fill
// in the dependencies so we do not get NPEs.
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory(
                    tlp.punctuationWordRejectFilter(), tlp.typedDependencyHeadFinder());
ParserAnnotatorUtils.fillInParseAnnotations(false, true, gsf, sentence, treeCopy);

Is there any way you or I could verify whether this happens?

Thanks a lot!

Original issue reported on code.google.com by annemarie.friedrich on 2015-01-22 09:53:20

reckart commented 9 years ago
Can you provide the sentence where this problem occurs?

Original issue reported on code.google.com by richard.eckart on 2015-01-22 10:01:03

reckart commented 9 years ago
It occurs many times in my documents (ACE 2005), whenever a speech verb follows a quote.
Here are two examples:

``Mr. Campbell is sufficiently embarrassed and ashamed for what he
did, as well he should be,'' District Judge Reinette Cooper said
Monday.

"We cannot forgive this war," Miyako Fuji, 20, one of the rally's
organisers told Jiji news agency.

Stanford's dcoref tries to extract a feature indicating that the subject of the verb
is the speaker of the quote. Apparently it can't find the grammatical dependencies,
and I am not sure why. I didn't get these warnings in 1.6.2.
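To make the failure mode concrete, here is a minimal, self-contained Java sketch of the idea described in this comment: locate the speech verb (e.g. "said") in a dependency graph by its word, then take its `nsubj` dependent as the speaker. The classes `SpeakerLookupSketch`, `Node`, and `Edge` are hypothetical stand-ins, not CoreNLP's `SemanticGraph` or `IndexedWord`, and the real dcoref speaker logic is considerably more involved. The point is only that a lookup keyed on the word silently fails when nodes carry no word, producing a warning like the one in the log.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/**
 * Toy model (NOT the CoreNLP API) of reading a quote's speaker off a dependency
 * graph: find the speech verb by its surface word, then follow its nsubj edge.
 */
public class SpeakerLookupSketch {

    /** Hypothetical graph node; word may be null if it was never filled in. */
    public record Node(int index, String word) {}

    /** Hypothetical typed dependency edge, e.g. said -nsubj-> Cooper. */
    public record Edge(Node governor, String relation, Node dependent) {}

    /** Returns the first node whose word fully matches the regex. */
    public static Optional<Node> nodeByWordPattern(List<Node> nodes, String regex) {
        Pattern p = Pattern.compile(regex);
        for (Node n : nodes) {
            // A node without a word can never match, so the lookup fails silently.
            if (n.word() != null && p.matcher(n.word()).matches()) {
                return Optional.of(n);
            }
        }
        return Optional.empty();
    }

    /** Returns the word of the speech verb's nominal subject, if any. */
    public static Optional<String> findSpeaker(List<Node> nodes, List<Edge> edges, String verb) {
        Optional<Node> verbNode = nodeByWordPattern(nodes, Pattern.quote(verb));
        if (verbNode.isEmpty()) {
            // Mimics the warning text quoted in this issue.
            System.err.println("WARNING: Cannot find node in dependency for word " + verb);
            return Optional.empty();
        }
        for (Edge e : edges) {
            if (e.governor().equals(verbNode.get()) && e.relation().equals("nsubj")) {
                return Optional.ofNullable(e.dependent().word());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Simplified "... Reinette Cooper said Monday."; indices are illustrative.
        Node cooper = new Node(1, "Cooper");
        Node said = new Node(2, "said");
        List<Edge> edges = List.of(new Edge(said, "nsubj", cooper));
        System.out.println(findSpeaker(List.of(cooper, said), edges, "said")); // Optional[Cooper]

        // Same structure, but the words were never set: the warning is printed.
        Node cooperNoWord = new Node(1, null);
        Node saidNoWord = new Node(2, null);
        List<Edge> edges2 = List.of(new Edge(saidNoWord, "nsubj", cooperNoWord));
        System.out.println(findSpeaker(List.of(cooperNoWord, saidNoWord), edges2, "said")); // Optional.empty
    }
}
```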

Original issue reported on code.google.com by annemarie.friedrich on 2015-01-22 10:30:23

reckart commented 9 years ago
The StanfordParser should generate the dependencies in your pipeline by default.
I think, though, that dcoref needs collapsed dependencies of the type TREE. If I
remember correctly, that's why I made that setting the default. I'm afraid I cannot say
more without looking into this in more detail. Thanks for the sentences!

Original issue reported on code.google.com by richard.eckart on 2015-01-22 11:26:34

reckart commented 9 years ago
Hi, I am using stanford-corenlp-3.2.1.jar (my POM uses de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl-1.7.0).
I found the problem: Stanford CoreNLP appears to be in the middle of changing IndexedWord. Both word()
and value() exist, but according to a comment they should be unified at some point.

Dcoref's Document.java uses the getNodeByWordPattern method of SemanticGraph,
which in turn uses w.word(). This does not seem to be set by

ParserAnnotatorUtils.fillInParseAnnotations(false, true, gsf, sentence, treeCopy);

value() is set, however, so as a temporary fix I added the following
right after the fillInParseAnnotations call in StanfordCoreferenceResolver:

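SemanticGraph deps = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation.class);
// getNodeByWordPattern() matches on word(), which fillInParseAnnotations does not
// appear to set; copy value() into word() so dcoref can find the nodes.
for (IndexedWord vertex : deps.vertexSet()) {
    vertex.setWord(vertex.value());
}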
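The following plain-Java sketch models why this workaround helps, under the assumption stated in this comment: the graph builder sets only `value()`, while the word-pattern lookup reads only `word()`. `WordValueWorkaroundSketch` and its `Label` class are hypothetical stand-ins for `IndexedWord` and `SemanticGraph`; they do not reproduce CoreNLP's actual classes, only the word/value split.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Toy model (NOT CoreNLP's IndexedWord) of a token label that carries both a
 * "word" and a "value" field, where the builder fills in only the value.
 */
public class WordValueWorkaroundSketch {

    /** Hypothetical mutable label with separate word and value fields. */
    public static final class Label {
        private String word;        // left null, as the builder never sets it
        private final String value; // always set

        public Label(String value) { this.value = value; }
        public String word() { return word; }
        public String value() { return value; }
        public void setWord(String word) { this.word = word; }
    }

    /** Word-pattern lookup that consults word() only, never value(). */
    public static Label nodeByWordPattern(List<Label> vertices, String regex) {
        Pattern p = Pattern.compile(regex);
        for (Label v : vertices) {
            if (v.word() != null && p.matcher(v.word()).matches()) {
                return v;
            }
        }
        return null;
    }

    /** The workaround from this thread: copy value() into word() for every vertex. */
    public static void copyValueToWord(List<Label> vertices) {
        for (Label v : vertices) {
            v.setWord(v.value());
        }
    }

    public static void main(String[] args) {
        List<Label> vertices = List.of(new Label("Cooper"), new Label("said"), new Label("Monday"));
        System.out.println(nodeByWordPattern(vertices, "said") == null); // true: lookup fails
        copyValueToWord(vertices);
        System.out.println(nodeByWordPattern(vertices, "said").value()); // said
    }
}
```

Copying unconditionally mirrors the lines in this comment; a more conservative variant would only fill in word() where it is still null, leaving any already-set word untouched.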

The problem should ultimately be fixed in Stanford CoreNLP, however.

Hope this helps anyone running into the same problem.

Original issue reported on code.google.com by annemarie.friedrich on 2015-01-22 15:46:17

reckart commented 9 years ago
Thanks for looking into this! I'll add these lines to the StanfordCoreferenceResolver as a workaround
until we have a fix from upstream. And thanks for reporting this upstream: https://github.com/stanfordnlp/CoreNLP/issues/49

Original issue reported on code.google.com by richard.eckart on 2015-01-22 17:22:12

reckart commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2015-01-22 17:34:18

reckart commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2015-01-22 17:36:28

reckart commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2015-01-22 17:47:46