Constannnnnt / Distributed-CoreNLP

This infrastructure, built on Stanford CoreNLP, MapReduce, and Spark with Java, aims to process document annotations at large scale.
https://github.com/Constannnnnt/Distributed-CoreNLP
MIT License

Quote Mentions #17

Open Constannnnnt opened 5 years ago

Constannnnnt commented 5 years ago

I finished the quote annotator, but its output is not what I expected. Maybe I introduced a bug?

With the input

In the summer Joe Smith decided to go on vacation. He said, "I'm going to Hawaii." That July, vacationer Joe went to Hawaii.

the output was:

((0,quote),())
((0,natlog),(The,up) (University,up) (of,up) (Waterloo,up) (is,up) (located,up) (in,up) (Canada,up) (.,up) (Goose,up) (lives,up) (in,up) (this,up) (University,up) (.,up))
((1,quote),())
((1,natlog),(The,up) (University,up) (of,up) (Waterloo,up) (is,up) (located,up) (in,up) (Canada,up) (.,up) (Goose,up) (lives,up) (here,up) (.,up))
((2,quote),())
((2,natlog),(all,up) (cats,down) (have,up) (tails,up))
((3,quote),("I'm going to Hawaii.",He)
((3,natlog),(In,up) (the,up) (summer,up) (Joe,up) (Smith,up) (decided,up) (to,up) (go,up) (on,up) (vacation,up) (.,up) (He,up) (said,up) (,,up) (``,up) (I,up) ('m,up) (going,up) (to,up) (Hawaii,up) (.,up) ('',up) (That,up) (July,up) (,,up) (vacationer,up) (Joe,up) (went,up) (to,up) (Hawaii,up) (.,up))

Note: the annotator returns ((3,quote),("I'm going to Hawaii.",He), but the result I want is ((3,quote),("I'm going to Hawaii.",Joe Smith). Any ideas?

KaisongHuang commented 5 years ago

Can "quote" tell from the context that "He" refers to "Joe Smith"? If not, the output seems fine.

Constannnnnt commented 5 years ago

The coref result suggests it can. My guess is that, by default, the quote is attributed to the nearest speaker mention. However, looking the speaker up in the coref output afterwards seems time-consuming.

(0,coref),(The University of Waterloo:this University))
((1,coref),)
((2,coref),)
((3,coref),(Joe Smith:He|I))

Since the pipeline already lists coref in its properties, I guess the resolved speaker may be available somewhere in its results. The remaining question is how to access it.
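
One way to combine the two outputs shown in this thread, without re-running any annotator, is to post-process the emitted strings: index each coref mention by its representative mention, then look up the speaker reported by the quote annotator. The sketch below is only an illustration under stated assumptions. The class name `QuoteSpeakerResolver` and both method names are hypothetical, and the `representative:mention|mention` layout is inferred from the coref lines pasted above, not from the project's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Distributed-CoreNLP or Stanford CoreNLP.
class QuoteSpeakerResolver {

    // Parses one coref payload such as "Joe Smith:He|I" into a map from each
    // mention ("He", "I") to its representative mention ("Joe Smith").
    // ASSUMPTION: the "representative:mention|mention" layout is inferred from
    // the output pasted in this issue; only a single chain per payload is handled.
    static Map<String, String> buildMentionIndex(String corefPayload) {
        Map<String, String> index = new HashMap<>();
        if (corefPayload == null) {
            return index;
        }
        int sep = corefPayload.indexOf(':');
        if (sep < 0) {
            return index; // empty or malformed payload: nothing to index
        }
        String representative = corefPayload.substring(0, sep).trim();
        for (String mention : corefPayload.substring(sep + 1).split("\\|")) {
            String trimmed = mention.trim();
            if (!trimmed.isEmpty()) {
                index.put(trimmed, representative);
            }
        }
        return index;
    }

    // Returns the representative mention for a speaker if the coref chain
    // contains it; otherwise returns the speaker unchanged.
    static String resolveSpeaker(String speaker, Map<String, String> index) {
        return index.getOrDefault(speaker, speaker);
    }

    public static void main(String[] args) {
        Map<String, String> index = buildMentionIndex("Joe Smith:He|I");
        System.out.println(resolveSpeaker("He", index)); // prints "Joe Smith"
    }
}
```

The lookup is a single hash-map access per quote, so the cost is dominated by building the index once per document. CoreNLP's quote attribution step may also expose a canonical speaker directly, which would make this post-processing unnecessary; that is worth checking in the API documentation for the CoreNLP version in use.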