dasmith / stanford-corenlp-python

Python wrapper for Stanford CoreNLP tools v3.4.1
GNU General Public License v2.0

Multiple occurrences of a word not handled properly while creating tuples #2

Closed. abhaga closed this issue 12 years ago

abhaga commented 13 years ago

If a word occurs more than once in a sentence, the lack of word ids makes it impossible to correctly identify the source and target of a dependency.

If you are open to accepting a patch for this, I can submit one. My idea is to keep the ids in the "tuples" and store the dependents of a word in the "words" array.
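To make the problem concrete, here is a minimal sketch of the ambiguity and of the id-based layout proposed above. The sentence, the tuple layout, and the helper names `governors_by_word` and `dependents_by_id` are made up for illustration and are not taken from the wrapper's code.

```python
from collections import defaultdict

# Tokens of an example sentence, numbered from 1 (hypothetical data).
tokens = ["the", "dog", "chased", "the", "cat"]

# Dependencies as (relation, governor, dependent) using word strings only.
tuples_without_ids = [
    ("det", "dog", "the"),
    ("nsubj", "chased", "dog"),
    ("det", "cat", "the"),
    ("dobj", "chased", "cat"),
]

# The same dependencies with the 1-based token id kept next to each word.
tuples_with_ids = [
    ("det", ("dog", 2), ("the", 1)),
    ("nsubj", ("chased", 3), ("dog", 2)),
    ("det", ("cat", 5), ("the", 4)),
    ("dobj", ("chased", 3), ("cat", 5)),
]


def governors_by_word(tuples):
    """Map each dependent word string to the set of its governor words."""
    heads = defaultdict(set)
    for _rel, gov, dep in tuples:
        heads[dep].add(gov)
    return dict(heads)


def dependents_by_id(tuples):
    """Map each governor's token id to a list of (relation, dependent id)."""
    deps = defaultdict(list)
    for rel, (_gov_word, gov_id), (_dep_word, dep_id) in tuples:
        deps[gov_id].append((rel, dep_id))
    return dict(deps)


# Without ids, the two occurrences of "the" collapse into one entry,
# so there is no way to tell which "the" modifies which noun.
ambiguous = governors_by_word(tuples_without_ids)

# With ids, token 1 attaches to token 2 and token 4 attaches to token 5.
resolved = dependents_by_id(tuples_with_ids)
```

Keying on the token id rather than the word string is what keeps the two determiners apart; the word can still be recovered from `tokens[id - 1]`.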

dasmith commented 13 years ago

Hi Abhaya,

Thanks for bringing this to my attention and submitting the patch for the previous regular expression bug. In addition to tracking word ids, the current code ignores the sentence ids that are used to resolve coreferences between sentences. I agree that putting the ID into the word dictionary is a good idea -- maybe a (word, id) tuple?
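As a rough sketch of the (word, id) idea: CoreNLP prints typed dependencies as lines of the form `rel(governor-index, dependent-index)`, so the index can be kept when each line is parsed. The regular expression, the function name `parse_dependencies`, and the zero-based `sentence_id` argument below are assumptions for illustration, not the wrapper's actual implementation.

```python
import re

# Matches lines such as "det(dog-2, The-1)". The greedy word groups let
# hyphenated words like "well-known-3" keep their internal hyphens, and
# any trailing primes after an index are tolerated and dropped.
DEP_LINE = re.compile(
    r"^(?P<rel>[^\s(]+)\("
    r"(?P<gov>.+)-(?P<gov_id>\d+)'*, "
    r"(?P<dep>.+)-(?P<dep_id>\d+)'*\)$"
)


def parse_dependencies(lines, sentence_id):
    """Return (sentence_id, relation, (gov, gov_id), (dep, dep_id)) tuples.

    Lines that do not look like a dependency are skipped.
    """
    parsed = []
    for line in lines:
        match = DEP_LINE.match(line.strip())
        if match is None:
            continue
        governor = (match.group("gov"), int(match.group("gov_id")))
        dependent = (match.group("dep"), int(match.group("dep_id")))
        parsed.append((sentence_id, match.group("rel"), governor, dependent))
    return parsed


# Hypothetical dependency lines for one sentence.
sample = [
    "det(dog-2, The-1)",
    "nsubj(chased-3, dog-2)",
    "root(ROOT-0, chased-3)",
    "det(cat-5, the-4)",
    "dobj(chased-3, cat-5)",
]
parsed = parse_dependencies(sample, sentence_id=0)
```

Carrying the sentence id in each tuple means a coreference link that points at a (sentence, word) pair can be resolved against the same data.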

I am preoccupied for at least two weeks, but if you end up writing something to do this, I'll incorporate your patch. Thanks for the help.

Dustin

dasmith commented 12 years ago

This is fixed in the current release.

boblannon commented 11 years ago

Not sure this is actually fixed!

I can see the word IDs in the server's output, but the JSON I receive in my Python code doesn't have the IDs. Is there a chance that this was broken by a subsequent change?