brendano / stanford_corenlp_pywrapper

151 stars 59 forks source link

update file to emit root relation #14

Closed sandeepsoni closed 9 years ago

sandeepsoni commented 9 years ago

Fixes : https://github.com/brendano/stanford_corenlp_pywrapper/issues/13

brendano commented 9 years ago

Did you test this? Could you please show examples of JSON output before and after the change? I read the code, but I don't see how the root relation is added to the data structure.

sandeepsoni commented 9 years ago

Yes.

The output before is as follows:

In [3]: p.parse_doc("I learn natural language processing.") Out[3]: {u'sentences': [{u'char_offsets': [[0, 1], [2, 7], [8, 15], [16, 24], [25, 35], [35, 36]], u'deps_basic': [[u'nsubj', 1, 0], [u'dobj', 1, 4], [u'amod', 4, 2], [u'nn', 4, 3]], u'deps_cc': [[u'nsubj', 1, 0], [u'dobj', 1, 4], [u'amod', 4, 2], [u'nn', 4, 3]], u'lemmas': [u'I', u'learn', u'natural', u'language', u'processing', u'.'], u'ner': [u'O', u'O', u'O', u'O', u'O', u'O'], u'normner': [u'', u'', u'', u'', u'', u''], u'parse': u'(ROOT (S (NP (PRP I)) (VP (VBP learn) (NP (JJ natural) (NN language) (NN processing))) (. .)))', u'pos': [u'PRP', u'VBP', u'JJ', u'NN', u'NN', u'.'], u'tokens': [u'I', u'learn', u'natural', u'language', u'processing', u'.']}]}

Whereas, after applying the changes I proposed the output is as follows:

In [3]: p.parse_doc("I learn natural language processing.") Out[3]: {u'sentences': [{u'char_offsets': [[0, 1], [2, 7], [8, 15], [16, 24], [25, 35], [35, 36]], u'deps_basic': [[u'ROOT', -1, 1], [u'nsubj', 1, 0], [u'dobj', 1, 4], [u'amod', 4, 2], [u'nn', 4, 3]], u'deps_cc': [[u'ROOT', -1, 1], [u'nsubj', 1, 0], [u'dobj', 1, 4], [u'amod', 4, 2], [u'nn', 4, 3]], u'lemmas': [u'I', u'learn', u'natural', u'language', u'processing', u'.'], u'ner': [u'O', u'O', u'O', u'O', u'O', u'O'], u'normner': [u'', u'', u'', u'', u'', u''], u'parse': u'(ROOT (S (NP (PRP I)) (VP (VBP learn) (NP (JJ natural) (NN language) (NN processing))) (. .)))', u'pos': [u'PRP', u'VBP', u'JJ', u'NN', u'NN', u'.'], u'tokens': [u'I', u'learn', u'natural', u'language', u'processing', u'.']}]}

The difference is that I have an extra [u'ROOT', -1, 1] in the output for both deps_basic and deps_cc

There is only a small change in the code all in the function jsonFriendlyDeps . In that , I get the root word's index, create a dependency triple like above and add it to the list of triples.

brendano commented 9 years ago

sorry it took so long to get to this!

brendano commented 9 years ago

Wait -- are you sure this is the same code you tested with?

In the java patch here, you create a variable called "deptriple". But you never add it to the "deps" list.

What's returned is the "deps" list.

Therefore this code doesn't work. I'm testing it now and it seems to not work. There's no root relation coming out.

brendano commented 9 years ago

ok fixed never mind

sandeepsoni commented 9 years ago

Oh. My bad. I had a local commit which I did not push. Rookie mistake. Sorry. I am seeing in the local commit that I had also fixed the inconsistent capital case for the root relation. All the relation names emitted by the stanford parser are always in lower case. In this version,the root relation is capital case. I'd like to fix it.