Differences in using subprocess and jpype backends

dmcc / PyStanfordDependencies

Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Dependencies

https://pypi.python.org/pypi/PyStanfordDependencies


Differences in using subprocess and jpype backends #13

Closed leonli02 closed 9 years ago

leonli02 commented 9 years ago

Hi,

I got different results when using the two backends with the same Stanford CoreNLP jar. The result from the subprocess backend seems to be identical to the one from the Stanford online demo. I've also gone through the Python code but still couldn't figure out the cause.

I'd appreciate any advice you can offer.

dmcc commented 9 years ago

Thanks for the report! I think this has to do with some different behavior between SD and UD (see issue #10). I'll take a look.

In the meantime, I would recommend using the subprocess backend for UD, if possible. The JPype backend calls Java code based on EnglishGrammaticalStructure, but should probably be following UniversalEnglishGrammaticalStructure instead (just a guess).

leonli02 commented 9 years ago

Hi dmcc,

Thanks for your reply. The JPype backend performs better on big data sets, so I spent some time on your project and found a solution. I changed 'trees.EnglishGrammaticalStructure' to 'trees.UniversalEnglishGrammaticalStructure' in JPypeBackend.py line 54, and obtained the same result as with the subprocess backend.

I also found that version 3.5.2 made Universal Dependencies the default setting; maybe this is why I got two different results.

dmcc commented 9 years ago

Thanks for your help! There's now support for UD with a new patch (https://github.com/dmcc/PyStanfordDependencies/commit/54c84beb256a93bcf4d9f69b5c070f0f68e2a067). It defaults to using UD but can go back to SD if you pass universal=False. If you're interested in testing it, please let me know if you run into any problems.
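
For anyone who wants to test the patch, here is a minimal sketch of one way to check that the two backends agree on a given tree. The helper names (`Arc`, `dependency_arcs`, `compare_backends`) are hypothetical and not part of PyStanfordDependencies. The sketch assumes each converted token exposes `index`, `form`, `head` and `deprel` attributes, and that the converter callables accept the `universal` keyword mentioned above.

```python
from collections import namedtuple
from itertools import zip_longest

# Hypothetical helper type: the fields of a token that matter when
# comparing dependency output. Not part of PyStanfordDependencies.
Arc = namedtuple('Arc', 'index form head deprel')


def dependency_arcs(sentence):
    """Reduce a converted sentence to a list of comparable Arc tuples.

    Assumes every token has index, form, head and deprel attributes.
    """
    return [Arc(t.index, t.form, t.head, t.deprel) for t in sentence]


def compare_backends(tree, convert_a, convert_b, **kwargs):
    """Return (arc_a, arc_b) pairs where the two converters disagree.

    convert_a and convert_b are callables taking a Penn Treebank string
    plus keyword arguments (for example universal=False). If one output
    is shorter than the other, the missing side is reported as None.
    """
    arcs_a = dependency_arcs(convert_a(tree, **kwargs))
    arcs_b = dependency_arcs(convert_b(tree, **kwargs))
    return [(a, b) for a, b in zip_longest(arcs_a, arcs_b) if a != b]


# Intended use with the real library (assumed setup, needs Java and the
# CoreNLP jar, so it is left commented out here):
#
#   import StanfordDependencies
#   sub = StanfordDependencies.get_instance(backend='subprocess')
#   jpy = StanfordDependencies.get_instance(backend='jpype')
#   tree = '(S1 (NP (DT some) (JJ blue) (NN moose)))'
#   print(compare_backends(tree, sub.convert_tree, jpy.convert_tree,
#                          universal=False))
```

An empty list means both backends produced the same heads and relation labels for that tree. Running the check once with `universal=True` and once with `universal=False` should show whether any remaining mismatch is tied to the SD/UD switch.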