dmcc / PyStanfordDependencies

Python interface for converting Penn Treebank trees to Stanford Dependencies and Universal Dependencies
https://pypi.python.org/pypi/PyStanfordDependencies

Support CoreNLP 3.6.0 #20

Open dmcc opened 8 years ago

dmcc commented 8 years ago

CoreNLP version 3.6.0 has (at least) two changes which break PyStanfordDependencies:

Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
    at edu.stanford.nlp.io.IOUtils.<clinit>(IOUtils.java:42)
    at edu.stanford.nlp.trees.MemoryTreebank.processFile(MemoryTreebank.java:302)
    at edu.stanford.nlp.util.FilePathProcessor.processPath(FilePathProcessor.java:84)
    at edu.stanford.nlp.trees.MemoryTreebank.loadPath(MemoryTreebank.java:152)
    at edu.stanford.nlp.trees.Treebank.loadPath(Treebank.java:180)
    at edu.stanford.nlp.trees.Treebank.loadPath(Treebank.java:151)
    at edu.stanford.nlp.trees.Treebank.loadPath(Treebank.java:137)
    at edu.stanford.nlp.trees.GrammaticalStructure.main(GrammaticalStructure.java:1702)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 8 more

(comes from a command line like this: java -ea -cp /path/to/stanford-corenlp-3.6.0.jar edu.stanford.nlp.trees.EnglishGrammaticalStructure -basic -treeFile treefile -keepPunct -originalDependencies)

@gangeli, is slf4j required to run CoreNLP 3.6.0?

gangeli commented 8 years ago

This is from the move to slf4j for logging. In the next release, this will be worked around such that CoreNLP uses Redwood unless slf4j is in the classpath -- should be transparent for a Java user, but not crash if you're missing libraries.

The other option is to use the new CoreNLP Server to get output as protocol buffers, and not have to care at all about what the Java process does on its own time.

dmcc commented 8 years ago

Thanks for the information, @gangeli! If the next slf4j-less release is not too far out, I'm inclined to skip making 3.6.0 work out of the box since it would involve a lot of CoreNLP-version-specific code to download and install slf4j (and/or Maven integration which is a dependency I'd rather not add since PyStanfordDependencies's goal is to have everything more or less handled in Python).

The server idea is interesting -- it would be nice to add it as a possible PyStanfordDependencies backend (though users would still need to obtain all the necessary jar files).

melodyju commented 8 years ago

Is there a (fairly simple) workaround for this issue?

gangeli commented 8 years ago

Oh, I've already cut slf4j out. You should be able to run the GitHub version of the code without slf4j in your classpath, and it shouldn't crash.


dmcc commented 8 years ago

Thanks @gangeli. In that case, probably the easiest workaround (assuming you don't need exactly version 3.6.0 and are okay with an unreleased version) is to check out the latest CoreNLP version from GitHub, build it (looks like it will build with ant or gradle), and use that jar as your jar_filename.

The next easiest option is to hack this line to include the path to a downloaded jar of slf4j. This should allow you to use version 3.6.0. We could add an option to make it easier to pass extra jars and/or command line flags for SubprocessBackend, but I'm hoping this type of case doesn't show up that much.
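As a rough illustration of the second workaround, here is a minimal sketch of putting an extra jar on the classpath of the command line quoted earlier in this thread. This is not PyStanfordDependencies's actual code: `build_java_command` is a hypothetical helper, and the jar paths are placeholders. It assumes the missing class `org.slf4j.LoggerFactory` is supplied by a separately downloaded slf4j API jar.

```python
import os


def build_java_command(corenlp_jar, extra_jars=(), tree_file="treefile"):
    """Build the argv list for the converter (hypothetical helper).

    Any extra jars (e.g. slf4j) are appended to the classpath after the
    CoreNLP jar.
    """
    # os.pathsep is ':' on Unix-like systems and ';' on Windows.
    classpath = os.pathsep.join([corenlp_jar, *extra_jars])
    return [
        "java", "-ea", "-cp", classpath,
        "edu.stanford.nlp.trees.EnglishGrammaticalStructure",
        "-basic", "-treeFile", tree_file,
        "-keepPunct", "-originalDependencies",
    ]


# Placeholder paths; substitute the locations of your own jars.
cmd = build_java_command(
    "/path/to/stanford-corenlp-3.6.0.jar",
    extra_jars=["/path/to/slf4j-api.jar"],
)
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list (rather than one shell string) avoids quoting problems when jar paths contain spaces.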

gangeli commented 8 years ago

Oops; I completely ignored the context of the ticket... Yes, go with @dmcc's workaround until we release 3.7.0 (which I'm pushing to make as soon as possible, but it may take until the summer).

ghost commented 7 years ago

Could it be that this bug is still present?

dmcc commented 7 years ago

Quite possibly, unfortunately. Are you using 3.6.0 or 3.7.0? (I don't think we ever really resolved it for 3.6.0, and I'm not sure whether it's still a problem in 3.7.0.)

ghost commented 7 years ago

Well, I switched to 3.5.2 for now, and it appears that 3.7.0 is currently not available in the repository: http://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/

gangeli commented 7 years ago

Heh; clearly I was too optimistic about 3.7.0's release date. But this should, I hope, be resolved in 3.7.0. It should be on Maven soon.