databricks / spark-corenlp

Stanford CoreNLP wrapper for Apache Spark
GNU General Public License v3.0

publishing this to spark-packages repo #5

Closed cfregly closed 8 years ago

cfregly commented 8 years ago

@mengxr would you mind publishing this to the spark-packages repo now that CoreNLP 3.6.0 is available on Maven Central?

http://search.maven.org/#artifactdetails%7Cedu.stanford.nlp%7Cstanford-corenlp%7C3.6.0%7Cjar

thanks!

mengxr commented 8 years ago

Done.

cfregly commented 8 years ago

thanks, @mengxr!

quick question that's slightly related. it appears that --packages does not support specifying a classifier as follows:

groupId:artifactId:version:classifier

This classifier is needed to pull in the stanford-corenlp-3.6.0-models.jar, of course.
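
Concretely, the coordinate I'd want to pass would be something like this (hypothetical, since --packages rejects the fourth field):

edu.stanford.nlp:stanford-corenlp:3.6.0:models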

I found the spot in the Spark code base where --packages accepts just groupId:artifactId:version: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L495

Curious if you worked around this in a more clever manner than downloading the jar and referencing it with --jars. Is this a potential Spark Jira? I've searched quite a bit, but can't seem to find anything related.
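
For the record, the non-clever workaround I have in mind looks roughly like this. Treat it as a sketch: the Maven Central URL just follows the standard artifactId-version-classifier repo layout, the databricks:spark-corenlp version is a placeholder until the package actually shows up on spark-packages, and my-app.jar stands in for whatever application jar is being submitted.

# pull the models jar directly from Maven Central (standard repo layout)
wget http://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.6.0/stanford-corenlp-3.6.0-models.jar

# then hand the models jar to spark-submit explicitly, since --packages can't
spark-submit \
  --packages databricks:spark-corenlp:<version>,edu.stanford.nlp:stanford-corenlp:3.6.0 \
  --jars stanford-corenlp-3.6.0-models.jar \
  my-app.jar

Not elegant, but --jars puts the models on the driver and executor classpaths, which is all that's really needed.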

cfregly commented 8 years ago

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L842 (looks like this is where the --packages coordinates actually get resolved via Ivy)

cfregly commented 8 years ago

@mengxr

I still don't see the library here:

http://dl.bintray.com/spark-packages/maven/databricks/

spark-avro/
spark-csv/
spark-redshift/

and spark-packages.org says it hasn't been released. (not sure how that actually gets updated)

thiakx commented 8 years ago

@mengxr is there a timeline for publishing to the spark-packages repo? Since my team uses Spark + CoreNLP, this would be quite an interesting package to work with.