databricks / spark-corenlp

Stanford CoreNLP wrapper for Apache Spark
GNU General Public License v3.0

protobuf dependency conflict between corenlp and spark #4

Closed: cfregly closed this issue 8 years ago

cfregly commented 8 years ago

@mengxr it looks like spark is stuck on protobuf-java version 2.5.0 (https://github.com/apache/spark/blob/0a38637d05d2338503ecceacfb911a6da6d49538/pom.xml#L130) while corenlp has charged ahead with v 2.6.1.

how did you overcome this conflict?

here's the stack trace:

java.lang.NoSuchMethodError: com.google.protobuf.LazyStringList.getUnmodifiableView()Lcom/google/protobuf/LazyStringList;
    at edu.stanford.nlp.pipeline.CoreNLPProtos$Token$Builder.buildPartial(CoreNLPProtos.java:12243)
    at edu.stanford.nlp.pipeline.CoreNLPProtos$Token$Builder.build(CoreNLPProtos.java:12145)
    at edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer.toProto(ProtobufAnnotationSerializer.java:238)
    at edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer.toProtoBuilder(ProtobufAnnotationSerializer.java:384)
    at edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer.toProto(ProtobufAnnotationSerializer.java:345)
    at edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer.toProtoBuilder(ProtobufAnnotationSerializer.java:494)
    at edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer.toProto(ProtobufAnnotationSerializer.java:456)
    at com.databricks.spark.corenlp.CoreNLP$$anonfun$1.apply(CoreNLP.scala:77)
    at com.databricks.spark.corenlp.CoreNLP$$anonfun$1.apply(CoreNLP.scala:73)  
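
For what it's worth, a quick way to confirm which protobuf versions each side actually pulls in (assuming your own project builds with Maven) is the dependency tree, filtered to protobuf:

    # show every path that brings in protobuf-java and the version it resolves to
    mvn dependency:tree -Dincludes=com.google.protobuf:protobuf-java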

btw, it looks like CoreNLP 3.6.0 is available, but it won't be released to Maven Central until sometime in January.

cfregly commented 8 years ago

I waited for Spark 1.6.0 to give this another try.

I noticed this comment in the pom.xml at the root of the spark project:

    <!-- In theory we need not directly depend on protobuf since Spark does not directly
         use it. However, when building with Hadoop/YARN 2.2 Maven doesn't correctly bump
         the protobuf version up from the one Mesos gives. For now we include this variable
         to explicitly bump the version when building with YARN. It would be nice to figure
         out why Maven can't resolve this correctly (like SBT does). -->

    <dependency>
      <groupId>com.google.protobuf</groupId>
      <artifactId>protobuf-java</artifactId>
      <version>${protobuf.version}</version>
      <scope>${hadoop.deps.scope}</scope>
    </dependency>

So I just changed the <protobuf.version> property in the main pom.xml to 2.6.1, built Spark from source, and rolled the dice.
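
Roughly, the edit is just the property in the <properties> block of the root pom.xml (2.5.0 was the original value, per the link above); something like:

    <properties>
      ...
      <!-- bumped from 2.5.0 so CoreNLP's protobuf 2.6.1 classes resolve at runtime -->
      <protobuf.version>2.6.1</protobuf.version>
      ...
    </properties>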

Oh, and one more thing that is likely related... I removed -Pkinesis-asl from my build command, since that module also seemed to depend on protobuf.

Here's the final build command:

export MAVEN_OPTS="-Xmx8g -XX:ReservedCodeCacheSize=512m" && ./make-distribution.sh --name fluxcapacitor --tgz --with-tachyon --skip-java-test -Phadoop-2.6 -Dhadoop.version=2.6.0 -Psparkr -Phive -Phive-thriftserver -Pspark-ganglia-lgpl -Pnetlib-lgpl -DskipTests

Seems to be working for now. Fingers crossed.