amplab / spark-ec2

Scripts used to setup a Spark cluster on EC2
Apache License 2.0

Classpath/dependency resolution of JAR app #98

Closed by cantide5ga 7 years ago

cantide5ga commented 7 years ago

Is this a common thing to fight with? I'm having to be very specific about certain dependencies in my app at submission/runtime. I've been getting around this with shading plugins in my build tools. Some of the relocated package names, as an example:

    relocate 'org.apache.http', 'shaded.org.apache.http'
    relocate 'org.apache.spark', 'shaded.org.apache.spark'
    relocate 'com.codahale.metrics', 'shaded.com.codahale.metrics'
    relocate 'com.fasterxml.jackson', 'shaded.com.fasterxml.jackson'
    relocate 'scala', 'shaded.scala'
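
For readers unfamiliar with where lines like these live: they look like Gradle Shadow plugin relocations. Below is a minimal sketch of one possible `build.gradle`, not necessarily the build used here. The plugin version, the `java` plugin, and the Spark dependency coordinates are assumptions chosen to match the spark-2.1.0 build mentioned in this thread. The sketch also marks Spark as `compileOnly` so it stays out of the fat JAR, which is a different approach from relocating `org.apache.spark` and `scala`.

```groovy
// build.gradle (illustrative sketch; versions and coordinates are assumptions)
plugins {
    id 'java'
    id 'com.github.johnrengelman.shadow' version '1.2.4'
}

repositories {
    mavenCentral()
}

dependencies {
    // Assumed coordinates. compileOnly keeps Spark available at compile time
    // but out of the fat JAR, so the cluster's own Spark classes are used.
    compileOnly 'org.apache.spark:spark-core_2.11:2.1.0'
}

shadowJar {
    // Rewrite package names inside the fat JAR so the bundled copies
    // cannot collide with versions already on the cluster's classpath.
    relocate 'org.apache.http', 'shaded.org.apache.http'
    relocate 'com.fasterxml.jackson', 'shaded.com.fasterxml.jackson'
}
```

With a layout like this, `gradle shadowJar` builds the fat JAR under `build/libs/`. Whether leaving Spark and Scala unbundled removes the need to relocate them depends on the versions installed on the cluster.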

Another thing to note is that a local cluster using spark-2.1.0-bin-hadoop2.7 doesn't really run into this.

Both environments are launched in roughly the same way:

    ./spark/bin/spark-submit --class com.some.App --master local[1] --verbose project/build/libs/all.jar

Still going down this rabbit hole to get things just right - the amount of effort is substantial. Something feels wrong here.

cantide5ga commented 7 years ago

Greatly minimized by using branch-2.0.