HCADatalab / powderkeg

Live-coding the cluster!
Eclipse Public License 1.0

Cryptic failure with spark > 2.1 (don't make my mistake) #44

Closed by joinr 5 years ago

joinr commented 5 years ago

Disclaimer: I'm new to Spark, and I set up my environment on an Amazon EC2 instance (running as a standalone cluster on localhost) following this guide. I initially installed the most recent Spark release from the official site, then immediately had problems running on the cluster. Oddly enough, spark-submit worked fine, but from the REPL, reproducing the examples in the README, keg/connect! kept bombing with timeouts and failing to connect to the master, with this error during Spark context creation:

"18/07/30 23:26:57 ERROR SparkContext: Error initializing SparkContext. java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem"
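For context, the REPL call in question looks roughly like the sketch below, which follows the README's connect example. The master URL is an assumption (a standalone master on localhost with the default port 7077); use whatever URL your master's web UI reports.

```clojure
;; Sketch of the REPL session that hit the error above.
(require '[powderkeg.core :as keg])

;; Assumed master URL: standalone master on localhost, default port 7077.
;; With a Spark install newer than the Spark libraries on the classpath,
;; this is the call that timed out and failed during SparkContext creation.
(keg/connect! "spark://localhost:7077")
```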

After messing around with the firewall options, using exact IPs versus hostnames, etc., I backtracked and installed Spark 2.1. Everything worked fine.

[edit] I realize after the fact that the top of the README says only Spark 2.1 is supported; dunno how I missed that :) For newbs starting out and building a standalone local cluster to run the examples: you have to get exactly Spark 2.1 from the Spark downloads page (there's a selector for older releases). Note: 2.1 is about 2 years old at this point.

Are there any plans to work with 2.3? I think it's just the dependencies that need to be updated...

joinr commented 5 years ago

Note: when using a newer version of Spark (like 2.3.1), it's sufficient to replace the Spark dependency versions given in the README with the version you're actually running:

:dependencies
    [[hcadatalab/powderkeg "0.5.1"]
     [com.esotericsoftware/kryo-shaded "4.0.0"]
     ;; For Spark 2.x (2.3.1) support
     [org.apache.spark/spark-core_2.11 "2.3.1"]
     [org.apache.spark/spark-streaming_2.11 "2.3.1"]]
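One way to keep the two Spark artifacts in lockstep with the cluster is to pull the version into a single var in project.clj. This is only a sketch: the project name `my-keg-app` and the var name `spark-version` are made up for illustration, and it relies on Leiningen evaluating `~` (unquote) forms inside `defproject`.

```clojure
;; project.clj (sketch; my-keg-app and spark-version are hypothetical names)

;; Set this to the Spark version the cluster is actually running.
(def spark-version "2.3.1")

(defproject my-keg-app "0.1.0-SNAPSHOT"
  :dependencies
  [[hcadatalab/powderkeg "0.5.1"]
   [com.esotericsoftware/kryo-shaded "4.0.0"]
   ;; Both Spark artifacts share one version, so they cannot drift apart.
   ;; The _2.11 suffix is the Scala version of the artifacts, as in the
   ;; dependency list above.
   [org.apache.spark/spark-core_2.11 ~spark-version]
   [org.apache.spark/spark-streaming_2.11 ~spark-version]])
```

After changing the version, restart the REPL so the new jars are on the classpath before calling keg/connect! again.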