Closed joinr closed 5 years ago
Note: when using a newer version of Spark (e.g. 2.3.1), it's sufficient to replace the Spark dependencies from the README with the version you're actually running:
```clojure
:dependencies
[[hcadatalab/powderkeg "0.5.1"]
 [com.esotericsoftware/kryo-shaded "4.0.0"]
 ;; For Spark 2.x (2.3.1) support
 [org.apache.spark/spark-core_2.11 "2.3.1"]
 [org.apache.spark/spark-streaming_2.11 "2.3.1"]]
```
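For context, a minimal `project.clj` carrying those dependencies might look like the sketch below. The project name and Clojure version are illustrative, not from the README; the key point is that the Spark artifact versions must match the Spark distribution your cluster is actually running:

```clojure
;; Hypothetical project.clj for a local sandbox project (name is illustrative).
;; The Spark coordinates below override the 2.1.x versions from the README;
;; keep them in sync with the Spark version installed on your cluster.
(defproject spark-sandbox "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.9.0"]
                 [hcadatalab/powderkeg "0.5.1"]
                 [com.esotericsoftware/kryo-shaded "4.0.0"]
                 ;; For Spark 2.x (2.3.1) support
                 [org.apache.spark/spark-core_2.11 "2.3.1"]
                 [org.apache.spark/spark-streaming_2.11 "2.3.1"]])
```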
Disclaimer: I'm new to Spark, and set up my environment on an Amazon EC2 instance (running as a stand-alone cluster on localhost) following this guide. I initially installed the most recent Spark (from the official site, using the current version), then immediately had problems running on the cluster. Oddly enough, spark-submit worked fine, but going via the REPL and reproducing the examples in the README, keg/connect! kept bombing with timeouts, failing to connect to the master, with this error during SparkContext creation:
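For reference, the REPL steps I was reproducing are the ones from the README; a sketch of the failing call (the master URL here is an assumption for a local standalone cluster, substitute your own):

```clojure
;; As in the powderkeg README: connect the REPL to a running Spark master.
;; "spark://localhost:7077" is a placeholder for a local standalone cluster;
;; this is the call that timed out for me on Spark 2.3.1.
(require '[powderkeg.core :as keg])
(keg/connect! "spark://localhost:7077")
```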
```
18/07/30 23:26:57 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
```
After messing around with firewall options, using explicit IPs versus hostnames, etc., I backtracked and installed Spark 2.1. Everything worked fine.
[edit] I realize after the fact that Spark 2.1 is listed as supported at the top of the README; dunno how I missed that :) For newbs starting out by building a stand-alone local cluster to run the examples: you have to get Spark 2.1 exactly, from the Spark downloads page (there's a selector for older releases). Note: 2.1 is about two years old at this point.
Are there any plans to support Spark 2.3? I think it's just the dependencies that need to be updated...