amplab / spark-ec2

Scripts used to setup a Spark cluster on EC2
Apache License 2.0

support for spark 2.2.0? #110

Open kmu-leeky opened 7 years ago

kmu-leeky commented 7 years ago

It looks like Spark 2.2.0 has been officially released. Is it going to be supported in spark-ec2 shortly?

shivaram commented 7 years ago

We can support it. Would you like to open a PR?

kmu-leeky commented 7 years ago

I tried locally, but it is not as simple as I first thought: just adding 2.2.0 to "VALID_SPARK_VERSIONS" does not work. A few things to consider. The base image contains Hadoop 2.4, while the Spark binaries are built for Hadoop 2.6 (spark-2.2.0-bin-hadoop2.6.tgz). The base image also contains Java 1.7, and I read a few documents saying that either the recent Hadoop or Spark needs Java 1.8.
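To make the two mismatches concrete, here is a minimal sketch of the kind of pre-flight check that would flag them. Every name in it (`BaseImage`, `SparkBuild`, `find_mismatches`) is hypothetical and not part of spark-ec2, and the Java 1.8 requirement is an assumption taken from the documents mentioned above rather than something verified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseImage:
    """What is installed on the AMI (hypothetical model, not spark-ec2 code)."""
    hadoop: tuple  # e.g. (2, 4)
    java: tuple    # e.g. (1, 7)


@dataclass(frozen=True)
class SparkBuild:
    """A prebuilt Spark package and what it expects (hypothetical model)."""
    version: str
    hadoop: tuple    # Hadoop profile the binary was built for
    min_java: tuple  # assumed minimum Java version


def fmt(version):
    """Render a version tuple such as (2, 4) as '2.4'."""
    return ".".join(str(part) for part in version)


def find_mismatches(image, build):
    """Return a human-readable list of incompatibilities, empty if none."""
    problems = []
    if image.hadoop != build.hadoop:
        problems.append(
            "image has Hadoop {}, Spark {} binary is built for Hadoop {}".format(
                fmt(image.hadoop), build.version, fmt(build.hadoop)
            )
        )
    if image.java < build.min_java:
        problems.append(
            "image has Java {}, Spark {} is assumed to need Java {}+".format(
                fmt(image.java), build.version, fmt(build.min_java)
            )
        )
    return problems


if __name__ == "__main__":
    # Values taken from the comment above; min_java is the assumed requirement.
    image = BaseImage(hadoop=(2, 4), java=(1, 7))
    build = SparkBuild(version="2.2.0", hadoop=(2, 6), min_java=(1, 8))
    for problem in find_mismatches(image, build):
        print(problem)
```

With the values from this thread, both checks fire, which is why adding the version string alone is not enough.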

shivaram commented 7 years ago

I see. Those do require more changes, including changes to the AMI and the Hadoop scripts. Unfortunately I don't have time right now to try out the changes.

kmu-leeky commented 7 years ago

That's ok. I tweaked the code locally to run 2.2.0 in my repo. I will create a PR if the modifications and images can be generalized.

knesterovich commented 7 years ago

Hey guys, could you please clarify if there are any updates or progress on this issue? @kmu-leeky were you able to tweak your local code to make it PRable?

nchammas commented 7 years ago

For those still waiting for spark-ec2 to support Spark 2.2, I recommend taking a look at my project, Flintrock. It's basically a faster spark-ec2 with a better user experience.

If anyone does submit a PR adding Spark 2.2 support to spark-ec2, ping me and I'll take a look. Unfortunately, updating the spark-ec2 AMIs to fully support new Spark versions (e.g. adding Java 8) is non-trivial. On Flintrock, you don't need to wait for new commits, AMIs, or branches to be created. You just set an option to pick your version of Spark. Most of the time with Flintrock you can use a new Spark version the day it comes out without any issue.

shivaram commented 7 years ago

+1 to what @nchammas said. We unfortunately do not have bandwidth to create new AMIs / update spark-ec2 to match the Spark releases.

yektayazdani commented 7 years ago

I tried changing the source to use Hadoop 2.7, which is used with the default yarn setting. But once I change it, it starts referring to

http://s3.amazonaws.com/spark-related-packages/spark-2.2.0-bin-hadoop2.4.tgz

I tried changing the init.sh in the spark folder, but for some reason that's not going through. Let me know where I should make the changes and I will add them to the source, since we need to use this.
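For anyone tracing where that URL comes from, here is a minimal sketch of how such a package URL could be assembled from a Spark version and a Hadoop profile. The helper name and its parameters are hypothetical and this is not spark-ec2's actual code; only the URL pattern is copied from the link above. If the Hadoop suffix is chosen in more than one place, changing one of them would still leave the other producing the hadoop2.4 name.

```python
# URL prefix copied from the link in the comment above.
BASE_URL = "http://s3.amazonaws.com/spark-related-packages"


def spark_package_url(spark_version, hadoop_profile):
    """Build a package URL following the pattern seen in this thread.

    Hypothetical helper: it only formats a string and does not check
    whether the resulting file actually exists in the bucket.
    """
    return "{}/spark-{}-bin-hadoop{}.tgz".format(
        BASE_URL, spark_version, hadoop_profile
    )


if __name__ == "__main__":
    # The URL the script currently resolves to, per the comment above.
    print(spark_package_url("2.2.0", "2.4"))
    # The name that a Hadoop 2.7 profile would map to under the same pattern.
    print(spark_package_url("2.2.0", "2.7"))
```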