amplab / spark-ec2

Scripts used to set up a Spark cluster on EC2
Apache License 2.0

Documentation incorrect regarding missing "ec2" directory #89

Open matthewadams opened 7 years ago

matthewadams commented 7 years ago

The documentation appears to be incorrect in at least the branch-1.6 and branch-2.0 branches. At https://github.com/amplab/spark-ec2#launching-a-cluster, the doc says "Go into the ec2 directory in the release of Apache Spark you downloaded." The problem is that there is no ec2 directory in the Spark distribution.

http://stackoverflow.com/a/38882774/969237 says "Download the official ec2 directory as detailed in the Spark 2.0.0 documentation." (in Edit 2). The problem is that the official Spark documentation (now at 2.1), at http://spark.apache.org/docs/latest/, links to https://github.com/amplab/spark-ec2, which brings me right back here. No help.

I suspect that what used to be the ec2 directory in an Apache Spark distribution is now the root directory of https://github.com/amplab/spark-ec2, but I'm not familiar enough with this to be sure.

Please update the documentation so that I can follow the installation instructions.

shivaram commented 7 years ago

Yes, the contents of the ec2 directory in Spark are now in the root of this repository. Would you be interested in opening a PR to update the documentation?
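Since the repository root now plays the role of the old ec2 directory, a fresh clone should be usable directly: where the docs say "go into the ec2 directory", you would run the `spark-ec2` script from the clone's root instead. Below is a minimal Python sketch, assuming `git clone https://github.com/amplab/spark-ec2` has already been run. The `--key-pair`, `--identity-file` and `--slaves` options and the `launch <cluster-name>` action follow the usage from the Spark EC2 guide. The helper names `launch_command` and `run_launch` are hypothetical. So is the requirement that AWS credentials be supplied through environment variables, which is stricter than necessary because boto can also read them from its config files.

```python
import os
import subprocess
from pathlib import Path
from typing import List


def launch_command(repo_root: str, key_pair: str, identity_file: str,
                   num_slaves: int, cluster_name: str) -> List[str]:
    """Build a spark-ec2 'launch' command run from the clone's root directory."""
    # The spark-ec2 wrapper script lives at the repository root (formerly <spark>/ec2/).
    script = str(Path(repo_root) / "spark-ec2")
    return [
        script,
        f"--key-pair={key_pair}",            # name of the EC2 key pair
        f"--identity-file={identity_file}",  # private key file for that key pair
        f"--slaves={num_slaves}",            # number of worker instances
        "launch",
        cluster_name,
    ]


def run_launch(cmd: List[str]) -> None:
    """Run the command after checking for AWS credentials (hypothetical helper)."""
    # Assumption: credentials come from environment variables in this sketch.
    missing = [v for v in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
               if v not in os.environ]
    if missing:
        raise RuntimeError("set " + ", ".join(missing) + " before launching")
    subprocess.run(cmd, check=True)
```

For example, `run_launch(launch_command("spark-ec2", "mykey", "mykey.pem", 2, "my-cluster"))` would start a two-worker cluster named `my-cluster` from a clone in `./spark-ec2`. The values `mykey`, `mykey.pem` and `my-cluster` are placeholders.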

aditya-mittal commented 7 years ago

@matthewadams Did you find a resolution to the problem above?

aditya-mittal commented 7 years ago

As @matthewadams correctly pointed out, the ec2 folder is missing from Spark 2.2.0, although it was present in previous versions of Spark.

Solution: create a directory named ec2 inside the downloaded Spark distribution, then clone this repository (https://github.com/amplab/spark-ec2) into that ec2 directory.
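One detail of the workaround above is worth spelling out. If you run `git clone URL` from *inside* the new ec2 directory, git creates a nested `ec2/spark-ec2/` folder. If you pass the ec2 path as the clone target, the repository root becomes `ec2/` itself, which matches the old layout. Git accepts an existing target directory as long as it is empty. The Python sketch below follows that approach. The helper names `clone_command` and `restore_ec2_dir` are hypothetical. The optional `branch` argument assumes you want one of the branches named in this issue, such as `branch-2.0`.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

# Repository URL from the comment above.
SPARK_EC2_REPO = "https://github.com/amplab/spark-ec2"


def clone_command(spark_home: str, branch: Optional[str] = None) -> List[str]:
    """Build a `git clone` command whose checkout root becomes <spark_home>/ec2."""
    target = Path(spark_home) / "ec2"
    cmd = ["git", "clone"]
    if branch is not None:
        cmd += ["--branch", branch]  # e.g. "branch-2.0" (assumption)
    # Passing the target path makes the repo root *be* ec2/, instead of the
    # nested ec2/spark-ec2/ that `cd ec2 && git clone URL` would produce.
    cmd += [SPARK_EC2_REPO, str(target)]
    return cmd


def restore_ec2_dir(spark_home: str, branch: Optional[str] = None) -> Path:
    """Create <spark_home>/ec2 if absent, then clone spark-ec2 into it.

    Hypothetical helper: needs `git` on PATH and network access. The clone
    fails if ec2/ already exists and is not empty.
    """
    target = Path(spark_home) / "ec2"
    target.mkdir(exist_ok=True)  # spark_home itself must already exist
    subprocess.run(clone_command(spark_home, branch), check=True)
    return target
```

After `restore_ec2_dir("spark-2.2.0-bin-hadoop2.7")`, the documented step "go into the ec2 directory" should work again, with the `spark-ec2` script at `spark-2.2.0-bin-hadoop2.7/ec2/spark-ec2`.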