amplab / spark-ec2

Scripts used to set up a Spark cluster on EC2
Apache License 2.0

cluster setup error: unknown spark version #43

Open DSLituiev opened 8 years ago

DSLituiev commented 8 years ago

I ran into the following issue while running ./spark-ec2 --key-pair=<> --identity-file=<> --region=us-west --instance-type=t2.micro -s 2 launch test-cluster:

[...]
Initializing spark
--2016-07-28 03:58:47--  http://s3.amazonaws.com/spark-related-packages/spark-1.6.2-bin-hadoop1.tgz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.40.74
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.40.74|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2016-07-28 03:58:47 ERROR 404: Not Found.

ERROR: Unknown Spark version
spark/init.sh: line 137: return: -1: invalid option
return: usage: return [n]
Unpacking Spark
tar (child): spark-*.tgz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
rm: cannot remove `spark-*.tgz': No such file or directory
mv: missing destination file operand after `spark'
Try `mv --help' for more information.
[...]
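
A side note on the "return: -1: invalid option" line in the log above: older bash releases parse -1 as an option to the return builtin, so the error path does not stop the script, and it continues to "Unpacking Spark" even though the download failed. The following is a minimal sketch of a version check that returns a non-negative status instead. The function name and the list of accepted versions are hypothetical and are not taken from spark/init.sh.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a version check in an init script.
# The accepted version list below is illustrative only.
check_spark_version() {
  case "$1" in
    1.6.2)
      return 0
      ;;
    *)
      echo "ERROR: Unknown Spark version" >&2
      # Use a status in the 0-255 range; "return -1" is rejected as an
      # invalid option by older bash versions.
      return 1
      ;;
  esac
}
```

A caller can then stop early with something like check_spark_version "$SPARK_VERSION" || return 1, where SPARK_VERSION is again an assumed variable name.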

DSLituiev commented 8 years ago

It seems there is no spark-1.6.2-bin-hadoop1.tgz archive in that bucket. Using --hadoop-major-version 2 leads to the following:

2016-07-28 04:53:42 (3.83 MB/s) - ‘spark-1.6.2-bin-hadoop2.4.tgz’ saved [273797124/273797124]

Unpacking Spark
[timing] spark init:  00h 01m 12s
Initializing ephemeral-hdfs
ERROR: Unknown Hadoop version
[timing] ephemeral-hdfs init:  00h 00m 00s
Initializing persistent-hdfs
ERROR: Unknown Hadoop version
[timing] persistent-hdfs init:  00h 00m 00s
Initializing mapreduce
ERROR: Unknown Hadoop version
mapreduce/init.sh: line 20: return: -1: invalid option
return: usage: return [n]
File or directory /root/mapreduce doesn't exist!
[timing] mapreduce init:  00h 00m 00s
Initializing spark-standalone

shivaram commented 8 years ago

Yeah, it looks like that file is missing. @JoshRosen, can you help upload spark-1.6.2-bin-hadoop1.tgz to the S3 bucket spark-related-packages?
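
For anyone who wants to confirm whether a package is present before launching a cluster, here is a rough sketch that sends a HEAD request to the same URL the init script tried to download. The function names are hypothetical, and the mapping from Hadoop major version to file name suffix only covers the two names that appear in the logs in this thread (hadoop1 and hadoop2.4); other combinations are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the package URL in the form seen in the logs above.
spark_package_url() {
  local spark_version="$1" hadoop_major="$2" suffix
  case "$hadoop_major" in
    1) suffix="hadoop1" ;;
    2) suffix="hadoop2.4" ;;
    *)
      echo "unsupported Hadoop major version: $hadoop_major" >&2
      return 1
      ;;
  esac
  echo "http://s3.amazonaws.com/spark-related-packages/spark-${spark_version}-bin-${suffix}.tgz"
}

# Hypothetical helper: succeed only if a HEAD request does not return an HTTP error.
package_exists() {
  curl --silent --head --fail --output /dev/null "$1"
}

# Example usage (needs network access):
#   url="$(spark_package_url 1.6.2 1)"
#   package_exists "$url" || echo "missing: $url"
```

A 404 like the one reported above would make package_exists return a non-zero status.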

omdv commented 8 years ago

@DSLituiev try --hadoop-major-version=2, it worked fine for me.

I don't know whether the file has been uploaded, but I noticed that the name changed to spark-1.6.2-bin-hadoop1-scala2.11.tgz on https://www.apache.org/dist/spark/spark-1.6.2/

DSLituiev commented 8 years ago

@omdv: As I mentioned above (assuming nothing has changed in the code or repositories), using --hadoop-major-version 2 leads to the following:

2016-07-28 04:53:42 (3.83 MB/s) - ‘spark-1.6.2-bin-hadoop2.4.tgz’ saved [273797124/273797124]

Unpacking Spark
[timing] spark init:  00h 01m 12s
Initializing ephemeral-hdfs
ERROR: Unknown Hadoop version
[timing] ephemeral-hdfs init:  00h 00m 00s
Initializing persistent-hdfs
ERROR: Unknown Hadoop version
[timing] persistent-hdfs init:  00h 00m 00s
Initializing mapreduce
ERROR: Unknown Hadoop version
mapreduce/init.sh: line 20: return: -1: invalid option
return: usage: return [n]
File or directory /root/mapreduce doesn't exist!
[timing] mapreduce init:  00h 00m 00s
Initializing spark-standalone

omdv commented 8 years ago

@DSLituiev could you try it with the = sign, i.e. --hadoop-major-version=2?

DSLituiev commented 8 years ago

@omdv: it works. Thank you!