amplab / spark-ec2

Scripts used to set up a Spark cluster on EC2
Apache License 2.0

Running Low on Storage when Building Specific Spark Version #17

Open felixmaximilian opened 8 years ago

felixmaximilian commented 8 years ago

Hi,

I am having trouble creating a Spark cluster with a custom Spark version. I am running:

ec2/spark-ec2 --key-pair=<key-name> --identity-file=<key-file> --region=eu-west-1 --zone=eu-west-1a --vpc-id=<vpc-id>  --subnet-id=<subnet-id> --copy-aws-credentials --hadoop-major-version=2 --instance-profile-name=<instance-profile-name> --slaves=1 -v 4f894dd6906311cb57add6757690069a18078783 launch cluster_test

-v specifies a particular Git commit by hash (here corresponding to Spark version 1.5.1)

When the cluster nodes are started, Spark is cloned from Git (into the /root folder) and built. After a while, the script stops because of "no space left on device" errors. When I log into the master and check the space left:

>df
Filesystem            1K-blocks      Used Available Use% Mounted on
/dev/xvda1             8256952   6693968   1479128  82% /
tmpfs                  3816808         0   3816808   0% /dev/shm
/dev/xvdb            433455904   1252616 410184984   1% /mnt
/dev/xvdf            433455904    203012 411234588   1% /mnt2

So there are about 1.4 GB left on the device, but when I try to download a big file, it fails again with the "no space left on device" message.

I realised that the inodes are the limiting factor here:

df -i
Filesystem             Inodes   IUsed   IFree IUse% Mounted on
/dev/xvda1            524288  524288       0  100% /
tmpfs                 954202       1  954201    1% /dev/shm
/dev/xvdb            27525120      12 27525108    1% /mnt
/dev/xvdf            27525120      11 27525109    1% /mnt2
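
For reference, a quick check along these lines shows which directories are eating the inodes (the directories listed are just examples; the Spark source tree and build caches under /root are the likely culprits):

# Count files (inodes) per directory, staying on the root filesystem (-xdev).
for d in /root /usr /var /tmp; do
  printf '%s: ' "$d"
  find "$d" -xdev 2>/dev/null | wc -l
done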

Can someone help me increase the root disk volume? It might be good to increase the default volume size so that Spark can be built.

shivaram commented 8 years ago

Thanks @felixmaximilian for the report. The trouble is that increasing EBS volume size requires AMIs to be rebuilt for all the regions.

One workaround might be to build Spark on the ephemeral disk at /mnt. Could you see if that works? If so, we can make a code change for that.

felixmaximilian commented 8 years ago

Building Spark in /mnt/ is possible. Do you plan to copy the compiled Spark back to the EBS volume? Then it's necessary to make sure you don't copy the whole target folder, etc. We should build the distribution (make-distribution.sh) in /mnt/ and then uncompress it back onto the EBS volume. What do you think? We could try to jointly find a good solution on Monday. Have a nice weekend.
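
A rough sketch of that approach (the Maven profiles, MAVEN_OPTS, and tarball name below are illustrative and not taken from the spark-ec2 code):

# Clone and build Spark on the large ephemeral disk instead of the small EBS root.
cd /mnt
git clone https://github.com/apache/spark.git
cd spark
git checkout 4f894dd6906311cb57add6757690069a18078783   # the commit passed to -v above
# Keep the local Maven repository off the root volume as well.
export MAVEN_OPTS="-Xmx2g -Dmaven.repo.local=/mnt/.m2"
# Build a binary distribution tarball (profiles are illustrative).
./make-distribution.sh --tgz -Phadoop-2.4 -Pyarn
# Unpack only the packaged distribution onto the EBS root volume;
# the source tree and build output stay on /mnt.
tar -xzf spark-*-bin-*.tgz -C /root
# (spark-ec2 expects the tree at /root/spark, so the unpacked directory
#  would still need to be renamed or symlinked accordingly.)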

nchammas commented 8 years ago

Just for the record, I'm running into this issue as well.

@felixmaximilian - Have you made any progress on solving this? I can help out if you are interested in writing a patch.

felixmaximilian commented 8 years ago

A colleague created an AMI with much more (I guess EBS) space on the main (root) partition. I haven't really tried it again, but that should solve it.

Fixing this problem within the spark-ec2 code wasn't very successful on my side. I tried different things, but ran into the problem that you cannot really do much on the externally mounted mnt2, mnt3, etc. while starting the cluster, because they are added and removed during the process. I didn't really understand why. (The idea was to build it on external storage and then copy it back to root.) We can give it another try with combined forces :)

But another question: is 8 GB on the root partition really enough if only the installation files fit there? What about the HDFS in the ephemeral folder? As far as I can remember, that also lives under /root, which means we can hardly save anything to HDFS, right? It might be worth resizing all the AMIs to a bigger partition, or at least having another partition from the very beginning so there is room to work with.

nchammas commented 8 years ago

Hmm, anything that requires updating all the spark-ec2 AMIs is a tough sell since that takes a lot of work and the process is not automated.

shivaram commented 8 years ago

Yeah, just to clear some things up - AFAIK, increasing the root partition size requires an AMI rebuild. However, I think we should be able to clone and build Spark on /mnt using make-distribution.sh and then unzip it to the root partition.

The HDFS thing is not really an issue -- the HDFS binaries are on /root but it uses /mnt on every machine for storage, so it can use all the ephemeral storage.
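
To illustrate that split (the paths below are from memory and may not match the actual AMI exactly):

# The HDFS binaries and scripts sit on the small root volume...
ls /root/ephemeral-hdfs/bin
# ...but the data directories configured in hdfs-site.xml point at the
# ephemeral disks, so blocks land on /mnt (and /mnt2) rather than on /:
grep -A1 'dfs.data.dir' /root/ephemeral-hdfs/conf/hdfs-site.xml
# example output (illustrative):
#   <name>dfs.data.dir</name>
#   <value>/mnt/ephemeral-hdfs/data,/mnt2/ephemeral-hdfs/data</value>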

nchammas commented 8 years ago

I think this issue can be resolved without having to do any work on the AMIs. See this comment.

tartavull commented 8 years ago

+1