S. = Spark
H. = Hatch
M. = Mesos
A. = Akka
C. = Cassandra
K. = Kafka
SHMACK is open source under the terms of the Apache License 2.0 (see License Details). For now, it provides a quick start to set up a Mesos cluster with Spark and Cassandra on Amazon Web Services (AWS) using the Mesosphere DC/OS template, with the intention to cover the full SMACK stack (Spark, Mesos, Akka, Cassandra, Kafka - also known as the Mesosphere Infinity stack) and to be enriched by Hatch applications (closed source).
When setting up the tutorial servers on Amazon AWS and leaving them running, there will be monthly costs of approximately $1700! Please make sure that servers are only used as required. See the FAQ section in this document.
Don't be scared too much - for temporary use this is fine, as $1700 per month is still less than $60 a day. If the days are limited, e.g. for just a few days of experimentation, then this is acceptable - but keep an eye on your AWS costs. For production, many other things would need to be done first anyway (see Limitations), so running costs would be a rather minor issue.
Everything can be performed free of charge until you start up nodes in the cloud (called Stack creation).
If you have existing accounts, they can be used. If not:
For now, you will need a Linux machine to control and configure the running SHMACK stack. You will also need it in order to develop and contribute.
sudo apt-get install git
mkdir ${HOME}/shmack && cd ${HOME}/shmack && git clone https://github.com/Zuehlke/SHMACK.git shmack-repo
cd ${HOME}/shmack/shmack-repo/scripts && sudo -H bash ./setup-ubuntu.sh
Alternatively, have a look into setup-ubuntu.sh and see yourself what is missing - and install only the missing bits.
Extend your PATH in ${HOME}/.profile:
PATH=${PATH}:${HOME}/shmack/shmack-repo/scripts
export PATH
Details can be found in: http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
Create an AWS IAM user named shmack, then configure the AWS CLI with its credentials:
aws configure
AWS Access Key ID [None]: [from browser page]
AWS Secret Access Key [None]: [from browser page]
Default region name [None]: us-west-1   (VERY important, DO NOT change this!)
Default output format [None]: json
Grant the shmack user administrator rights (tab "Permissions" --> "Attach Policy" --> "Administrator Access"): https://console.aws.amazon.com/iam/home?#users/shmack
Create an EC2 key pair named shmack-key-pair-01.
Attention: Use exactly this key pair name as it is referenced in the scripts!
mkdir ${HOME}/.ssh
gedit ${HOME}/.ssh/shmack-key-pair-01.pem
Attention: Use exactly this filename as it is referenced in the scripts!
chmod 600 ${HOME}/.ssh/shmack-key-pair-01.pem
ssh-add ${HOME}/.ssh/shmack-key-pair-01.pem
The AWS Access Key ID and Secret Access Key entered during aws configure are stored locally, so you don't need to store them for SHMACK.
The only case in which you may otherwise need AWS credentials is probably to copy data from S3.
Make sure you always perform this as a one-time operation you do not need to commit!
Download the Eclipse IDE for Java EE Developers as 64 bit for Linux from https://www.eclipse.org/downloads/ and unpack it:
cd ${HOME}; tar xvfz Downloads/eclipse-jee-neon-R-linux-gtk-x86_64.tar.gz
Start Eclipse: eclipse/eclipse
Via Help --> Eclipse Marketplace, install the Gradle IDE Pack.
Import ${HOME}/shmack/shmack-repo/ into Eclipse.
Add the following update sites to install DLTK and the Scala IDE:
http://download.eclipse.org/technology/dltk/updates-dev/latest/
http://download.scala-ide.org/sdk/lithium/e44/scala211/stable/site
Configure Eclipse to use the JDK in /usr/lib/jvm/java-1.8.0-openjdk-amd64
Select ${HOME}/shmack/shmack-repo/ as the root folder, click Ok, and choose "Add root" when prompted.
Mesosphere provides AWS CloudFormation templates to create a stack with several EC2 instances in autoscaling groups, some of them directly accessible (acting as gateways), others only accessible through the gateway nodes. See the DC/OS Network Security Documentation for details.
The scripts for SHMACK will not only create/delete such a stack, but also maintain the necessary IDs to communicate with it and set up the DC/OS packages that form SHMACK. This makes the process described in https://mesosphere.com/amazon/setup/ even simpler and repeatable, and thereby more appropriate for short-lived clusters used for quick experiments or demonstrations.
To lower costs you can use spot instances. To do this, change this line in shmack_env:
TEMPLATE_URL="https://s3-us-west-1.amazonaws.com/shmack/single-master.cloudformation.spot.json"
This is currently hosted on a private s3 bucket, for details see here.
${HOME}/shmack/shmack-repo/scripts/create-stack.sh
Go to the following link in your browser:
and enter verification code.
Whenever you are asked to authenticate using a cloud provider, it works well to use your GitHub account.
create-stack.sh will ask:
Continue installing? [yes/no]
--> yes
To log into the cluster, use ${HOME}/shmack/shmack-repo/scripts/ssh-into-dcos-master.sh
and ${HOME}/shmack/shmack-repo/scripts/ssh-into-dcos-slave.sh 0
(to log out again, press Ctrl-d or type exit twice)
When you are done, delete the stack:
${HOME}/shmack/shmack-repo/scripts/delete-stack.sh
Rather than eu-central-1 (Frankfurt), the recommended region to try is us-west-1. Take care of regulatory issues (physical location of data) when thinking about a real productive system.
Also check out spot instances to reduce costs. Check the Billing and Cost Dashboard regularly; Amazon updates it daily. You can also install the AWS Console Mobile App to keep an eye on running instances and aggregated costs no matter where you are - and take action if needed, such as deleting a running stack.
To avoid constantly polling the costs, set up a billing alert.
And then: be careful about when you start and stop the AWS instances. As of 2015-10-23 there is no officially supported way to suspend AWS EC2 instances, see Stackoverflow and Issue.
The only officially supported way to stop AWS bills is to completely delete the stack.
ATTENTION: Make sure you keep your AWS credentials safe!
This includes shmack-key-pair-01.pem, although this will usually only grant access to your running instances, not the ability to run new instances and thus create additional costs.
In principle, several people can share one running stack. But be aware that you may block each other with running tasks.
Share shmack-key-pair-01.pem, the AWS Access Key ID, and the AWS Secret Access Key.
Run capture-stack-state.sh and distribute the generated file stack-state.tgz.
The recipients then run populate-copied-stack-state.sh to make use of the shared cluster.
That changes constantly as Mesosphere adds packages to DC/OS. And we provide our own.
Run dcos package search to get the current list for your configured repositories, or open-shmack-marathon-ui.sh and select Universe on the left.
Put the files you work on into the 03_analysis_design/Issues folder, see https://github.com/Zuehlke/SHMACK/tree/master/03_analysis_design/Issues
<git-repo-root>
|- 03_analysis_design
|- Issues
|- Issue-<ID> - <any short description you like>
|- Any files you like to work on
${HOME}/shmack/shmack-repo/scripts/change-number-of-slaves.sh <new number of slaves>
Attention: Data in HDFS is destroyed when scaling down!!
As of 2016-08-26 Java 1.8.0_51, Spark 2.0 with Scala 2.11.8, and Python 3.4 are deployed on the created stack.
Not really. Officially, you should use the graphical web frontends Zeppelin or Spark Notebook instead. An older blog post showed some steps, but that never really worked for anything with parallel execution / using the master.
Run open-shmack-master-console.sh and see if all services are healthy.
Unfortunately, due to issue #16 tests that require RSync no longer work,
and that includes most of the infrastructure tests.
Once this is fixed, you may execute the testcase ShmackUtilsTest
in your IDE.
This will run some basic tests to check that your local setup is fine and can properly make use of a running stack in the AWS cloud.
If this testcase fails: see here
Look at the examples:
JavaSparkPiLocalTest
WordCountLocalTest
These do not require a running stack and can therefore also be executed without any instance on AWS.
Look at the examples:
JavaSparkPiRemoteTest
WordCountRemoteTest
These require a running stack; they will fail if the instances on AWS are not yet (or no longer) available or cannot be accessed through SSH.
THIS CURRENTLY DOESN'T WORK BECAUSE UPLOAD VIA RSYNC IS BROKEN
Make sure that every test case has its own testcaseId.
This id only needs to be distinct within one test class.
String testcaseId = "WordCount-" + nSlices;
RemoteSparkTestRunner runner = new RemoteSparkTestRunner(JavaSparkPiRemoteTest.class, testcaseId);
To execute the tests do the following:
gradle fatJarWithTests
(available as an Eclipse launch configuration)
How can files from src/test/resources be used on a remote Spark instance?
Synchronize src/test/resources to the HDFS filesystem using RemoteSparkTestBase#syncTestRessourcesToHdfs(), see WordCountRemoteTest#testWordcountRemote().
As rsync is used, only changes will be transferred from your laptop to the EC2 instance. This saves huge amounts of time and bandwidth ;-). Nevertheless, there is no efficient way to sync between the EC2 master node and the EC2 HDFS filesystem. But this should be no problem as bandwidth within the EC2 cluster is very high.
Use the required resource from within the Spark job (executed remotely), addressed by its HDFS URL, e.g.
see WordCountRemoteTest#main():
// resolves to hdfs://hdfs/spark-tests/resources/tweets/tweets_big_data_2000.json
String inputFile = getHdfsTestRessourcePath("tweets/tweets_big_data_2000.json");
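Putting the two halves together, a remote test sketched under these assumptions (the class name TweetStatsRemoteTest is purely hypothetical, and the exact signatures of syncTestRessourcesToHdfs() and getHdfsTestRessourcePath() are taken from this page, not verified against the sources):

public class TweetStatsRemoteTest extends RemoteSparkTestBase {   // hypothetical example class

    @org.junit.Test
    public void testTweetStats() throws Exception {
        // laptop side: rsync src/test/resources into HDFS before submitting the job
        syncTestRessourcesToHdfs();
        // ... then submit and await the job via the RemoteSparkTestRunner (see below) ...
    }

    public static void main(String[] args) throws Exception {
        // cluster side: address the synced resource by its HDFS URL, as shown above
        String inputFile = getHdfsTestRessourcePath("tweets/tweets_big_data_2000.json");
        // ... create the SparkContext and process inputFile ...
    }
}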
Use the RemoteSparkTestRunner#getRemoteResult() as follows (see also the sketch after the examples below):
executeSparkRemote(String...)
waitForSparkFinished()
getRemoteResult()
Examples:
JavaSparkPiRemoteTest
WordCountRemoteTest
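Sketched end to end, the runner side of such a test looks roughly as follows (method names are taken from the list above; the arguments of executeSparkRemote and the concrete result type of getRemoteResult are assumptions - see JavaSparkPiRemoteTest and WordCountRemoteTest for the authoritative usage):

int nSlices = 4;                                 // example job parameter
String testcaseId = "WordCount-" + nSlices;      // distinct per test case within the class
RemoteSparkTestRunner runner = new RemoteSparkTestRunner(WordCountRemoteTest.class, testcaseId);
// 1. submit the fat JAR together with the job arguments to the cluster
runner.executeSparkRemote(String.valueOf(nSlices));
// 2. block until the job has reported "finished" via its status file in HDFS
runner.waitForSparkFinished();
// 3. fetch the result object the job stored in HDFS (the concrete type is an assumption)
java.util.SortedMap<String, Integer> wordCounts = runner.getRemoteResult();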
The RemoteSparkTestRunner#executeWithStatusTracking() is to be invoked by the Spark job. It writes the state of the Spark job to the HDFS filesystem.
The JUnit test uses the RemoteSparkTestRunner to poll the state, see RemoteSparkTestRunner#waitForSparkFinished().
Examples:
JavaSparkPiRemoteTest
WordCountRemoteTest
Nevertheless, it can happen that due to a severe error the status in HDFS is not written. In this case, see here.
Use methods provided by ShmackUtils
:
runOnMaster(CommandLine, ExecExceptionHandling)
runOnMaster(ExecExceptionHandling, String, String...)
runOnMaster(String, String...)
These methods will typically throw an exception if the return code is not 0 (can be controlled using ExecExceptionHandling).
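For example, a minimal sketch (the argument order of the varargs overload - executable first, followed by its arguments - is an assumption; check ShmackUtils for the exact contract):

// runs "ls -la /var/lib/mesos" on the DC/OS master via SSH;
// throws an exception if the exit code is not 0
ShmackUtils.runOnMaster("ls", "-la", "/var/lib/mesos");
// the overloads taking an ExecExceptionHandling let you control instead
// how a non-zero exit code is handled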
You can do this ...
... either locally from your laptop:
from JUnit Tests: use the methods provided by ShmackUtils, e.g. (a sketch follows after this list):
copyFromHdfs(File, File)
copyToHdfs(File, File)
syncFolderToHdfs(File, File)
syncFolderFromHdfs(File, File)
deleteInHdfs(File)
getHdfsURL(File)
readByteArrayFromHdfs(File)
readStringFromHdfs(File)
Note that you can simply use a java.io.File to address files in HDFS, e.g. /foo/bar.txt
will be written to the HDFS URL hdfs://hdfs/foo/bar.txt
from a bash:
copy-from-hdfs.sh
copy-to-hdfs.sh
sync-from-hdfs-to-local.sh
sync-to-hdfs.sh
... or from a Spark-Job executed remote in the EC2 cluster:
com.zuehlke.shmack.sparkjobs.base.HdfsUtils
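A minimal laptop-side sketch, based only on the method names listed above (the argument order of copyToHdfs - local source first, HDFS target second - and the return type of readStringFromHdfs are assumptions):

// the java.io.File arguments are interpreted as paths below hdfs://hdfs/
java.io.File localResult = new java.io.File("build/results/wordcount.txt");
java.io.File hdfsResult = new java.io.File("/spark-tests/results/wordcount.txt");
// upload: local file -> hdfs://hdfs/spark-tests/results/wordcount.txt
ShmackUtils.copyToHdfs(localResult, hdfsResult);
// read the content back as a String
String content = ShmackUtils.readStringFromHdfs(hdfsResult);

Inside the cluster, a Spark job would use com.zuehlke.shmack.sparkjobs.base.HdfsUtils (or the plain Hadoop FileSystem API) instead.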
SignatureDoesNotMatch error in aws-cli.
In detail, the stack operation reports something like:
A client error (SignatureDoesNotMatch) occurred when calling the CreateStack operation: Signature expired: 20160315T200648Z is now earlier than 20160316T091536Z (20160316T092036Z - 5 min.)
Most likely the clock of your virtual machine is wrong.
To fix this:
Just to be on the safe side, you should probably also update the AWS Commandline Interface:
sudo -H pip install --upgrade awscli
create-stack fails with some message that I should run dcos auth login
Don't worry. This happens sometimes when you have created a DC/OS stack before and the credentials no longer fit. It is a pain, but very easy to fix.
To fix this:
Run dcos auth login, log in with a valid account (GitHub works great), copy the activation code, and paste it into the shell.
Then run init-dcos-stack.sh to complete the creation of the full stack.
You can always set up new credentials without needing to set up a new account, so this is no big deal:
Run aws configure again using the new credentials.
In most cases the reason for this is that ssh is blocked by corporate networks.
Solution: Unplug the network cable and use the zred WiFi.
Execute the testcase ShmackUtilsTest
in eclipse.
If this testcase fails: see here
Be sure to have a stack created successfully and to have confirmed the identity of hosts, see here
${HOME}/shmack/repo/04_implementation/scripts/open-shmack-mesos-console.sh
Check the sandbox of your Spark job, in particular stderr.
Use ssh-into-dcos-master.sh and run:
hadoop fs -rm -r -f 'hdfs://hdfs/*'
Are you running Ubuntu 16.04? Because there is a known issue of SWT not working properly on GTK3: http://askubuntu.com/questions/761604/eclipse-not-working-in-16-04
Follow the log of a service like this:
dcos service log --follow hdfs
You will see the same piece of log being logged over and over again. Analyze it (look for "failed" or similar).
Log into the affected slave (ssh-into-slave <num>) and check dmesg.
Copyright 2016
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.