Zuehlke / SHMACK

Automation and convenience for setting up a S.H.M.A.C.K. stack with DC/OS on AWS

S.H.M.A.C.K

S. = Spark

H. = Hatch

M. = Mesos

A. = Akka

C. = Cassandra

K. = Kafka

A modern stack for Big Data applications

SHMACK is open source under the terms of the Apache License 2.0 (see License Details). For now, it provides a quick start to set up a Mesos cluster with Spark and Cassandra on Amazon Web Services (AWS) using the Mesosphere DC/OS template, with the intention of covering the full SMACK stack (Spark, Mesos, Akka, Cassandra, Kafka - also known as the Mesosphere Infinity stack), enriched by Hatch applications (closed source).

WARNING: things can get expensive $$$$$ !

When setting up the tutorial servers on Amazon AWS and leaving them running, there will be monthly costs of approx. 1700 $! Please make sure that the servers are only run when required. See the FAQ section in this document.

Don't get scared too much: for temporary use this is fine, as 1700 $ per month is still less than 60 $ a day (1700 $ / 30 days is roughly 57 $ per day). If the days are limited, e.g. for just a few days of experimentation, then this is affordable - but better keep an eye on your AWS costs. For production, many things would need to be done first anyway (see Limitations), so running costs would be a rather minor issue.

Vision

Installation

Everything can be performed free of charge until you start up nodes in the cloud (called Stack creation).

Register accounts (as needed)

If you have existing accounts, they can be used. If not:

Development Environment setup

For now, you will need a Linux machine to control and configure the running SHMACK stack. You will also need it in order to develop and contribute.

Create a Virtual Machine

In the Virtual machine

Setup AWS console

Details can be found in: http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
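The short version of that guide: run aws configure once inside the VM and paste the credentials of your AWS user. A minimal sketch - the key values are placeholders and the region is only an example:

    aws configure
    # AWS Access Key ID [None]: AKIA................
    # AWS Secret Access Key [None]: ....................
    # Default region name [None]: us-west-1
    # Default output format [None]: json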

Download, install, and configure Eclipse for use in SHMACK

Optional: Use IntelliJ IDEA for SHMACK

Stack Creation and Deletion

Mesosphere provides AWS CloudFormation templates to create a stack with several EC2 instances in autoscaling groups, some of them directly accessible (acting as gateways), others only accessible through the gateway nodes. See the DC/OS Network Security Documentation for details.

The scripts for SHMACK not only create/delete such a stack, but also maintain the necessary IDs to communicate with it and set up the DC/OS packages that form SHMACK. This makes the process described in https://mesosphere.com/amazon/setup/ even simpler and repeatable, and thereby more appropriate for short-lived clusters used for quick experiments or demonstrations.
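A typical short-lived-cluster session then looks roughly like the sketch below. The script names are assumptions based on the scripts referenced elsewhere in this README; check the scripts folder of your checkout if they differ:

    cd ${HOME}/shmack/shmack-repo/scripts

    # Bring up the CloudFormation stack and record the IDs the other
    # scripts need (from here on, AWS charges apply):
    ./create-stack.sh

    # ... run your experiments ...

    # Tear everything down again to stop the bills:
    ./delete-stack.sh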

Optional: Use spot instances

To lower costs you can use spot instances. To do this, change this line in shmack_env:

TEMPLATE_URL="https://s3-us-west-1.amazonaws.com/shmack/single-master.cloudformation.spot.json"

This is currently hosted on a private S3 bucket; for details see here.

Stack Creation (from now on, you pay for usage)

Stack Deletion

Affiliate

Links

Important Limitations / Things to consider before going productive

FAQ

How do I avoid being surprised by a monthly bill of 1700 $?

Also check out spot instances to reduce costs. Regularly check the Billing and Cost Dashboard, which Amazon updates daily. You can also install the AWS Console Mobile App to keep an eye on the running instances and aggregated costs wherever you are - and take action if needed, such as deleting a running stack.

To avoid constantly polling the costs, set up a billing alert.
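Such an alert can also be created from the command line. A minimal sketch, assuming billing metrics are enabled for the account and the SNS topic ARN below is replaced with one you have actually subscribed to (AWS billing metrics always live in us-east-1; the 100 $ threshold is just an example):

    aws cloudwatch put-metric-alarm \
      --region us-east-1 \
      --alarm-name 'monthly-bill-above-100-usd' \
      --namespace 'AWS/Billing' \
      --metric-name 'EstimatedCharges' \
      --dimensions Name=Currency,Value=USD \
      --statistic Maximum \
      --period 21600 \
      --evaluation-periods 1 \
      --threshold 100 \
      --comparison-operator GreaterThanThreshold \
      --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts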

And then: be careful about when you start and stop the AWS instances. As of 2015-10-23 there is no officially supported way to suspend AWS EC2 instances; see Stackoverflow and Issue.

The only officially supported way to stop AWS bills is to completely delete the stack. ATTENTION:

And make sure you keep your AWS credentials safe!

Can I share a running cluster with other people I work with to reduce costs?

In principle, you can. But be aware that you may block each other with running tasks.

What components are available?

This changes constantly as Mesosphere adds packages to DC/OS. In addition, we provide our own.
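With a configured dcos CLI you can check the current offering yourself at any time; a quick sketch:

    # Everything currently offered in the DC/OS package universe:
    dcos package search

    # What is already installed on your running stack:
    dcos package list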

Where do I put my notes / non-implementation files when working on an issue (including User-Stories) ?

Into the 03_analysis_design/Issues folder, see https://github.com/Zuehlke/SHMACK/tree/master/03_analysis_design/Issues

<git-repo-root>
  |- 03_analysis_design
     |- Issues
        |- Issue-<ID> - <any short description you like>
           |- Any files you like to work on

How do I scale up/down the number of slave nodes?

${HOME}/shmack/shmack-repo/scripts/change-number-of-slaves.sh <new number of slaves>

Attention: Data in HDFS is destroyed when scaling down!

Which Java/Scala/Python Version can be used?

As of 2016-08-26 Java 1.8.0_51, Spark 2.0 with Scala 2.11.8, and Python 3.4 are deployed on the created stack.
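To double-check the versions on your own stack, you can go through the master node with the repository's ssh wrapper script (assuming the tools are on the master's PATH):

    ssh-into-dcos-master.sh java -version
    ssh-into-dcos-master.sh python3 --version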

Can I run an interactive Spark Shell?

Not really. Officially, you should use the graphical web frontends Zeppelin or Spark Notebook instead. An older blog post showed some steps, but that never really worked for anything with parallel execution / using the master.

What should I do to check if the setup and stack creation was successful?

Run open-shmack-master-console.sh and check that all services are healthy.

Unfortunately, due to issue #16, tests that require rsync no longer work, and that includes most of the infrastructure tests. Once this is fixed, you may execute the testcase ShmackUtilsTest in your IDE. This will run some basic tests to check that your local setup is fine and can properly make use of a running stack in the AWS cloud. If this testcase fails: see here

How can I execute Unit-Tests on a local Spark instance?

Look at the examples:

How can I execute Unit-Tests on a remote Spark instance (i.e. in the Amazon EC2 cluster)?

Look at the examples:

THIS CURRENTLY DOESN'T WORK BECAUSE UPLOAD VIA RSYNC IS BROKEN

Make sure that every testcase has its own testcaseId. This ID only needs to be distinct within one test class.

// The testcaseId distinguishes the results of different testcases within one test class.
String testcaseId = "WordCount-" + nSlices;
RemoteSparkTestRunner runner = new RemoteSparkTestRunner(JavaSparkPiRemoteTest.class, testcaseId);

To execute the tests do the following:

How can I use src/test/resources on a remote Spark instance?

How can I retrieve the results from a Unit-Test executed on a remote Spark instance?

Use RemoteSparkTestRunner#getRemoteResult() as follows:

Examples:

How does the JUnit-Test know when a Spark-Job is finished?

RemoteSparkTestRunner#executeWithStatusTracking() is to be invoked by the Spark job. It writes the state of the Spark job to the HDFS filesystem. The JUnit test then uses the RemoteSparkTestRunner to poll this state; see RemoteSparkTestRunner#waitForSparkFinished().

Examples:

Nevertheless, it can happen that due to a severe error the status in HDFS is never written. In this case see here

How can I execute commands in the EC2 cluster from a local JUnit test?

Use methods provided by ShmackUtils:

These methods will typically throw an exception if the return code is not 0 (can be controlled using ExecExceptionHandling).

How do I read / write files from / to the HDFS file system in the EC2 cluster?

You can do this ...
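One route that always works from the local VM is to go through the master node with the same ssh wrapper used elsewhere in this README; the paths below are placeholders:

    # List the HDFS root:
    ssh-into-dcos-master.sh hadoop fs -ls 'hdfs://hdfs/'

    # Print a result file:
    ssh-into-dcos-master.sh hadoop fs -cat 'hdfs://hdfs/<path-to-file>'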

Troubleshooting

I get a SignatureDoesNotMatch error in aws-cli.

In detail, the stack operation reports something like: A client error (SignatureDoesNotMatch) occurred when calling the CreateStack operation: Signature expired: 20160315T200648Z is now earlier than 20160316T091536Z (20160316T092036Z - 5 min.)

Most likely the clock of your virtual machine is wrong.

To fix this, re-sync the clock.

Just to be on the safe side, you should probably also update the AWS Command Line Interface.
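A sketch of both steps, assuming an Ubuntu-based VM and a pip-installed AWS CLI:

    # Re-sync the VM clock (use whichever your VM supports):
    sudo ntpdate pool.ntp.org
    # or, on systemd-based distributions:
    sudo timedatectl set-ntp true

    # Update the AWS Command Line Interface:
    pip install --upgrade awscli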

create-stack fails with a message that I should run dcos auth login

Don't worry. This happens sometimes when you have created a DC/OS stack before and the credentials no longer fit. It is a pain, but very easy to fix.

To fix this:
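A minimal sketch of the recovery, assuming the dcos CLI should point at the freshly created master (the address is a placeholder):

    # Point the CLI at the current cluster:
    dcos config set core.dcos_url http://<master-public-ip>

    # Get fresh credentials; this starts a browser-based login:
    dcos auth login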

I forgot my AWS credentials / closed the browser too early.

You can always set up new credentials without needing to set up a new account, so this is no big deal:

What should I do if the setup of the stack has failed?

What should I do if ssh does not work?

In most cases the reason for this is that ssh is blocked by corporate networks. Solution: unplug the network cable and use the zred WiFi.

What should I do to check if the setup was successful?

Execute the testcase ShmackUtilsTest in Eclipse. If this testcase fails: see here

What should I do if Integration testcases do not work?

Be sure to have a stack created successfully and to have confirmed the identity of the hosts, see here

What should I do if Spark-Jobs are failing?

To start with a clean state, you may delete the whole HDFS filesystem as follows:

ssh-into-dcos-master.sh hadoop fs -rm -r -f 'hdfs://hdfs/*'

Eclipse freezes for no apparent reason?

Are you running Ubuntu 16.04? There is a known issue with SWT not working properly on GTK3: http://askubuntu.com/questions/761604/eclipse-not-working-in-16-04
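The workaround usually cited for this is to force SWT back to GTK2 before launching Eclipse; a sketch:

    export SWT_GTK3=0
    ./eclipse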

Some of my dcos services are stuck in deployment

Follow the log of a service like this:

dcos service log --follow hdfs

You will see the same piece of log being logged over and over again. Analyze it (look for "failed" or similar).

Your spark driver or executors are being killed


License Details

Copyright 2016

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.