Stampede README

Dean Wampler
dean.wampler@thinkbiganalytics.com
@StampedeWkFlow
January 8, 2013

Copyright (c) 2011-2013, Think Big Analytics, Inc. All Rights Reserved.

Welcome to Stampede, the workflow tool that works as Cthulhu intended for *nix systems, using make for dependency management and task sequencing, bash for scripting, and cron for scheduling.

Stampede originated as an alternative workflow tool for Hadoop, but it is not limited to Hadoop scenarios.

If you like Stampede, please consider joining the stampede-users Google group and following us on Twitter @StampedeWkFlow. Also, contributions in the form of patches are always welcome and appreciated.

Installation

First, clone this repo or expand the distribution archive somewhere useful, e.g., $HOME/stampede.

Since Stampede uses make and bash as its weapons of choice, run this make command to test Stampede on your system and then install it:

make test install

The test target is not required, but we recommend it as a sanity check for your environment. The install target will ask you for details like the target installation directory (the default is /usr/local/stampede).

If you don't have syslog on your system, run this command instead, which will skip the syslog-related tests:

make test-core install

Finally, the test target does not test the "extras" included with Stampede, currently limited to Hadoop-specific tools. To test these tools, first ensure that $HADOOP_HOME is defined, then run this command:

make test-extras install

The install target installs everything, whether you want to use syslog and the "extras" or not. They are small and harmless, if left alone in a cold, dark room... ;^)

Next, assuming you installed in /usr/local/stampede/, which we'll call $STAMPEDE_HOME from now on, add $STAMPEDE_HOME/bin to the PATH for any user who plans to use Stampede. Also, the installation will include *nix man pages, so add $STAMPEDE_HOME/man to the MANPATH.
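
For example, the additions might look like the following in a user's shell startup file. This is a sketch that assumes the default /usr/local/stampede location and a bash-style startup file such as ~/.bashrc; adjust both for your system.

```shell
# Assumes the default installation directory; change it if you chose another.
export STAMPEDE_HOME=/usr/local/stampede

# Make the stampede command and its helper scripts visible.
export PATH="$STAMPEDE_HOME/bin:$PATH"

# Make the Stampede man pages visible to man.
export MANPATH="$STAMPEDE_HOME/man:${MANPATH:-}"
```

If MANPATH was previously unset, the result ends with a colon. On systems that use man-db, a trailing colon typically keeps the system default man directories in the search path; other man implementations may behave differently.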

As part of the installation, the installer will ask you if you want a global stampederc file installed in /etc, /etc/sysconfig, or somewhere else. All statements in this file are commented out. If you want to make global changes to Stampede's environment variables, edit this file appropriately. Note that these "rc" files don't list every variable you can define. See $STAMPEDE_HOME/bin/env.sh for the complete list of variables, their default values, and comments that describe them.

Similarly, if you told the installer to copy a stampederc file to $HOME/.stampederc, edit that file for your personal tasks.

Whenever you create a new Stampede project, it will also get its own $PROJECT_HOME/.stampederc file, as we discuss next.

Building Java Components

As of this release, there is a small Hadoop application written in Java, in src/hadop/mapreduce-configuration. It is used by the bin/hadoop/mapreduce-prop command. For your convenience, a pre-built jar file is already provided. However, it is built with Java 1.6 (for maximum portability) and Hadoop v1.0.3. So, you may need to rebuild it if you use a different version of Hadoop or you want to use a newer version of Java. See src/hadop/mapreduce-configuration/README.md for details.

Usage

An individual workflow definition is called a stampede.

To create a stampede, run the following command:

stampede create

It will prompt you for properties such as the name of the stampede and the project's working directory.

Edit the .stampederc and makefile created in the project directory to define your workflow. See $STAMPEDE_HOME/examples for ideas. Note that $STAMPEDE_HOME/bin contains helper scripts to ease the development of workflows. See also Make and Bash Notes in this directory for some tips.
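
Because a stampede is driven by make, task ordering comes from ordinary make dependencies. The following is a minimal sketch that uses plain make rather than the stampede command, so it runs without Stampede installed. The target names (extract, transform, load) are hypothetical, and the rules use make's "target: prerequisites ; command" form to avoid tab characters.

```shell
# Sketch only: a three-step workflow expressed as make dependencies.
workdir=$(mktemp -d)

cat > "$workdir/makefile" <<'EOF'
.PHONY: all extract transform load
all: load
load: transform ; @echo "load"
transform: extract ; @echo "transform"
extract: ; @echo "extract"
EOF

# make resolves the dependency chain and runs the steps in order.
workflow_output=$(make -s -f "$workdir/makefile")
printf '%s\n' "$workflow_output"

rm -rf "$workdir"
```

Asking for the default target (all) runs extract, then transform, then load, because each target lists the previous step as a prerequisite.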

Once a stampede has been created, you can invoke it using this command:

stampede -f /path/to/makefile [options] [make_targets]

For help on the stampede options:

stampede --help
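
Scheduling is left to cron. The sketch below builds crontab entries that would run a stampede nightly at 2:00 AM and prints them for review. The project path and the schedule are hypothetical. The PATH line is there because cron jobs usually start with a minimal PATH, while Stampede locates its tools through PATH; most cron implementations accept such variable assignments in a crontab, but check yours.

```shell
# Hypothetical locations; substitute your own.
STAMPEDE_HOME="${STAMPEDE_HOME:-/usr/local/stampede}"
PROJECT_HOME="$HOME/my-stampede"

# cron usually provides a minimal PATH, so set one that includes Stampede's bin.
cron_path_line="PATH=$STAMPEDE_HOME/bin:/usr/local/bin:/usr/bin:/bin"

# Fields: minute hour day-of-month month day-of-week command
cron_job_line="0 2 * * * stampede -f $PROJECT_HOME/makefile"

# Print the entries for review; nothing is installed by this snippet.
printf '%s\n%s\n' "$cron_path_line" "$cron_job_line"
```

After reviewing the output, add the lines to your crontab with crontab -e.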

Required Tools

Stampede is mostly agnostic to tool versions. For any particular tool, including its own scripts, Stampede relies on finding the tool in the user's PATH.

Supported Platforms

Planned Support

Currently, cygwin and similar "Unix on Windows" toolkits are not supported, but only because we haven't tried them. We have tried to avoid any assumptions that would preclude this support. We welcome patches!

Note that as of this writing, support for running Hadoop in Windows environments was just recently announced.

Manifest

The top-level directory contains the following files, in addition to directories that will be described next:

Bin Directory

Stampede supplies helper bash scripts in the bin directory and "extras" for specific applications (e.g., Hadoop) in subdirectories. All the scripts that end with .sh are used internally by Stampede. The files without this extension are user-callable utilities for building workflows.

NOTE: All of these tools assume that $STAMPEDE_HOME is defined. This is true when they are called in a stampede workflow, e.g., a Makefile.
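
If you write your own scripts that call these tools outside a workflow, a guard along the following lines can fail early with a clear message. This is a sketch; the function name require_stampede_home is hypothetical and not part of Stampede.

```shell
# Hypothetical helper for your own scripts; not part of Stampede.
require_stampede_home() {
  if [ -z "${STAMPEDE_HOME:-}" ]; then
    echo "STAMPEDE_HOME is not set; run this from a stampede workflow or export it first." >&2
    return 1
  fi
}
```

A script would typically call it near the top, for example: require_stampede_home || exit 1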

bin Utilities

Briefly, here are the utilities in the bin directory. All support a --help option for more information:

The following "helper" files are used by these scripts:

bin/hadoop Utilities

Hadoop-specific helper tools are in the bin/hadoop directory. As with the bin scripts, use --help for more information on each tool.

Note: While the *-prop utilities behave similarly and take similar arguments, there are some differences in the results they produce that reflect differences in how they were implemented. See their *-prop --help messages or man pages for details.

Custom and Contrib Directories

If you want to override the behavior of any particular script, drop a new version in the custom directory (or a subdirectory). These directories are added to the PATH first, so your version takes precedence.

We intend for contrib to be a place where unsupported, community-contributed tools will go. This directory and any subdirectories will also be added to the path, after custom and bin.
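
The override works through ordinary PATH precedence: the first directory that contains a matching executable wins. The following self-contained sketch demonstrates the idea in a temporary directory rather than a real installation; the tool name say-hello is hypothetical.

```shell
# Demonstration in a temporary directory; "say-hello" is a hypothetical tool name.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/custom" "$demo_home/bin" "$demo_home/contrib"

printf '#!/bin/sh\necho "stock version"\n' > "$demo_home/bin/say-hello"
printf '#!/bin/sh\necho "custom version"\n' > "$demo_home/custom/say-hello"
chmod +x "$demo_home/bin/say-hello" "$demo_home/custom/say-hello"

# Search order: custom first, then bin, then contrib.
demo_path="$demo_home/custom:$demo_home/bin:$demo_home/contrib:$PATH"

# The custom copy shadows the stock one because its directory comes first.
override_output=$( PATH="$demo_path"; say-hello )
printf '%s\n' "$override_output"

rm -rf "$demo_home"
```

Removing the file from custom would make the same command resolve to the copy in bin again.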

Example Directory

The example directory contains example stampedes that you can adapt for your purposes as well as a sample configuration file.

Test Directory

Tests of Stampede itself are in the test directory. The tests provide good examples of the individual tools in action. To execute the tests, run make test. This make target won't run the "extras" tests, e.g., for Hadoop. To run all tests, run make test-with-extras.

Notes