TheClimateCorporation / lemur

Apache License 2.0

Overview

Lemur is a tool to launch Hadoop jobs locally or on EMR, based on a configuration file referred to as a jobdef. The jobdef file describes your EMR cluster, local environment, pre- and post-actions (aka hooks), and zero or more "steps". A step is Amazon's name for a task or job submitted to the cluster. Lemur reads your jobdef; at the end of the jobdef, you call (fire! ...) to make things happen. Keep in mind that the jobdef is an interpreted clj file, so you can insert arbitrary Clojure code to be executed anywhere in the file (but see HOOKS below for a better way).

Features

A Note About the Ruby elastic-mapreduce CLI tool

Lemur does not try to replace elastic-mapreduce. While there is some overlap, lemur is focused on launching jobs and provides no replacement for many common elastic-mapreduce activities, such as "elastic-mapreduce --list". I recommend that you install elastic-mapreduce alongside lemur (or rely on the AWS Console for those activities).

Installation

  1. Download the latest tar-gzip (.tgz) from http://download.climate.com/lemur/releases/lemur-1.4.6.tgz
  2. Expand it into some install location
  3. Set LEMUR_HOME to the top of the install path
  4. cd $LEMUR_HOME
  5. lein jar # assuming you have leiningen installed and on your PATH
  6. Set LEMUR_EXTRA_CLASSPATH to any classpath entries (colon separated) that you want lemur to include when it runs your jobdef, for example the classpath that includes your base files, or other functions or libraries used by your jobdefs.
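
Steps 3 and 6 might look like the following in a POSIX shell. This is only a sketch: the install location and the two classpath entries are hypothetical placeholders, not paths that lemur requires.

```shell
# Step 3: point LEMUR_HOME at the top of the expanded install.
# "$HOME/opt/lemur" is a hypothetical location; use your own.
export LEMUR_HOME="$HOME/opt/lemur"

# Step 6: colon-separated classpath entries for lemur to include
# when it runs your jobdef. Both entries are hypothetical examples.
export LEMUR_EXTRA_CLASSPATH="$HOME/jobdefs/src:$HOME/jobdefs/lib/helpers.jar"
```

Adding these lines to your shell profile keeps them set across sessions.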

AWS Credentials

Lemur uses DefaultAWSCredentialsProviderChain to gather AWS credentials to access various AWS services.
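
One of the sources that DefaultAWSCredentialsProviderChain (from the AWS SDK for Java) consults is a set of environment variables, so a minimal way to supply credentials is to export them before running lemur. The values below are placeholders, not real keys. Older SDK releases read the secret from AWS_SECRET_KEY rather than AWS_SECRET_ACCESS_KEY, so this sketch sets both; which name applies depends on the SDK version lemur is built against, which is not stated here.

```shell
# Placeholder values for illustration only; substitute your own keys.
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_ACCESS_KEY"

# Older AWS SDK for Java releases look for this name instead.
export AWS_SECRET_KEY="$AWS_SECRET_ACCESS_KEY"
```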

Compatibility

v0.9.7 Clojure 1.2

v1.0.1+ Clojure 1.3

v1.4.0+ Clojure 1.5

I've used lemur on Mac OS X and Linux. It MAY work on Windows (if you use cygwin). If you try it on Windows, I would be interested in hearing about your experience (patches welcome).

Usage

The general command line format is:

bin/lemur <command> <jobdef-file> [options] [remaining]

bin/lemur help                    - display this help text
bin/lemur run ./jobdef.clj        - Run a job on EMR
bin/lemur dry-run ./jobdef.clj    - Dry-run, i.e. just print out what would be done
bin/lemur start ./jobdef.clj      - Start an EMR cluster, but don't run the steps (jobs)
bin/lemur local ./jobdef.clj      - Run the job using local hadoop (e.g. standalone mode)
bin/lemur submit ./jobdef.clj --jobflow j-123456789  - Submit steps to an existing jobflow (running cluster)

Examples

lemur run clj/wb-clj/scripts/launch/hrap-jobdef.clj --dataset ahps --num-days 10
lemur start clj/wb-clj/src/weatherbill/lemur/sample-jobdef.clj
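
Because dry-run only prints what would be done, one possible workflow is to preview a launch before starting it. The wrapper below is a sketch, not part of lemur: launch_with_preview and LEMUR_BIN are hypothetical names, and it assumes that dry-run accepts the same options as run and exits non-zero when the jobdef has a problem.

```shell
# Path to the lemur launcher; override LEMUR_BIN if it lives elsewhere.
LEMUR_BIN="${LEMUR_BIN:-bin/lemur}"

# Hypothetical helper: preview the job with dry-run, then launch it
# with run only if the preview exited successfully.
launch_with_preview() {
  jobdef="$1"
  shift
  "$LEMUR_BIN" dry-run "$jobdef" "$@" && "$LEMUR_BIN" run "$jobdef" "$@"
}

# Usage (commented out so that loading this file launches nothing):
# launch_with_preview ./jobdef.clj --dataset ahps --num-days 10
```

Chaining the two commands with && means a failed preview stops the launch, which avoids starting a cluster for a jobdef that cannot be read.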

Help

Feedback and feature requests are welcome!