Spirals-Team/hadoop-benchmark

Overview

Hadoop-Benchmark is an open-source research acceleration platform for rapid prototyping and evaluation of self-adaptive behaviors in Hadoop clusters. The main objectives are to allow researchers to

− rapidly prototype, i.e., to experiment with self-adaptation in Hadoop clusters without the need to cope with low-level system infrastructure details,

− reproduction, i.e., to share complete experiments for others to reproduce them independently, and

− repetition, i.e., to experiment with and to compare their work, re-doing the same experiments on the same system using the same evaluation methods.

It uses docker and docker-machine to easily create a multi-node cluster (on a single laptop or in a cloud including Grid5000) and provision Hadoop. It contains a number of acknowledged benchmarks and one self-adaptive scenario.

The following is the high-level overview of the created cluster and deployed services: architecture

Requirements

docker >= 1.12
docker-machine >= 0.8
(optional) R >= 3.3.2 with tidyverse and Hmisc for data analysis

Usage

./cluster.sh                                                                              ✓
Usage ./cluster.sh [OPTIONS] COMMAND

Options:

  -f, --force   Use '-f' in docker commands where applicable
  -n, --noop    Only shows which commands would be executed wihout actually executing them
  -q, --quiet   Do not print which commands are executed

Commands:

  Cluster:
    create-cluster
    start-cluster
    stop-cluster
    restart-cluster
    destroy-cluster
    status-cluster

  Hadoop:
    start-hadoop
    stop-hadoop
    restart-hadoop
    destroy-hadoop

  Misc:
    console                   Enter a bash console in a container connected to the cluster
    run-controller CMD        Run a command CMD in the controller container
    hdfs CMD                  Run the HDFS CMD command
    hdfs-download SRC         Download a file from HDFS SRC to current directory

  Info:
    shell-init      Shows information how to initialize current shell to connect to the cluster
                    Useful to execute like: 'eval $(./cluster.sh shell-init)'
    connect-info    Shows information how to connect to the cluster

Documentation

check the tutorial to get started.
check the screencast
check the demonstration of using hadoop-benchmark on Grid5000