dakusui / bredxbred

mapreduce in bash

bredxbred: bred and xbred

bredxbred is a project to develop an easy-to-use MapReduce framework in which you define map and reduce jobs with everyday tools such as bash, awk, and perl.

It consists of two components, bred and xbred.

Requirements

Platform

bred and xbred are tested on the following platforms.

Software dependencies

Environment

Installation

Setting up an environment (on each machine)

Build a utility program: brp

Build the utility program brp by running make in the utils directory.


    $ cd utils
    $ make

Place files somewhere handy in your PATH

The files must be reachable through PATH in non-interactive shells as well, where .bashrc is not read. The recommended directory structure is shown below. Make sure each file has the correct permissions (shown in parentheses).


    /usr/local
        bin/
            bred.conf (644)
            bred (744)
            bred-core (644)
            brp (755)
            xbred (755)
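As a quick check that the commands are visible on PATH, a small helper like the following can be used. This is only a sketch: the function name bred_check_path is hypothetical and not part of bredxbred, and it relies solely on the standard `command -v` builtin.

```shell
# Hypothetical helper (not shipped with bredxbred): report every command
# given as an argument that cannot be found on PATH.
bred_check_path() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing from PATH: $cmd" >&2
      missing=1
    fi
  done
  # Return 0 only if every command was found.
  return "$missing"
}
```

Call it as `bred_check_path bred xbred brp`. To cover the non-interactive case, the same `command -v bred xbred brp` check can be run through whatever remote shell mechanism you use to reach each machine.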

Configure the environment

The following is the content of bred.conf. You can use it unmodified if you only want to take advantage of the multiple cores of your local machine.


    baseport=10000
    hosts=(localhost localhost localhost localhost)
    namenode="${hosts[0]}"
    workdir="/tmp/bred"
    fsdir="${workdir}/fs"
    jmdir="${workdir}/jm"
    sorttmpdir="${workdir}/sort"
    sortmem["${namenode}"]="32M"
    defaultsortmem="256K"

Create directories

Create fsdir and jmdir, and make sure they are writable by the user who will run bred and xbred.
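A minimal sketch of this step, assuming the default workdir, fsdir, and jmdir values from the bred.conf listing above. The function name bred_make_dirs is hypothetical and not part of bredxbred.

```shell
# Hypothetical helper (not shipped with bredxbred): create fsdir and jmdir
# under the given workdir and verify that the current user can write to them.
# The default path mirrors workdir in the bred.conf listing above.
bred_make_dirs() {
  workdir="${1:-/tmp/bred}"
  fsdir="${workdir}/fs"
  jmdir="${workdir}/jm"

  mkdir -p "${fsdir}" "${jmdir}" || return 1

  for d in "${fsdir}" "${jmdir}"; do
    if [ ! -w "${d}" ]; then
      echo "not writable: ${d}" >&2
      return 1
    fi
  done
}
```

Run `bred_make_dirs` as the user who will execute bred and xbred, or pass a different path if you changed workdir in bred.conf.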

Write your own map reduce program

The following is a 'word count' example written in xbred style.


    #!/usr/local/bin/xbred

    ####
    #          Id: main
    #        Type: map
    # Interpreter: awk -f
    #         Key: 1
    #       Sinks: wordcount
    function map map(awk -f,1,wordcount) inline:<<EOF
      {
        gsub(/([[:punct:]]|[[:blank:]])+/, " ", $0);
        n=split($0,cols," ");
        for (i = 1; i <= n; i++) { print cols[i]; };
      }
    EOF

    ####
    #          Id: wordcount
    #        Type: reduce
    # Interpreter: awk -f
    #         Key: 1
    #       Sinks: -
    function reduce wordcount(awk -f,1,-) inline:<<EOF
      BEGIN {
        c=0;
      }
      {
        if (key == "") key=$1;
        c++;
      }
      END {
        print "" key " " c;
      }
    EOF

Run it with: cat input.txt | ./word_count.xbred > output.txt
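To sanity-check the result, the same counts can be produced with a plain single-process pipeline. This is a sketch for comparison only and does not show how xbred executes jobs internally: the map step reuses the awk body from the example above, while `sort` and `uniq -c` stand in for the grouping and counting done by the reduce job. The function name wordcount_plain is hypothetical.

```shell
# Hypothetical reference implementation (not part of bredxbred):
# reads text on stdin and prints "<word> <count>" lines.
wordcount_plain() {
  # Map: replace runs of punctuation/blanks with a space, emit one word per line.
  awk '{
    gsub(/([[:punct:]]|[[:blank:]])+/, " ", $0);
    n = split($0, cols, " ");
    for (i = 1; i <= n; i++) { print cols[i]; }
  }' |
    # Group identical words together, then count each group.
    LC_ALL=C sort | uniq -c |
    # uniq -c prints "<count> <word>"; swap to "<word> <count>".
    awk '{ print $2 " " $1 }'
}
```

For example, `cat input.txt | wordcount_plain > expected.txt` gives a file to compare against output.txt. The order of lines may differ between the two, so sort both files before comparing.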

Refer to XBRED for more details. You can find more examples under the examples directory.

Future works

Author

See also