dakusui / bredxbred

mapreduce in bash

Create 'program distribution' mechanism #4

Closed dakusui closed 9 years ago

dakusui commented 9 years ago

MapReduce's approach is to ship the processing program to the data instead of moving the data around. The processing program that bashreduce relies on is essentially a bash one-liner, but the quoting/escaping hell is really painful: even running simple sed/awk commands isn't comfortable.

So some kind of program distribution mechanism is desirable.

The basic idea is:

  1. Create a wrapper script. It distributes the entire bashr pipeline definition.
  2. A pipeline definition contains
    1. aliases or functions that define the map/reduce tasks used in the pipeline
    2. how they are connected
    3. (etc., if necessary)
  3. The wrapper script executes the map/reduce tasks as defined in the pipeline. In this step the wrapper script and the br script issue ssh commands, and they coordinate so that the aliases/functions become available before each task actually runs.
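
A minimal sketch of step 3, not what bred actually does: the tasks are written as ordinary shell functions, serialized with bash's `declare -f`, and sent along with the task invocation. All names here (`map_words`, `reduce_count`, `build_payload`, `run_task`) are hypothetical, and the ssh branch assumes the remote login shell is bash, because `printf %q` may emit `$'...'` quoting. The pseudo-host `local` exists only so the sketch can be tried without ssh.

```bash
#!/usr/bin/env bash
# Hypothetical map/reduce tasks, written as plain functions so their
# bodies need no extra quoting or escaping.
map_words() { tr -s ' ' '\n'; }
reduce_count() { sort | uniq -c | awk '{print $2 "\t" $1}'; }

# Print the definitions of the named functions, followed by the task
# invocation. The result is a standalone script for another bash.
build_payload() {
  local task=$1; shift
  declare -f "$@"
  printf '%s\n' "$task"
}

# Run a task on a host. The script travels as the argument of `bash -c`,
# so stdin stays free for the data being processed.
run_task() {
  local host=$1 task=$2; shift 2
  local payload
  payload=$(build_payload "$task" "$@")
  if [ "$host" = local ]; then
    bash -c "$payload"                               # no ssh, for local trials
  else
    ssh "$host" "bash -c $(printf '%q' "$payload")"  # assumes remote shell is bash
  fi
}
```

Usage would look like this, listing every function the task needs after the task string:

```bash
printf 'a b a\n' | run_task local 'map_words | reduce_count' map_words reduce_count
```

Note that `declare -f` only covers functions; aliases would need a separate mechanism (for example `alias -p` plus `shopt -s expand_aliases` on the receiving side).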

dakusui commented 9 years ago

Closing, since I came up with a way to make the quoted one-liners used in bred consistent (the -I option). Program distribution will become necessary sooner or later, but that's a separate issue.