datasayer / meerkat

Apache License 2.0

Support Storm-like (role-based) task grouping and chainable bolts. #17

Open edwardyoon opened 10 years ago

edwardyoon commented 10 years ago

Yesterday I surveyed Storm. Storm's task grouping and chainable bolts are pretty nice (chainable bolts in particular look really useful for real-time join operations).

We can also implement functions similar to Storm's DAG-style task scheduling. My rough idea is:

  1. Launch as many tasks per node as there are bolts. For example:
+---------------+
|    Server1    |
+---------------+
Task-1. tailing meer
Task-2. split sentence meer
Task-3. wordcount meer

BossMeer: master aggregation
  2. Assign the meerkats to the proper groups.
  3. Each task calls its user-defined function and sends messages to the tasks of the next group.
  4. Synchronize all tasks.

Then we can do all of the above in one superstep. The challenge is managing the tasks.
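To make the idea concrete, here is a minimal single-process sketch of the grouping, assuming plain string messages. Every name in it (`GroupedChainSketch`, `Group`, `runChain`, `countWords`, `mergeCounts`) is hypothetical and is not part of meerkat or Hama. Each hand-off inside `runChain` stands in for a task sending messages to the tasks of the next group; a real implementation would go through the BSP messaging and sync calls instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: these names are illustrative, not meerkat or Hama APIs.
final class GroupedChainSketch {

  /** One role: a group name plus the user-defined function every task in the group runs. */
  static final class Group {
    final String name;
    final Function<String, List<String>> udf;

    Group(String name, Function<String, List<String>> udf) {
      this.name = name;
      this.udf = udf;
    }
  }

  /** Splits a line on whitespace and drops empty tokens. */
  static List<String> splitWords(String line) {
    List<String> words = new ArrayList<>();
    for (String w : line.trim().split("\\s+")) {
      if (!w.isEmpty()) {
        words.add(w);
      }
    }
    return words;
  }

  /** The first two groups from the example: tailing, then sentence splitting. */
  static List<Group> exampleGroups() {
    List<Group> groups = new ArrayList<>();
    groups.add(new Group("tail", line -> Collections.singletonList(line)));
    groups.add(new Group("split", GroupedChainSketch::splitWords));
    return groups;
  }

  /** Pushes every message through the groups in order (group N feeds group N+1). */
  static List<String> runChain(List<Group> groups, List<String> input) {
    List<String> messages = input;
    for (Group group : groups) {
      List<String> next = new ArrayList<>();
      for (String message : messages) {
        next.addAll(group.udf.apply(message));
      }
      messages = next;
    }
    return messages;
  }

  /** What a word-count task would do locally on one server. */
  static Map<String, Integer> countWords(List<String> words) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String w : words) {
      counts.merge(w, 1, Integer::sum);
    }
    return counts;
  }

  /** What the master aggregation would do with the per-server counts. */
  static Map<String, Integer> mergeCounts(List<Map<String, Integer>> partials) {
    Map<String, Integer> total = new LinkedHashMap<>();
    for (Map<String, Integer> partial : partials) {
      partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
    }
    return total;
  }

  public static void main(String[] args) {
    List<Group> groups = exampleGroups();
    Map<String, Integer> server1 = countWords(runChain(groups, Arrays.asList("to be or not to be")));
    Map<String, Integer> server2 = countWords(runChain(groups, Arrays.asList("to do or not")));
    System.out.println(mergeCounts(Arrays.asList(server1, server2)));
  }
}
```

The sketch keeps the chain as an ordered list of groups, so adding a bolt means adding one `Group`; it does not address the hard part named above, which is managing where the tasks run.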

ijsong commented 10 years ago

Did you mean that a "group" is a set of tasks with the same function (i.e., the same user-defined function)?

As I understand it, the flow looks like this: https://docs.google.com/drawings/d/1KpNKIznXGfMYH3P-yq5yDp4_SnVOotDlxPI8w-kHTh8/edit?usp=sharing

Can we manage the groups and their topology using ZooKeeper?

On Fri, Apr 11, 2014 at 1:57 PM, Edward J. Yoon notifications@github.com wrote:

Yesterday I surveyed Storm. Storm's task grouping and chainable bolts are pretty nice (chainable bolts in particular look really useful for real-time join operations).

We can also implement functions similar to Storm's task grouping and chainable bolts. My rough idea is:

  1. Launch as many tasks per node as there are bolts. For example:

+---------------+
|    Server1    |
+---------------+
Task-1. tailing bolt
Task-2. split sentence bolt
Task-3. wordcount
Task-4. master aggregation

  2. Assign the tasks to the proper groups.
  3. Each task calls its user-defined function and sends messages to the tasks of the next group.
  4. Synchronize all tasks.

Then we can do all of the above in a single superstep. The challenge is managing the tasks.


edwardyoon commented 10 years ago

@ijsong Yes, and your diagram is right.

To manage the topology, my idea is as follows:

  1. In the first superstep, one task creates a topology map based on the job configuration.
  2. It then broadcasts the map to all other tasks.

The only problem is that Hama's SimpleScheduler assigns tasks randomly.
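As a rough illustration of step 1, the topology map could be as simple as an ordered map from group name to task indices. The sketch below assumes a made-up configuration string of the form `group:taskCount,...` listed in chain order; that format and the names `TopologyMapSketch`, `build` and `nextGroupTasks` are hypothetical, not anything meerkat or Hama defines. The broadcast in step 2 is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the config format and all names are assumptions.
final class TopologyMapSketch {

  /** Parses "name:count,name:count" and gives each group a contiguous range of task indices. */
  static Map<String, List<Integer>> build(String spec) {
    Map<String, List<Integer>> topology = new LinkedHashMap<>();
    int nextTask = 0;
    for (String entry : spec.split(",")) {
      String[] parts = entry.trim().split(":");
      int count = Integer.parseInt(parts[1].trim());
      List<Integer> tasks = new ArrayList<>();
      for (int i = 0; i < count; i++) {
        tasks.add(nextTask++);
      }
      topology.put(parts[0].trim(), tasks);
    }
    return topology;
  }

  /**
   * Returns the tasks of the group that follows the group containing taskIndex.
   * The result is empty for the last group or for an unknown task index.
   */
  static List<Integer> nextGroupTasks(Map<String, List<Integer>> topology, int taskIndex) {
    boolean found = false;
    for (List<Integer> tasks : topology.values()) {
      if (found) {
        return tasks;
      }
      if (tasks.contains(taskIndex)) {
        found = true;
      }
    }
    return Collections.emptyList();
  }

  public static void main(String[] args) {
    Map<String, List<Integer>> topology = build("tail:1,split:2,wordcount:2,boss:1");
    System.out.println(topology);
    System.out.println(nextGroupTasks(topology, 2));
  }
}
```

Because the map is keyed by task index rather than by host, every task can look up its downstream group wherever the scheduler placed it; it does not give any control over that placement.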

ijsong commented 10 years ago

I read the issue about the rotation scheduler (https://issues.apache.org/jira/browse/HAMA-900). Is it related to the problem you stated above?

edwardyoon commented 10 years ago

@ijsong Yes. Are you interested in it?

ijsong commented 10 years ago

@edwardyoon Yes, I am also interested in Hama. I will read the Hama source, especially the task allocation and scheduler parts.

edwardyoon commented 10 years ago

@ijsong SimpleScheduler consumes all of the available task slots:

      // assembly into actions
      for (Task task : taskSet) {
        GroomServerStatus groomStatus = jip.getGroomStatusForTask(task);
        List<GroomServerAction> taskActions = actionMap.get(groomStatus);
        if (taskActions == null) {
          taskActions = new ArrayList<GroomServerAction>(
              groomStatus.getMaxTasks());
        }
        taskActions.add(new LaunchTaskAction(task));
        actionMap.put(groomStatus, taskActions);
      }

      sendDirectivesToGrooms(actionMap);

I guess we need to change only this part.

You can implement HAMA-900 by extending TaskScheduler. If you want, please comment on HAMA-900 so that I can assign it to you.
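For discussion only, here is one possible placement rule written in plain Java without any Hama classes: hand tasks to servers in rotation instead of filling one server's slots first. This is my own reading of "rotation" and is an assumption, not the design recorded in HAMA-900; `RotationAssignmentSketch` and `assign` are hypothetical names, and a real version would live in a `TaskScheduler` subclass and work with groom statuses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not Hama code and not the HAMA-900 design.
final class RotationAssignmentSketch {

  /**
   * Assigns task indices 0..taskCount-1 to servers in round-robin order.
   * Throws if the servers do not have enough slots in total.
   */
  static Map<String, List<Integer>> assign(List<String> servers, int maxTasksPerServer, int taskCount) {
    if (taskCount > servers.size() * maxTasksPerServer) {
      throw new IllegalArgumentException("not enough task slots for " + taskCount + " tasks");
    }
    Map<String, List<Integer>> plan = new LinkedHashMap<>();
    for (String server : servers) {
      plan.put(server, new ArrayList<>());
    }
    for (int task = 0; task < taskCount; task++) {
      // Rotate through the servers so consecutive tasks land on different nodes.
      plan.get(servers.get(task % servers.size())).add(task);
    }
    return plan;
  }

  public static void main(String[] args) {
    List<String> servers = new ArrayList<>();
    servers.add("server1");
    servers.add("server2");
    System.out.println(assign(servers, 3, 6));
  }
}
```

If tasks are numbered group by group and every group has one task per server, this rotation puts one task of each group on each server, which matches the per-node layout sketched at the top of the issue.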

edwardyoon commented 10 years ago

NOTE: We'll need to redesign the GuardMeer and the topology configuration interface.

edwardyoon commented 10 years ago

Hi @ijsong. As I mentioned by mail, I'm thinking about moving this to the ASF. Please feel free to share your preference on the [DISCUSS] mail thread.

edwardyoon commented 10 years ago

Let's keep developing here on GitHub until we reach meerkat version 0.1.

ijsong commented 10 years ago

OK, sorry for my laziness. I paused my work on this project and Hama because I was a little busy. This week I am going to look into how to test and develop Hama efficiently and how to extend the simple scheduler.

edwardyoon commented 10 years ago

@ijsong No problem. You don't need to hurry.

NOTE: BTW, a Hama BSP job doesn't support multiple message types. So, to use different message types between task groups, we'll need to use MapWritable internally or create our own MeerkatMessageWritable, like http://svn.apache.org/repos/asf/hama/trunk/graph/src/main/java/org/apache/hama/graph/GraphJobMessage.java
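A minimal sketch of the second option, assuming a one-byte type tag written ahead of the payload. To stay self-contained it does not import Hadoop; it only mirrors the `write(DataOutput)` / `readFields(DataInput)` shape of the `Writable` interface, so a real MeerkatMessageWritable would add `implements Writable`. The class name `TaggedMessageSketch`, the two message types and the tag values are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: mirrors the Writable method shapes without depending on Hadoop.
final class TaggedMessageSketch {
  static final byte TYPE_TEXT = 0;   // e.g. a line or a word passed between groups
  static final byte TYPE_COUNT = 1;  // e.g. a partial count sent to the aggregator

  private byte type;
  private String text;
  private long count;

  static TaggedMessageSketch ofText(String text) {
    TaggedMessageSketch m = new TaggedMessageSketch();
    m.type = TYPE_TEXT;
    m.text = text;
    return m;
  }

  static TaggedMessageSketch ofCount(long count) {
    TaggedMessageSketch m = new TaggedMessageSketch();
    m.type = TYPE_COUNT;
    m.count = count;
    return m;
  }

  byte type() { return type; }
  String text() { return text; }
  long count() { return count; }

  /** Writes the type tag first, then only the payload that belongs to that type. */
  void write(DataOutput out) throws IOException {
    out.writeByte(type);
    if (type == TYPE_TEXT) {
      out.writeUTF(text);
    } else {
      out.writeLong(count);
    }
  }

  /** Reads the tag and uses it to decide which payload follows. */
  void readFields(DataInput in) throws IOException {
    type = in.readByte();
    if (type == TYPE_TEXT) {
      text = in.readUTF();
      count = 0L;
    } else if (type == TYPE_COUNT) {
      count = in.readLong();
      text = null;
    } else {
      throw new IOException("unknown message type tag: " + type);
    }
  }

  /** Serializes and deserializes a message in memory, as a quick self-check. */
  static TaggedMessageSketch roundTrip(TaggedMessageSketch message) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      message.write(new DataOutputStream(bytes));
      TaggedMessageSketch copy = new TaggedMessageSketch();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip(ofText("hello")).text());
    System.out.println(roundTrip(ofCount(42L)).count());
  }
}
```

Compared with MapWritable, a single tagged class keeps the wire format small and makes the set of message types explicit, at the cost of touching the class whenever a new type is added.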