go-distributed / meritop

We moved to https://github.com/taskgraph/taskgraph
Apache License 2.0
11 stars 3 forks source link

fix epoch workflow to run regression framework #67

Closed hongchaodeng closed 9 years ago

hongchaodeng commented 9 years ago

@xiang90 @xiaoyunwu

This patch will fix the stuck situation of regression framework test:

  1. The previous epoch notification way is broken. Each iteration should call SetEpoch on task to let it decide what work to do.
  2. A few discrepancy fix in regression test for current framework impl.

Limitation: Currently both meta and epoch will be watched from index 1. It means replay the entire history. Version support to resume work should be done later.

xiaoyunwu commented 9 years ago

I pulled down you code, and try to fix the issue. I notice there are still problems. children := t.framework.GetTopology().GetChildren(t.epoch) 199: if len(children) != 0 { t.framework.FlagChildMetaReady("ParamReady") } else {

the logic is wrong, it should be len(children) == 0 means if there is no children for this node, it should flag as child it is ready. but If we do this change, it start to hang again. I suspect before this, it dummySlave is never called really.

hongchaodeng commented 9 years ago

@xiaoyunwu Could you clarify the expected workflow and what functions in dummySlave should be called?

xiaoyunwu commented 9 years ago

Hi, I wrote up the event flow of a path in a tree setup in the wiki:

https://github.com/go-distributed/meritop/wiki/Event-Flow-of-a-Spark-setup

Let me know if you still have questions.

hongchaodeng commented 9 years ago

Just an update on this: The logic is correct, but previously dummyData was trying to do json marshal but didn't expose public fields.

Fixed now.

hongchaodeng commented 9 years ago

Note that this PR will only fix the workflow and logical problem. It didn't implement complete set of regression test. Xiaoyun will be responsible to take it over and finish table driven testing.

xiaoyunwu commented 9 years ago

lgtm

hongchaodeng commented 9 years ago

@xiang90

I will merge this and continue the work on more complex regression test :)