cooperative-computing-lab / cctools

The Cooperative Computing Tools (cctools) enable large scale distributed computations to harness hundreds to thousands of machines from clusters, clouds, and grids.
http://ccl.cse.nd.edu

vine: long initialization time for large task graphs in DaskVine #3957

Open JinZhou5042 opened 1 month ago

JinZhou5042 commented 1 month ago

When DaskVine handles a task graph, it first submits every task in the topmost level of the graph (these tasks do not depend on output files produced by any other task). Only then does it begin calling wait, which is where worker connection and task dispatching happen.

However, if the graph is wide enough, thousands of tasks may be ready for submission, so during initialization the manager is busy submitting tasks instead of dispatching them. If we could delay some task submissions and instead do some worker connection and task dispatching, it might improve concurrency at the beginning.

For example, in the following run, no workers were connected and no tasks were dispatched during the first ~10 min, which is potentially harmful to the overall execution time.

[image]

BarrySlyDelgado commented 1 month ago

How many tasks are in the frontier of the graph?

JinZhou5042 commented 1 month ago

It was 11,759 tasks.

dthain commented 1 month ago

vine_hungry is the intended solution to this problem! It gives the caller a signal as to when "enough" tasks have been submitted and the manager should get to work, hence this pattern:

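while(1) {
    /* Submit only while the manager signals that it wants more tasks. */
    while(vine_hungry(m)) {
        task = vine_task_create(...);
        vine_submit(m, task);
    }
    /* Then let the manager do its work and return a completed task, if any. */
    task = vine_wait(m,timeout);
}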

@RamenMode has recently been working on vine_hungry. If it doesn't have the desired effect, then bring him into the conversation.
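
To illustrate why the pattern above keeps the submission backlog bounded, here is a minimal toy model in C. It is a sketch under stated assumptions, not TaskVine code: struct toy_manager, toy_hungry, toy_submit, toy_wait, toy_run, and the TOY_HUNGRY_LIMIT threshold are all hypothetical stand-ins, and the real vine_hungry may decide "enough" quite differently.

```c
/* Toy model only: every name here is a hypothetical stand-in,
 * not part of the TaskVine API. */
#include <assert.h>

#define TOY_HUNGRY_LIMIT 100 /* assumed cap on submitted-but-undispatched tasks */

struct toy_manager {
    int queued;      /* submitted, still waiting to be dispatched */
    int done;        /* completed tasks */
    int peak_queued; /* largest backlog observed so far */
};

/* Nonzero while the toy manager would accept more submissions. */
static int toy_hungry(const struct toy_manager *m)
{
    return m->queued < TOY_HUNGRY_LIMIT;
}

/* Queue one task and track the peak backlog. */
static void toy_submit(struct toy_manager *m)
{
    m->queued++;
    if (m->queued > m->peak_queued)
        m->peak_queued = m->queued;
}

/* Stand-in for a wait call: dispatch and complete up to `batch` tasks.
 * Returns the number of tasks completed by this call. */
static int toy_wait(struct toy_manager *m, int batch)
{
    int n = m->queued < batch ? m->queued : batch;
    m->queued -= n;
    m->done += n;
    return n;
}

/* Drive `total` tasks through the manager, interleaving submission
 * with waiting. Returns the number of wait calls that were needed. */
static int toy_run(struct toy_manager *m, int total, int batch)
{
    int submitted = 0;
    int waits = 0;

    assert(total >= 0 && batch > 0);
    while (m->done < total) {
        /* Top up the queue only while the manager is hungry. */
        while (submitted < total && toy_hungry(m)) {
            toy_submit(m);
            submitted++;
        }
        /* Hand control back so queued work can make progress. */
        toy_wait(m, batch);
        waits++;
    }
    return waits;
}
```

Calling toy_run on a zero-initialized struct toy_manager with total set to 11759 (the frontier size reported above) and a batch of 10 never lets the backlog exceed TOY_HUNGRY_LIMIT, and the first toy_wait happens after only 100 submissions rather than after all 11,759.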