JinZhou5042 opened this issue 1 month ago.
How many tasks are in the frontier of the graph?
It was 11,759 tasks.
`vine_hungry` is the intended solution to this problem! It gives the caller a signal as to when "enough" tasks have been submitted and the manager should get to work, hence this pattern:
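```c
while (1) {
    /* Submit tasks only while the manager signals it wants more work. */
    while (vine_hungry(m)) {
        struct vine_task *task = vine_task_create(...);
        vine_submit(m, task);
    }
    /* Then let the manager connect workers, dispatch tasks, and collect results. */
    struct vine_task *t = vine_wait(m, timeout);
    if (t) {
        /* handle the completed task */
        vine_task_delete(t);
    }
}
```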
@RamenMode has recently been working on `vine_hungry`. If it doesn't have the desired effect, then bring him into the conversation.
When DaskVine handles a task graph, it first submits every ready task in the topmost level of the graph (these tasks do not depend on output files produced by any other task), and only then starts calling `wait`, which is where worker connections and task dispatching happen. However, if the graph is wide enough, thousands of tasks may be ready at once, so during initialization the manager is busy submitting tasks instead of dispatching them. If we deferred some of those submissions and let the manager connect workers and dispatch tasks in between, concurrency at the beginning of the run might improve.
For example, in the following run, no workers were connected and no tasks were dispatched during the first ~10 minutes, which is potentially harmful to the overall execution time.