Closed: amiraa127 closed this issue 8 years ago
This seems pretty specific to SGE or a similar scheduler -- I imagine you're interested in saving on the SGE transaction cost per job or queuing time? Supporting this would get pretty tricky given pyflow's current design, with limited benefit compared to the "script C" solution. Maybe this case is motivated by another consideration?
Hi Chris,
As you mentioned, this is motivated by saving on the SGE transaction cost per job. Let's say, for example, that tasks P and Q both depend on A. If I bundle A and P into C (or A, P, and Q into C), Q will start after P, which is not optimal (see the sketch below). In this case, the "script C" solution cannot take advantage of the parallelism. I understand the difficulties involved in adding this feature. I just wanted to make sure there isn't a quick fix that I'm not aware of.
Thank you for the quick response.
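For concreteness, here is a minimal sketch of that fan-out, assuming pyflow's WorkflowRunner/addTask interface; the run_*.sh commands and task labels are placeholders. Declared separately, P and Q each depend only on A and can run in parallel; bundling A and P into one task would force Q to wait for P as well.

```python
from pyflow import WorkflowRunner

class FanOutWorkflow(WorkflowRunner):
    def workflow(self):
        # Declared as separate tasks, P and Q both become runnable as soon
        # as A finishes, so they can execute in parallel.
        self.addTask("A", "run_A.sh")
        self.addTask("P", "run_P.sh", dependencies="A")
        self.addTask("Q", "run_Q.sh", dependencies="A")
        # Bundling A and P into a single task C would instead force Q to
        # depend on C, so Q could not start until P had also finished:
        #   self.addTask("C", "run_A.sh && run_P.sh")
        #   self.addTask("Q", "run_Q.sh", dependencies="C")

if __name__ == "__main__":
    FanOutWorkflow().run(mode="sge")
```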
Ok. I'll take PRs on these sorts of issues, but there will probably be less attention given to SGE/drmaa going forward. We're already seeing the many-core transition leading to most pyflow use being on a single node, compared to a few years ago when it was originally developed.
Bit late here, but you could have task A write a status file (e.g., .task_A_complete) and have a watcher job that periodically checks for the file before launching task C.
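A rough sketch of that watcher idea, assuming task A ends with something like `touch .task_A_complete` and that run_task_C.sh stands in for whatever task C actually runs; the file name and polling interval are arbitrary.

```python
import os
import subprocess
import time

# Hypothetical names: task A is assumed to write this sentinel file when it
# finishes, and run_task_C.sh is a placeholder for task C's command.
SENTINEL = ".task_A_complete"
POLL_SECONDS = 30

def wait_and_launch(sentinel=SENTINEL, cmd=("bash", "run_task_C.sh")):
    """Poll for the status file written by task A, then launch task C."""
    while not os.path.exists(sentinel):
        time.sleep(POLL_SECONDS)
    # Task A has finished; start C here without going back through the queue.
    subprocess.check_call(cmd)

if __name__ == "__main__":
    wait_and_launch()
```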
Hi
If we have two tasks A and B, where B depends on A, the scheduler starts B after A finishes. However, in doing so, it releases the compute node that A was running on and requests a new node for B. I was wondering whether there is a way for B to start on the same node that A ran on, without re-entering the queue.
One solution for this would be to bundle A and B into a single script C and add C to the task list. However, this way we do not have independent knowledge of the completion of task A. I was wondering if there is a way to do this without bundling.
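As a sketch of the two options being weighed, again assuming pyflow's addTask interface with placeholder shell commands: B can be declared as a separate task that depends on A, which goes back through the SGE queue, or the two commands can be chained inside one task so they run back-to-back on the same node, at the cost of the workflow no longer recording A's completion separately.

```python
from pyflow import WorkflowRunner

class TwoTaskWorkflow(WorkflowRunner):
    def workflow(self):
        # Option 1: separate tasks. pyflow records A's completion on its own,
        # but B is submitted as a new job and may land on a different node.
        self.addTask("A", "run_A.sh")
        self.addTask("B", "run_B.sh", dependencies="A")
        # Option 2: bundle both commands into one task C. They run
        # back-to-back on the same node, but A's completion is no longer
        # tracked as a separate event:
        #   self.addTask("C", "run_A.sh && run_B.sh")

if __name__ == "__main__":
    TwoTaskWorkflow().run(mode="sge")
```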