broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Differences of job scheduling between using cromwell+PBS backend and using PBS alone #6339

Open jiangyue0011 opened 3 years ago

jiangyue0011 commented 3 years ago

Hi everyone,

When I use Cromwell with a PBS backend, I find that jobs are scheduled differently than when I use PBS alone.

For example, I have a pipeline of 2 steps, A-B, where B depends on A. I want to submit this pipeline twice, which generates 4 jobs: A1, B1, A2, B2. Let's assume the cluster only has the resources to run one job at a time.

When I use PBS alone, all 4 jobs are in the queue from the start. At the beginning A1 runs and the others wait. When A1 is done, B1 and A2 both have a chance to run, depending on the priority PBS assigns to them. So the order of the four jobs might be A1-B1-A2-B2 or A1-A2-B1-B2.

When I use Cromwell with a PBS backend, Cromwell first sends only A1 and A2 to the queue. B1 and B2 are held back because they are not ready to run until A1 and A2 are done. When A1 is done, A2 runs because it is the only job in the queue, while B1 is still on its way to the queue. So in this case the order of the jobs can only be A1-A2-B1-B2.

This is not a big issue when there are only a few pipelines to run. However, when I have, say, 100 such pipelines, B1 has to wait until A100 finishes, because when A1 finishes, A2 through A100 are already waiting in the queue and B1 has only just been submitted. This means the finishing time of pipeline 1 (A1-B1) depends heavily on the total number of pipelines submitted to the Cromwell engine.
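The ordering effect described above can be reproduced with a small toy simulation. This is only a sketch under simplifying assumptions: the cluster has a single slot, the queue is strictly first-in-first-out by submission time (a real PBS scheduler may assign priorities differently), and every job takes the same amount of time. The function name `simulate` and the flag `submit_b_upfront` are made up for illustration; the code does not call Cromwell or PBS.

```python
def simulate(n_pipelines, submit_b_upfront):
    """Return the run order of jobs on a one-slot cluster with a FIFO queue.

    submit_b_upfront=True  models "PBS alone": every B_i is queued at the
        start and held until its A_i has finished.
    submit_b_upfront=False models the behaviour described for Cromwell:
        B_i is only queued once A_i has finished, so it lands at the back.
    """
    queue = []
    for i in range(1, n_pipelines + 1):
        queue.append(("A", i))
        if submit_b_upfront:
            queue.append(("B", i))

    finished = set()
    order = []
    while queue:
        # Pick the earliest-submitted job whose dependency is satisfied.
        idx = next(
            k for k, (step, i) in enumerate(queue)
            if step == "A" or ("A", i) in finished
        )
        job = queue.pop(idx)
        order.append(job)
        finished.add(job)
        # Late submission: B_i joins the back of the queue after A_i ends.
        if job[0] == "A" and not submit_b_upfront:
            queue.append(("B", job[1]))
    return order


def position_of(order, step, pipeline):
    """1-based position at which a given job runs."""
    return order.index((step, pipeline)) + 1
```

With 100 pipelines, `position_of(simulate(100, False), "B", 1)` places B1 at slot 101, that is, after all 100 "A" jobs, while `position_of(simulate(100, True), "B", 1)` places it at slot 2 under the FIFO assumption.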

Is there any way to make Cromwell behave differently here (for example, by sending all jobs to the backend without holding any of them back)? I really don't want to wait until all the "A" jobs have finished before getting the first result from the submitted pipelines.

I hope I have made the problem clear. I have read the Cromwell documentation and googled quite a bit but didn't find a solution.

Any help would be appreciated!

Yue

clhappyjiang commented 6 months ago

Hi! I am glad to see this issue. I have also tried running Cromwell with PBS as the backend, but I'm not very good at it. Can you show me how you defined the Cromwell configuration file when using PBS as the backend? Thank you.