cooperative-computing-lab / makeflow-examples

Example workflows for the Makeflow workflow system.

just a small idea: balance between light and heavy jobs #37

Closed: stemangiola closed this issue 4 years ago

stemangiola commented 4 years ago

I noticed that makeflow (maybe depending on command order or randomness) queues light jobs before heavy jobs with the same dependencies (on slurm). Many queuing systems prioritize light jobs over jobs submitted earlier (at the same level of availability), so mixing light and heavy jobs could speed up overall workflow completion: heavy jobs would have more time to wait in the queue, while light jobs keep being executed and replaced.

I say all this without knowing for sure whether the issue is real (or I'm fooling myself) or whether it applies generally to your scenarios.

Feel free to close this issue.

stemangiola commented 4 years ago

I see now: makeflow starts reading from the bottom up (so to speak). So if I want something to be queued first (at equal dependencies), I have to put it lower in the makefile.

dthain commented 4 years ago

Yes, currently the order in the makefile (somewhat by accident) affects the order in which equal priority jobs are submitted. (Although I could imagine others might want to prioritize on other properties.)

btovar commented 4 years ago

@stemangiola, makeflow sends jobs to slurm as their dependencies become available, up to --max-remote jobs at a time. The default is 100. The order in which they are delivered is an artifact of the implementation and should not be relied upon. One thing you could try is to increase --max-remote so that it covers more of your workflow. That way, jobs spend more time in the slurm queue, which can then apply its own policies.

makeflow does not do any scheduling of jobs by itself. It relies on the underlying batch system to do the scheduling. The default of 100 for --max-remote is a safety net so that makeflow does not overwhelm the underlying batch system. However, if you already know that your workflows are sound and you are at the stage of optimizing them, increasing --max-remote should be a safe option.
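To make the effect of the cap concrete, here is a minimal toy model in Python. It is not makeflow's implementation: the function name `submit_times`, the `slots` parameter, and the first-in-first-out batch policy are all assumptions made for illustration, and a real slurm cluster applies far richer policies. The sketch only shows when each ready job first becomes visible to the batch system for a given `max_remote` value.

```python
import heapq
from collections import deque


def submit_times(durations, max_remote, slots):
    """Return the time at which each job is handed to the batch system.

    Toy assumptions (not makeflow's real behavior):
      * every job is ready at t = 0 and is submitted in list order;
      * at most `max_remote` jobs are submitted (queued or running) at once;
      * the batch system runs submitted jobs FIFO on `slots` execution slots.
    Both `max_remote` and `slots` must be at least 1.
    """
    n = len(durations)
    submitted_at = [None] * n
    running = []          # min-heap of finish times of running jobs
    queued = deque()      # submitted jobs waiting for a slot
    next_job = 0
    t = 0

    while next_job < n or queued or running:
        # Top up submissions until the cap is reached.
        while next_job < n and len(queued) + len(running) < max_remote:
            submitted_at[next_job] = t
            queued.append(next_job)
            next_job += 1
        # Start queued jobs on any free slots.
        while queued and len(running) < slots:
            job = queued.popleft()
            heapq.heappush(running, t + durations[job])
        # Advance the clock to the next job completion.
        if running:
            t = heapq.heappop(running)

    return submitted_at


# Four light jobs followed by one heavy job, one execution slot.
jobs = [1, 1, 1, 1, 10]
low_cap = submit_times(jobs, max_remote=2, slots=1)
high_cap = submit_times(jobs, max_remote=5, slots=1)
print(low_cap)   # the heavy job reaches the batch queue only at t = 3
print(high_cap)  # every job reaches the batch queue at t = 0
```

In this model, a small cap keeps the last job out of the batch queue until earlier jobs finish, so the batch system cannot take it into account when it makes decisions. With a cap that covers the whole list, all jobs are visible from the start.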

stemangiola commented 4 years ago

Makes sense. Thanks