macmanes opened this issue 2 months ago
I do see that there is a Toil flag --maxJobs that might help, but I'm not sure how to pass arguments through to Toil (except for Slurm). Something like this?

TOIL_ARGS="--maxJobs=5000"
--maxJobs sounds about right. It's an option for any cactus command; see, e.g., cactus --help:
--maxJobs MAX_JOBS Specifies the maximum number of jobs to submit to the
backing scheduler at once. Not supported on Mesos or
AWS Batch. Use 0 for unlimited. Defaults to unlimited.
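For example, a cap of 5000 could be passed directly on the cactus command line. This is just a sketch of the idea; the job store, seqFile, output name, and the slurm batch-system flag are placeholders for however you normally launch your run:

```bash
cactus ./jobstore seqFile.txt mammals.hal --batchSystem slurm --maxJobs 5000
```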
It looks like Toil needs to add OSError (with errno 7, "Argument list too long") to the exception list here that makes us fall back from sacct to scontrol, where we don't list all jobs in the command. We also probably need some machinery to limit the maximum number of jobs asked about at a time (or maybe just cap the command line length directly).
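Roughly what I mean by batching the sacct queries, as a sketch only (the batch size, job IDs, and output fields here are placeholders, not Toil's actual code):

```bash
# Query sacct in fixed-size chunks of job IDs so the argument list stays
# short enough to avoid OSError errno 7 (E2BIG, "Argument list too long").
JOB_IDS=(12001 12002 12003 12004)   # placeholder job IDs
BATCH=500                           # assumed per-call cap

for ((i = 0; i < ${#JOB_IDS[@]}; i += BATCH)); do
    chunk=("${JOB_IDS[@]:i:BATCH}")
    ids=$(IFS=,; echo "${chunk[*]}")   # sacct -j expects a comma-separated list
    sacct -j "$ids" --format=JobID,State,ExitCode --noheader --parsable2
done
```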
But as a workaround, limiting the max jobs in flight ought to work.
Thanks @glennhickey and @adamnovak. I can confirm that --maxJobs does work.
For the Toil developers: it would be great to be able to change maxJobs after submission, the way Slurm allows for array jobs with scontrol update ArrayTaskThrottle=20 JobId=12345. This would let a user expand and contract the limit given available cluster resources.
Thanks in advance for the support. I am trying to align approximately 90 mammal genomes on a Slurm-enabled cluster. Running like this:

I am getting an error in the run_lastz phase of the workflow. I believe this is because too many jobs are issued; for this dataset, about 46k jobs had been issued at the time of failure. Is there any way I can throttle this, for instance to permit only 5k or 10k jobs to be issued at a time?