galaxyproject / pulsar

Distributed job execution application built for Galaxy
https://pulsar.readthedocs.io
Apache License 2.0
37 stars 50 forks source link

Queue/limits for pre- and post-processing #349

Open natefoo opened 10 months ago

natefoo commented 10 months ago

Currently, Pulsar will take as many jobs off the setup queue as are received and immediately begin preprocessing them. If there is a large backlog of jobs (e.g. due to some kind of prior problem resulting in jobs not processing for a time period), this results in a large amount of IO contention staging in (and possibly hitting open file limits, if you don't increase them), causing jobs with even moderately small inputs to queue for hours because writing is so slow.

Unfortunately I can't really quantify the penalty - it is possible that the overall job throughput would not be any better even if a limited queue were in place, since the same amount of data still has to be transferred either way. But I do suspect it'd still move that data quicker if it weren't trying to do all of it at once.