Open rueberger opened 2 years ago
This does sound like a good idea, though it will make testing rather more complex...
On Tue, Aug 30, 2022 at 7:53 PM Andrew Berger wrote:
cc: brainsss1
I've noticed that jobs queue to either trc or normal, never both. You might consider queueing to both normal and trc where possible (i.e. for time limits <= 2 days), and to owners as well for short-running or checkpointable jobs (although requeuing may require some modification to the control flow logic in preprocess.py).
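A rough sketch of how the partition choice could look, in case it helps. None of this is taken from brainsss2: the function names are made up for illustration, the 4-hour cutoff for "short-running" is a placeholder, and I'm assuming that trc accepts whatever time limits the pipeline currently requests and that the 2-day limit applies to owners as well as normal. The only Slurm features relied on are `--partition` taking a comma-separated list, the `days-hours:minutes:seconds` form of `--time`, and `--requeue`.

```python
from datetime import timedelta

# Assumed limits for illustration only; check the cluster's current settings.
SHARED_MAX_TIME = timedelta(days=2)    # from the "<= 2 days" remark above
OWNERS_SHORT_TIME = timedelta(hours=4)  # hypothetical cutoff for "short-running"


def choose_partitions(time_limit, checkpointable=False):
    """Return the partitions a job could be submitted to (hypothetical helper)."""
    partitions = ["trc"]  # assumption: the group partition always qualifies
    if time_limit <= SHARED_MAX_TIME:
        partitions.append("normal")
        # owners jobs can be preempted, so only use it for short or
        # checkpointable work
        if checkpointable or time_limit <= OWNERS_SHORT_TIME:
            partitions.append("owners")
    return partitions


def format_slurm_time(time_limit):
    """Format a timedelta as Slurm's days-hours:minutes:seconds."""
    total = int(time_limit.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


def sbatch_args(time_limit, checkpointable=False):
    """Build the partition-related sbatch arguments for one job."""
    partitions = choose_partitions(time_limit, checkpointable)
    args = [
        f"--partition={','.join(partitions)}",
        f"--time={format_slurm_time(time_limit)}",
    ]
    if "owners" in partitions:
        # let Slurm put the job back in the queue if it gets preempted
        args.append("--requeue")
    return args
```

With a comma-separated list, Slurm starts the job on whichever listed partition can run it first, so the pipeline would not have to pick one itself. The `--requeue` part is where the control flow in preprocess.py would probably need attention, since a requeued job restarts from the beginning unless it checkpoints.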
In general this would just be a convenience to shorten queue times and load-balance, aside from one scenario: submissions to normal are limited when the total number of CPUs in use by a group exceeds 512. Group partitions and owners are unaffected by CPU limits.
For instance, in this scenario yandan's moco jobs won't execute until a number of other jobs finish, but could execute immediately on trc.
[image: Screen Shot 2022-08-30 at 4 59 18 PM] https://user-images.githubusercontent.com/8816362/187581030-a4392e52-5f04-4e5d-a721-382da5211391.png
Weird policy, but according to Killian, "Owner groups are expected to mainly submit jobs to their own partition, as well as to the owners partition that offers them a very large pool of resources for free."
-- Russell A. Poldrack Albert Ray Lang Professor of Psychology Associate Director, Stanford Data Science Director, SDS Center for Open and Reproducible Science Building 420 Stanford University Stanford, CA 94305
http://www.poldracklab.org/