ClandininLab / brainsss2

refactor of the brainsss repo for 2-photon imaging analysis
MIT License

consider queueing to multiple partitions where possible #36

Open rueberger opened 2 years ago

rueberger commented 2 years ago

cc: brainsss1

I've noticed that jobs queue either to trc or normal. You might consider queueing to both normal and trc where possible (i.e., for time limits <= 2 days), and to owners for short-running or checkpointable jobs (although requeuing may require some modification to the control-flow logic in preprocess.py).

In general this would just be a convenience to shorten queue times and load-balance, aside from one scenario: submissions to normal are limited when the total number of CPUs in use by a group exceeds 512. Group partitions and owners are unaffected by CPU limits.

For instance, in this scenario yandan's moco jobs won't execute until a number of other jobs finish, but could execute immediately on trc.

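[image: Screen Shot 2022-08-30 at 4 59 18 PM] https://user-images.githubusercontent.com/8816362/187581030-a4392e52-5f04-4e5d-a721-382da5211391.png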

Weird policy, but according to Killian "Owner groups are expected to mainly submit jobs to their own partition, as well as to the owners partition that offers them a very large pool of resources for free."
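
As a rough illustration of the suggestion above, here is a minimal sketch of how a job launcher might pick partitions and build the `sbatch` command line. It relies on Slurm accepting a comma-separated list in `--partition` and starting the job on whichever partition can run it first. The function names, the 4-hour "short running" cutoff, and the script name are hypothetical and not taken from brainsss2; only the 2-day limit for normal comes from this thread. The sketch does not check whether owners itself imposes a time limit, and the requeue handling in preprocess.py is not addressed.

```python
from datetime import timedelta

# Only the 2-day figure comes from the discussion above.
NORMAL_MAX = timedelta(days=2)
# Hypothetical cutoff for what counts as "short running" on owners.
OWNERS_SHORT_MAX = timedelta(hours=4)


def choose_partitions(time_limit, checkpointable=False, group_partition="trc"):
    """Return the partitions a job could be queued to, group partition first."""
    partitions = [group_partition]
    if time_limit <= NORMAL_MAX:
        partitions.append("normal")
    # Jobs on owners may be preempted, so only send short or restartable work.
    if checkpointable or time_limit <= OWNERS_SHORT_MAX:
        partitions.append("owners")
    return partitions


def sbatch_args(script, time_limit, checkpointable=False):
    """Build an sbatch argument list that queues to every eligible partition."""
    partitions = choose_partitions(time_limit, checkpointable)
    minutes = int(time_limit.total_seconds() // 60)
    args = [
        "sbatch",
        f"--partition={','.join(partitions)}",  # Slurm uses whichever starts first
        f"--time={minutes}",  # a bare integer is interpreted as minutes
    ]
    if "owners" in partitions:
        # Ask Slurm to put the job back in the queue if it is preempted.
        args.append("--requeue")
    args.append(script)
    return args


if __name__ == "__main__":
    # "moco.sh" is a placeholder script name.
    print(" ".join(sbatch_args("moco.sh", timedelta(days=1))))
```

The returned list could be passed to `subprocess.run` by whatever code currently submits jobs. Listing the group partition first is only a readability choice here, since Slurm picks the partition offering the earliest start regardless of order.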

poldrack commented 2 years ago

this does sound like a good idea, though it will make testing rather more complex...
