geodesymiami / rsmas_insar

RSMAS InSAR code
https://rsmas-insar.readthedocs.io/
GNU General Public License v3.0

Support Multiple Queues #466

Closed. Ovec8hkin closed this 3 years ago.

Ovec8hkin commented 3 years ago

Stampede has the skx-normal and normal queues. Job submission should use the MAX_JOBS_PER_QUEUE of each queue (from defaults/queues.cfg or from the qlimits command), which is 25 and 50, respectively. This will also make it possible to submit to the skx-dev queue when the skx-normal job limit is reached. For tasks, the common task limits apply (e.g. a limit of 1000 tasks across all queues).
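A minimal sketch of the per-queue lookup, assuming a hypothetical two-column layout for defaults/queues.cfg (queue name, then MAX_JOBS_PER_QUEUE); the real file's columns may differ, and max_jobs_for_queue is a made-up name for illustration. The qlimits route is not shown, since its output format is not parsed here.

```bash
#!/usr/bin/env bash
# Sketch only: the two-column layout is a HYPOTHETICAL stand-in for
# defaults/queues.cfg; the real file may have different columns.

# Print MAX_JOBS_PER_QUEUE for queue $2, read from config file $1.
# Returns non-zero (and prints nothing) if the queue is not listed.
max_jobs_for_queue() {
    local cfg="$1" queue="$2"
    awk -v q="$queue" '
        /^[[:space:]]*#/ { next }              # skip comment lines
        $1 == q { print $2; found = 1; exit }  # first matching queue wins
        END { exit !found }                    # status 1 if queue not found
    ' "$cfg"
}

# Example with the two limits quoted in the issue.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# QUEUENAME   MAX_JOBS_PER_QUEUE
skx-normal    25
normal        50
EOF
max_jobs_for_queue "$cfg" skx-normal   # prints 25
max_jobs_for_queue "$cfg" normal       # prints 50
rm -f "$cfg"
```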

Ovec8hkin commented 3 years ago

@falkamelung What functionality do you explicitly want here? Do you want to be able to pass the queue to submit the jobs to as an argument to submit_jobs.bash, and have it figure out which MAX_JOBS_PER_QUEUE value to use?

falkamelung commented 3 years ago

Yes, exactly. The task limit applies across all queues. Do you have any questions on this? The ability to submit to skx-dev is particularly useful.
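The cross-queue task limit could be checked separately from the per-queue job limits. A sketch, assuming the task counts of already submitted jobs are available as hypothetical "queue tasks" lines on stdin, and taking the 1000-task figure from the example in the issue as the default; tasks_fit_under_limit is a hypothetical name.

```bash
#!/usr/bin/env bash
# Succeed only if adding $1 new tasks keeps the total over ALL queues
# within the limit $2 (default 1000, the example figure from the issue).
# stdin: one HYPOTHETICAL "queue tasks" line per already submitted job.
tasks_fit_under_limit() {
    local add_tasks="$1" limit="${2:-1000}"
    awk -v add="$add_tasks" -v lim="$limit" '
        { total += $2 }                      # sum tasks regardless of queue
        END { exit !(total + add <= lim) }   # status 0 if it still fits
    '
}

# 400 + 500 existing tasks plus 100 new ones is exactly 1000, so this fits.
printf 'skx-normal 400\nnormal 500\n' | tasks_fit_under_limit 100 && echo "ok to submit"
```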

falkamelung commented 3 years ago

To answer the second comment: for now, the queue is given in the jobfile. I think it would be useful to be able to submit to a different queue.

We could add a submit_jobs.bash option --queue_priority skx-dev,development,skx-normal,normal. If the skx-dev job limit has not been reached, it would submit there. This will be particularly useful on Frontera given the type of queues it has (small and flex), but I am not there yet. Something to keep in mind.
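The proposed --queue_priority option does not exist yet; this sketch only illustrates the selection rule described: walk the list in order and take the first queue that is still below its job limit. The limits for skx-dev and development are hypothetical placeholders, the per-queue job counts are filled in by hand (in practice they might come from squeue), and pick_queue is a made-up name. Associative arrays require bash 4 or later.

```bash
#!/usr/bin/env bash
# 25 and 50 are the skx-normal/normal limits quoted in the issue;
# the skx-dev and development limits are HYPOTHETICAL placeholders.
declare -A MAX_JOBS=( [skx-dev]=1 [development]=1 [skx-normal]=25 [normal]=50 )

# Current number of jobs per queue (filled in by the caller).
declare -A CUR_JOBS=()

# Print the first queue in the comma-separated priority list $1 whose
# current job count is below its limit; return 1 if every queue is full.
pick_queue() {
    local IFS=','
    local q
    for q in $1; do
        # Unknown queues default to a limit of 0 and are therefore skipped.
        if (( ${CUR_JOBS[$q]:-0} < ${MAX_JOBS[$q]:-0} )); then
            echo "$q"
            return 0
        fi
    done
    return 1
}

# skx-dev is full, development has room, so development is chosen.
CUR_JOBS=( [skx-dev]=1 [development]=0 [skx-normal]=25 [normal]=10 )
pick_queue "skx-dev,development,skx-normal,normal"   # prints development
```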

Ovec8hkin commented 3 years ago

> For now the queue is given in the jobfile.

If the queue is given in the jobfile, I don't think you can artificially submit to a different queue.

falkamelung commented 3 years ago

Before running sbatch run_09_ifgram.job, we could replace the queue name in the *.job file with the queue we want to submit to. This would be pretty straightforward.
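A sketch of that job-file rewrite, assuming the queue appears in the job file as an "#SBATCH -p QUEUE" or "#SBATCH --partition=QUEUE" directive; set_job_queue is a hypothetical helper, and the sed -i form used here is GNU sed's.

```bash
#!/usr/bin/env bash
# Replace the queue (Slurm partition) in job file $1 with queue $2, in place.
# ASSUMPTION: the queue is set by "#SBATCH -p QUEUE" or
# "#SBATCH --partition=QUEUE"; other #SBATCH lines are left untouched.
set_job_queue() {
    local job_file="$1" queue="$2"
    sed -i -E \
        "s/^(#SBATCH[[:space:]]+(-p[[:space:]]+|--partition=))[^[:space:]]+/\1${queue}/" \
        "$job_file"
}

# Hypothetical usage:
#   set_job_queue run_09_ifgram.job skx-dev && sbatch run_09_ifgram.job
```

As an aside, sbatch also accepts -p/--partition on the command line, and command-line options take precedence over #SBATCH directives in the script, so sbatch -p skx-dev run_09_ifgram.job should submit to skx-dev without editing the file at all.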

Ovec8hkin commented 3 years ago

Regardless, this is the required functionality as I currently understand it: use the value of the $QUEUE environment variable to look up the maximum number of allowable jobs from the queue limits, and use that value to limit job submission, rather than specifying the limit manually.
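A minimal sketch of that behaviour: count the user's jobs in $QUEUE with squeue and wait until the count drops below the limit before submitting the next job. The helper names are hypothetical, the 60-second polling interval is an arbitrary choice, and the limit is passed in as an argument (e.g. 25 for skx-normal or 50 for normal) rather than looked up here.

```bash
#!/usr/bin/env bash
# Number of this user's jobs (pending and running) in queue $1.
count_jobs_in_queue() {
    squeue -u "$USER" -h -p "$1" | wc -l
}

# Block until the number of jobs in $QUEUE is below max_jobs ($1),
# re-checking every $2 seconds (default 60, an arbitrary choice).
wait_for_queue_slot() {
    local max_jobs="$1" interval="${2:-60}"
    : "${QUEUE:?QUEUE environment variable must be set}"
    while (( $(count_jobs_in_queue "$QUEUE") >= max_jobs )); do
        sleep "$interval"
    done
}

# Hypothetical usage:
#   export QUEUE=skx-normal
#   wait_for_queue_slot 25 && sbatch run_09_ifgram.job
```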

Any additional work on switching which queue a job is submitted to at runtime is something else entirely.