cBio / cbio-cluster

MSKCC cBio cluster documentation

Request for Comment on setting a batch queue maximum wallclock and a new "long" queue #407

Open tatarsky opened 8 years ago

tatarsky commented 8 years ago

We are attempting to allow the batch queue to use, when idle, a set of nodes purchased by a specific group. But some concerns have been expressed that the batch queue's walltime limit is currently unlimited, and that long-duration jobs could be scheduled there and interfere with the owning group's needs.

Setting a walltime limit on batch is something I've seen discussed off and on over the years, but per discussions with @juanperin I wanted to ask for comments here.

We have a few options, which I'll try to explain below; if folks have opinions, please comment so we can make sure we understand all possible impacts. Our goal is maximizing node usage. We will make no changes without discussion.

Modification to batch queue maximum walltime

  1. We would set a wallclock limit on batch to prevent these additional nodes from being tied up for long durations. For example, four days (96 hours) could become the new batch maximum walltime.
  2. I do not believe I can, or would want to, set a "per node" walltime limit; I feel that would be very confusing. I believe walltimes should be a queue-level configuration, and all nodes in a queue should support the same limits.

Creation of a queue for longer-running jobs

  3. Longer jobs would be allowed via a queue that has a limited per-node slot count (perhaps 12). Its ability to run on nodes would be controlled by a different Torque attribute than batch, allowing us to adjust its pool of nodes based on monitored demand.
  4. The longer queue could have a walltime limit representing some rational job-length value decided upon here. I will suggest "42 days" just based on some of the 1000-hour jobs I've seen lately, but I defer to comments on the job mix people find useful. (A rough qmgr sketch of these changes follows this list.)
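
To make the options concrete, here is a rough, untested qmgr sketch of items 1 and 3/4. The queue name "long" and the 96-hour and 42-day (1008-hour) limits are placeholder values from the list above, not decisions; the per-node slot limit and the node-pool attribute would be configured separately.

```
# Item 1: cap the batch queue's maximum walltime (placeholder: 96 hours)
qmgr -c "set queue batch resources_max.walltime = 96:00:00"

# Items 3/4: a separate execution queue for longer jobs (placeholder: 42 days)
qmgr -c "create queue long queue_type = Execution"
qmgr -c "set queue long resources_max.walltime = 1008:00:00"
qmgr -c "set queue long enabled = True"
qmgr -c "set queue long started = True"
```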

Thank you for any opinions you might have.

jchodera commented 8 years ago

I wonder if there is a simpler way. Is there a way to restrict jobs that run on the nodes from this mysterious "specific group" to only those with a walltime limit set to, say, <24h?

That way, only jobs that are guaranteed to complete in a reasonable amount of time are ever run on the "specific group" hardware, and no special queues are needed---it is transparent to the submitter. The "specific group" can have a special queue with no such limits.

akahles commented 8 years ago

I think having a separate queue for long-running jobs is a good idea. One thing I have seen implemented in other systems I use is automatic queue assignment based on walltime, so to the end user everything would more or less stay the same: the default queue stays batch, but a job requesting more than 96h of walltime is automatically scheduled to a different queue, e.g. batch_long. A subset of nodes could be excluded from this queue, which essentially means that a very long job of mine might wait a little longer before it gets scheduled, which I personally find OK.
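
From the submitter's point of view, the idea is that nothing changes except the requested walltime; the queue names below are just placeholders for illustration:

```
# Same qsub invocation either way; the system picks the queue from the walltime
qsub -l walltime=48:00:00 myjob.sh     # fits the batch cap, stays in batch
qsub -l walltime=200:00:00 myjob.sh    # exceeds the cap, lands in batch_long
```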

tatarsky commented 8 years ago

The group isn't mysterious; I just don't expose names. It's the two groups that purchased nodes for their queues that were added over the shutdown. If you pop into Slack I can elaborate.

Your suggestion to run only jobs with a lower walltime on those nodes will be added to the possible options. To be honest, I do not know if I can do that, but I will look at the Moab config options.
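
One Moab mechanism that might apply, if I'm remembering the syntax correctly, is a standing reservation over the group's nodes with a MAXTIME access limit, so that only jobs whose requested walltime fits under the limit can run there. This is an unverified sketch; the reservation name, host list, and 24-hour figure are placeholders:

```
# moab.cfg (unverified sketch): reserve the group's nodes and admit only
# jobs requesting <= 24h of walltime
SRCFG[groupnodes] HOSTLIST=node01,node02
SRCFG[groupnodes] PERIOD=INFINITY
SRCFG[groupnodes] MAXTIME=24:00:00
```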

jchodera commented 8 years ago

@akahles's idea of "automatic queue assignment based on walltime" may be another way to implement my suggestion.

tatarsky commented 8 years ago

@akahles I will look into whether that is possible, so people do not have to select the longer queue themselves. That would, I believe, be handled by Torque, since it does the initial queue placement.

tatarsky commented 8 years ago

It may also be possible via a submit filter.
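
For reference, a submit filter is a script Torque runs on every qsub (pointed to from torque.cfg); it receives the job script on stdin, and whatever it prints to stdout is what actually gets submitted. A rough, untested sketch of the idea follows. The 96-hour cap and the queue name "long" are placeholders, and it only handles a walltime given as HH:MM:SS inside the script, not one passed on the qsub command line.

```bash
#!/bin/bash
# Sketch of a submit filter that routes jobs requesting > 96h to "long".
LIMIT=$((96 * 3600))          # placeholder batch cap, in seconds
LONG_QUEUE=long               # placeholder long-queue name

script=$(cat)                 # Torque pipes the job script in on stdin

# First "#PBS -l ... walltime=HH:MM:SS" directive, if present
wt=$(printf '%s\n' "$script" | sed -n 's/^#PBS .*walltime=\([0-9:]\+\).*/\1/p' | head -n1)

if [ -n "$wt" ]; then
    IFS=: read -r h m s <<< "$wt"
    secs=$(( 10#${h:-0} * 3600 + 10#${m:-0} * 60 + 10#${s:-0} ))
    if [ "$secs" -gt "$LIMIT" ]; then
        # Add a queue directive right after the shebang line
        printf '%s\n' "$script" | sed "1a #PBS -q $LONG_QUEUE"
        exit 0
    fi
fi

printf '%s\n' "$script"
```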

callosciurus commented 8 years ago

+1 for limiting the walltime of jobs run on group-specific nodes. Segmentation into more queues ultimately penalizes certain types of jobs and also creates inefficiencies, as @akahles has pointed out. That would be OK if walltime were always very predictable, but in many instances it is not.

tatarsky commented 8 years ago

Note to myself: the concept of a routing queue is supported by Torque. I am still trying to determine whether walltime is a supported routable attribute, however.
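
If it is, my understanding of how it would look is roughly the following: jobs land in a routing queue by default, and Torque forwards each one to the first destination queue whose limits (including resources_max.walltime) it satisfies. Untested sketch, assuming the batch and long queues already carry the walltime caps from the earlier sketch:

```
# Default routing queue that forwards jobs to the first destination they fit
qmgr -c "create queue route queue_type = Route"
qmgr -c "set queue route route_destinations = batch"
qmgr -c "set queue route route_destinations += long"
qmgr -c "set queue route enabled = True"
qmgr -c "set queue route started = True"
qmgr -c "set server default_queue = route"
```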

lzamparo commented 8 years ago

+1 to @akahles's suggestion of different queues based on the walltime requested in the submission script.

I am also in favour of a max wallclock time on the batch queue (say 48 or 96 hrs) in order to decrease latency in the queue, even if this means we may all have to make greater use of checkpointing for our longer-running jobs.