Torque qsub equivalent to SGE qsub -sync or LSF bsub -K

gbeane commented 10 years ago

Both SGE and LSF have submission options (qsub -sync y and bsub -K, respectively) that cause the submit command to wait until the job has completed before returning. Some third-party pipeline applications make use of this feature, which makes porting them to Torque more difficult: it necessitates a wrapper script that does the qsub and then polls the job with qstat in a loop until the job finishes.
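The polling half of such a wrapper might look roughly like the sketch below. It assumes that `qstat -f JOBID` prints a `job_state = X` line and that a job qstat no longer knows about has already finished; the function name `wait_for_job` and the default interval are made up for illustration.

```shell
#!/bin/sh
# Sketch only: wait for a Torque job to finish by polling qstat.
# Assumption: `qstat -f JOBID` prints a line of the form "job_state = X".

# wait_for_job JOBID [INTERVAL_SECONDS]   (hypothetical helper)
wait_for_job() {
    jobid=$1
    interval=${2:-30}
    while :; do
        # If qstat fails, assume the job has already left the queue.
        info=$(qstat -f "$jobid" 2>/dev/null) || return 0
        state=$(printf '%s\n' "$info" | sed -n 's/^ *job_state = //p')
        # "C" is the completed state; it is only visible for a while
        # if the server or queue keeps completed jobs around.
        [ "$state" = "C" ] && return 0
        sleep "$interval"
    done
}

# Hypothetical usage:
#   jobid=$(qsub job.sh) && wait_for_job "$jobid" 10
```

A wrapper like this only tells the caller that the job is done; it does not by itself pass the job's exit code back.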

adeslatt commented 8 years ago

Is it not possible using the qsub -W depend feature on PBS/Torque? qsub -Wdepend=afterok:$newjobid testing.sh

knielson commented 8 years ago

That should work. Make sure keep_completed is set in qmgr. Also, $newjobid must already be queued.
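For reference, the dependency approach discussed here could be sketched as follows. The helper name `submit_chain` and the script names are hypothetical, and it assumes qsub prints the new job ID on standard output. The keep_completed setting would be made separately by an administrator, with something like `qmgr -c "set server keep_completed = 300"` (the value 300 is only an example).

```shell
#!/bin/sh
# Sketch only: submit two jobs where the second waits for the first.
# Assumption: qsub prints the new job ID on stdout.

# submit_chain FIRST_SCRIPT SECOND_SCRIPT   (hypothetical helper)
# Prints the two job IDs, one per line.
submit_chain() {
    first_id=$(qsub "$1") || return 1
    # afterok: the second job may run only if the first exits with status 0.
    second_id=$(qsub -W depend=afterok:"$first_id" "$2") || return 1
    printf '%s\n%s\n' "$first_id" "$second_id"
}

# Hypothetical usage:
#   submit_chain prepare.sh testing.sh
```

Both qsub calls return immediately, so the caller still gets control back before either job has run.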

gbeane commented 8 years ago

No, using job dependencies would not work for this. I'm talking about having the qsub command block until the job you are submitting completes: instead of returning immediately with a job ID, qsub would block until the job has finished. The return value of qsub would be the return value of the job. Both SGE and LSF have this feature, and some workflow tools make use of it. Switching to dependencies would require altering the workflow tool. (My own workflow tool uses dependencies, but some rely on this blocking submission behavior.)

knielson commented 8 years ago

Glen,

So make a qsub that queues and runs the job in one step as far as the user is concerned. Right?

Ken


gbeane commented 8 years ago

Yes. qsub -sync y (SGE) or bsub -K (LSF) will block until the job finishes, and their exit status will be the same as the job's.

Here is the LSF documentation for bsub -K:

-K

Submits a job and waits for the job to complete. Sends the message "Waiting for dispatch" to the terminal when you submit the job. Sends the message "Job is finished" to the terminal when the job is done. If LSB_SUBK_SHOW_EXEC_HOST is enabled in lsf.conf, also sends the message "Starting on execution_host" when the job starts running on the execution host.

You are not able to submit another job until the job is completed. This is useful when completion of the job is required to proceed, such as a job script. If the job needs to be rerun due to transient failures, bsub returns after the job finishes successfully. bsub exits with the same exit code as the job so that job scripts can take appropriate actions based on the exit codes. bsub exits with value 126 if the job was terminated while pending.

You cannot use the -K option with the -I, -Ip, or -Is options.
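To illustrate why workflow tools like this behavior, here is a rough sketch of a pipeline step that runs a blocking submission command and branches on its exit code. The helper name `run_blocking_step` and the script name are hypothetical; the meaning of 126 is taken from the bsub -K text quoted above, and a job script could in principle exit with 126 for its own reasons.

```shell
#!/bin/sh
# Sketch only: run a blocking submit command and act on its exit code.

# run_blocking_step CMD [ARGS...]   (hypothetical helper)
run_blocking_step() {
    "$@"
    status=$?
    case $status in
        0)   echo "step succeeded" ;;
        # Per the bsub -K documentation: terminated while pending.
        126) echo "job was terminated while pending" ;;
        *)   echo "job failed with exit code $status" ;;
    esac
    return "$status"
}

# Hypothetical usage:
#   run_blocking_step bsub -K ./align.sh && run_blocking_step bsub -K ./merge.sh
```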

knielson commented 8 years ago

We could make this work pretty easily. I will bring it up with the Torque team.

knielson commented 8 years ago

Using this functionality with Torque would mean you would bypass any scheduling benefits. It would run on a first fit basis.

pipitone commented 8 years ago

Just to point out that currently you can get this to work with PBS using this hack:

qsub -W depend=after:$jobid -I -x true
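Wrapped in a small function, the same trick might look like the sketch below. The name `block_on_job` is made up. One assumption to flag: as I read the Torque qsub documentation, an `after` dependency is satisfied once the named job has started, while `afterany` waits until it has terminated, so this sketch defaults to `afterany` and lets the caller pass `after` to match the command above.

```shell
#!/bin/sh
# Sketch only: block until another job is done by submitting a trivial
# interactive job that depends on it.

# block_on_job JOBID [DEPEND_TYPE]   (hypothetical helper)
block_on_job() {
    jobid=$1
    # Assumption: afterany = run once the other job has terminated,
    # with or without errors. Pass "after" to mirror the original hack.
    dep=${2:-afterany}
    # -I makes qsub wait for the job; -x runs the command (true) and exits.
    qsub -W depend="$dep:$jobid" -I -x true
}

# Hypothetical usage:
#   jobid=$(qsub job.sh) && block_on_job "$jobid"
```

Since the dependent job only runs `true`, this blocks but does not report the first job's exit code.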

gbeane commented 8 years ago

"Using this functionality with Torque would mean you would bypass any scheduling benefits. It would run on a first fit basis."

No Ken,

The job should be scheduled the same as any other job. That is, Moab (if you use it) would apply the same prioritization/policies to this job as if it had been submitted without this qsub option. Please read the SGE qsub -sync and LSF bsub -K documentation to see what behavior we are suggesting. If it short-circuited scheduling and just took the next available slot, that would not be the correct implementation.

The only difference is the qsub command blocks waiting for the job to be scheduled and run. Then once the job is finished, qsub will exit with the return value of the job (so if the job script exited with a non-zero value, then qsub will return that same value). It could print a message, similar to qsub -I, saying that "job XXXX waiting to start".
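Until qsub does this itself, the "exit with the return value of the job" part can be approximated from outside, roughly as sketched below. This assumes the finished job is still visible to qstat (keep_completed set) and that `qstat -f` prints an `exit_status = N` line for it; the name `job_exit_status` is hypothetical.

```shell
#!/bin/sh
# Sketch only: turn a finished job's recorded exit status into a shell
# return value.
# Assumption: `qstat -f JOBID` prints "exit_status = N" for completed jobs.

# job_exit_status JOBID   (hypothetical helper)
job_exit_status() {
    code=$(qstat -f "$1" 2>/dev/null | sed -n 's/^ *exit_status = //p')
    # No recorded status (job unknown or not finished): report failure.
    [ -n "$code" ] || return 1
    # Shell return values are limited to 0..255, so mask larger or
    # negative values reported by the server.
    return $((code & 255))
}

# Hypothetical usage, once the job is known to be complete:
#   job_exit_status "$jobid"; exit $?
```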

knielson commented 8 years ago

Glen,

Thanks for the clarification. That requires more work. Still doable but we won't be able to do it like a Moab backfill job.

Ken


gdevenyi commented 6 years ago

I'm wondering how this feature is coming along. My tool for abstracting over different cluster systems has a blocking-submission feature that is stuck on this: https://github.com/pipitone/qbatch/issues/103

adaptivecomputing / torque #268