gdevenyi closed 8 years ago
SGE does it by either matching the job name with a pattern or the job ID explicitly. I like that approach.
From the SGE man pages:
wc_job
    The wildcard job specification is a placeholder for job ids, job names including job name patterns. A job id always references one job, while the name and pattern might reference multiple jobs.

    wc_job := job-id | job-name | pattern
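To illustrate why this is convenient on the SGE side: `qsub -hold_jid` takes a comma-separated wc_job list, so IDs, names, and patterns can be passed through as given. A minimal sketch, assuming a hypothetical helper `sge_hold_args` (not part of qbatch):

```python
def sge_hold_args(dependencies):
    """Build qsub arguments that hold a job on the given dependencies.

    Each entry may be a job ID, a job name, or a job name pattern.
    SGE resolves all three itself, so no lookup is done here.
    (Hypothetical helper, for illustration only.)
    """
    deps = [str(d).strip() for d in dependencies if str(d).strip()]
    if not deps:
        return []
    return ["-hold_jid", ",".join(deps)]


print(sge_hold_args([12345, "preprocess_*"]))
```

The returned list is meant to be spliced into the `qsub` command line, for example via `subprocess`.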
So SGE already supports it; PBS needs work though :)
Side thought: should we rename afterok -> depend?
I found afterany to be the most robust for PBS. It means that a subsequent job can run even if the previous one failed and wasn't set to rerun. At least then users can determine what went wrong from their own logfiles rather than having to figure out how to query the scheduler to find out why a job isn't running.
At this point, they either remove the job or do something magical. In the former case just running the job achieves the same purpose. In the latter we have a smart user who knows stuff and possibly doesn't need to use qbatch!
https://github.com/andrewjanke/qbatch/blob/master/qbatch#L139
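For reference, the two PBS dependency types under discussion differ only in the keyword passed to `qsub -W depend=...`: `afterok` releases the job only if the listed jobs exited without error, while `afterany` releases it once they have terminated, whatever their exit status. A rough sketch of building that argument; `pbs_depend_args` is a hypothetical name and this is not the code in qbatch:

```python
# Dependency types discussed in this thread (PBS supports others too).
VALID_TYPES = {"afterok", "afterany"}


def pbs_depend_args(job_ids, dep_type="afterok"):
    """Build the `-W depend=...` arguments for qsub.

    job_ids  -- iterable of PBS job IDs the new job should wait on
    dep_type -- "afterok" (wait for success) or "afterany" (wait for exit)
    (Hypothetical helper, for illustration only.)
    """
    if dep_type not in VALID_TYPES:
        raise ValueError("unsupported dependency type: {}".format(dep_type))
    ids = [str(j) for j in job_ids]
    if not ids:
        return []
    return ["-W", "depend={}:{}".format(dep_type, ":".join(ids))]


print(pbs_depend_args(["101.head", "102.head"], "afterany"))
```

Keeping the type as a parameter would let a pipeline author choose between stopping at the first failed stage and letting later stages run and log their own errors.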
@andrewjanke the reason for afterok instead of afterany is that if I have a pipeline built around qbatch which has a dependency chain, I can't run the next stage without the prior stage finishing successfully. If I allow the pipeline to continue I have to debug a failure of the commands downstream, rather than at the true failure point.
@gdevenyi I mustn't have said it right. The situation you describe with dependencies is exactly why I used afterany...
I prefer to have subsequent steps fail but as part of doing so write things to logfiles. This then means that irrespective of the scheduler I'm using (PBS or gridengine) I can use the same heuristics/checker to tell me where things broke. In my case I have a number of things that parse logfiles.
So, if I use afterany I do this after an error:
If I use afterok I do this to sort an error
I prefer the former, you may have automated the latter.
@andrewjanke See #112 for implementation of PBS/SGE job number dependencies. It simply extends the XML tree search in PBS to check job numbers as well, and adds them to the depends list if found. The existing SGE implementation already works since it allows for names or IDs using the same mechanism.
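A sketch of what such a search could look like, not the actual code from #112. It assumes the XML comes from something like `qstat -x` and has `Job` elements with `Job_Id` and `Job_Name` children; the function name `find_dependencies` and the sample data are made up for illustration:

```python
import fnmatch
import xml.etree.ElementTree as ET


def find_dependencies(qstat_xml, specs):
    """Return IDs of queued jobs matching any spec.

    A spec matches if it equals the full job ID, equals the numeric
    part before the first '.', or glob-matches the job name.
    (Assumed XML layout: <Data><Job><Job_Id/><Job_Name/></Job></Data>.)
    """
    root = ET.fromstring(qstat_xml)
    matched = []
    for job in root.iter("Job"):
        job_id = job.findtext("Job_Id", default="")
        job_name = job.findtext("Job_Name", default="")
        number = job_id.split(".", 1)[0]
        for spec in specs:
            if spec in (job_id, number) or fnmatch.fnmatchcase(job_name, spec):
                if job_id not in matched:
                    matched.append(job_id)
                break
    return matched


# Made-up sample data in the assumed layout.
SAMPLE = """<Data>
  <Job><Job_Id>101.head</Job_Id><Job_Name>stage1_a</Job_Name></Job>
  <Job><Job_Id>102.head</Job_Id><Job_Name>stage1_b</Job_Name></Job>
  <Job><Job_Id>103.head</Job_Id><Job_Name>other</Job_Name></Job>
</Data>"""

print(find_dependencies(SAMPLE, ["stage1_*", "103"]))
# -> ['101.head', '102.head', '103.head']
```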
Merged
Suggested by Andrew Janke
We should allow specification of exact job numbers for dependencies in addition to name glob matching.