Closed: vinjana closed this issue 7 years ago.
That should actually be caught, and the jobs should be aborted. Why does that not happen?
On 27 November 2017 at 1:52:49 PM, Philip Reiner Kensche notifications@github.com wrote:
If more jobs are submitted than allowed for the user, Roddy dies with a stack trace:
A workflow error occurred, try to rollback / abort submitted jobs.
bkill 14843 14844 14848 14849 14850 14852 14856
An unknown / unhandled exception occurred: 'Could not parse raw ID from: 'Group <resUsers>: Pending job threshold reached. Retrying in 60 seconds...''
    de.dkfz.roddy.execution.jobs.cluster.lsf.LSFJobManager.parseJobID(LSFJobManager.groovy:611)
    de.dkfz.roddy.execution.jobs.BatchEuphoriaJobManager.extractAndSetJobResultFromExecutionResult(BatchEuphoriaJobManager.groovy:143)
    de.dkfz.roddy.execution.jobs.cluster.lsf.LSFJobManager.runJob(LSFJobManager.groovy:148)
    de.dkfz.roddy.execution.jobs.cluster.lsf.LSFJobManager$runJob.call(Unknown Source)
    de.dkfz.roddy.execution.jobs.Job.run(Job.groovy:534)
    de.dkfz.roddy.knowledge.methods.GenericMethod.createAndRunSingleJob(GenericMethod.groovy:507)
    de.dkfz.roddy.knowledge.methods.GenericMethod._callGenericToolOrToolArray(GenericMethod.groovy:255)
    de.dkfz.roddy.knowledge.methods.GenericMethod.callGenericTool(GenericMethod.groovy:50)
    de.dkfz.b080.co.files.CoverageTextFile.plot(CoverageTextFile.java:49)
    de.dkfz.b080.co.files.CoverageTextFileGroup.plot(CoverageTextFileGroup.java:47)
    de.dkfz.b080.co.qcworkflow.QCPipeline.execute(QCPipeline.groovy:90)
    de.dkfz.roddy.core.ExecutionContext.execute(ExecutionContext.groovy:625)
    de.dkfz.roddy.core.Analysis.executeRun(Analysis.java:397)
    de.dkfz.roddy.core.Analysis.executeRun(Analysis.java:341)
    de.dkfz.roddy.core.Analysis.rerun(Analysis.java:229)
    de.dkfz.roddy.client.cliclient.RoddyCLIClient.rerun(RoddyCLIClient.groovy:513)
    de.dkfz.roddy.client.cliclient.RoddyCLIClient.parseStartupMode(RoddyCLIClient.groovy:116)
    de.dkfz.roddy.Roddy.parseRoddyStartupModeAndRun(Roddy.java:721)
    de.dkfz.roddy.Roddy.startup(Roddy.java:289)
    de.dkfz.roddy.Roddy.main(Roddy.java:216)
The cause is an interaction between Roddy and BE (BatchEuphoria). BE apparently searches the bsub output for a specific line, does not find it, and instead gets the wait notice above. When something like this happens during manual submission on the command line, bsub blocks.
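As a minimal sketch of more tolerant parsing, assuming bsub's usual success line has the form `Job <ID> is submitted to queue <NAME>.`, the job ID could be searched for across all output lines instead of being expected at a fixed position, with the threshold notice recognized separately. The class `BsubOutputParser` and its methods are hypothetical illustrations, not BE's actual `LSFJobManager.parseJobID`.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not BatchEuphoria's real LSFJobManager.parseJobID.
final class BsubOutputParser {
    // Assumed success line, e.g. "Job <14843> is submitted to queue <normal>."
    private static final Pattern JOB_LINE = Pattern.compile("^Job <(\\d+)> is submitted");
    // Wait notice as quoted in the stack trace of this issue.
    private static final Pattern THRESHOLD_NOTICE = Pattern.compile("Pending job threshold reached");

    private BsubOutputParser() {}

    /** Scans every output line for the job ID instead of assuming a fixed position. */
    static Optional<String> parseJobId(List<String> outputLines) {
        for (String line : outputLines) {
            Matcher m = JOB_LINE.matcher(line.trim());
            if (m.find()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    /** True if any line is the pending-job-threshold notice. */
    static boolean isThresholdNotice(List<String> outputLines) {
        return outputLines.stream().anyMatch(l -> THRESHOLD_NOTICE.matcher(l).find());
    }
}
```

If the notice is followed by a success line, `parseJobId` still finds the ID; if only the notice arrives, it returns an empty `Optional`, and `isThresholdNotice` lets the caller report a full queue rather than a generic parse error.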
Original issue: https://github.com/eilslabs/Roddy/issues/161
The jobs are aborted (bkill). The primary problem is not that the job failure is handled badly (although that is also a problem), but that the error occurs at all. Roddy needs to be able to handle full submission queues: either by blocking and waiting, by exiting and rolling back (as now, but without the stack trace), or by exiting without rollback.
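The three options above could be expressed as a configurable policy. This is a hypothetical sketch: `QueueFullPolicy`, `QueueFullException`, and `Submitter` are invented names, the `killJobs` callback stands in for whatever would actually issue `bkill`, and falling back to rollback once `BLOCK_AND_WAIT` runs out of attempts is an assumed design choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names exist in Roddy or BatchEuphoria.
enum QueueFullPolicy { BLOCK_AND_WAIT, ROLLBACK, EXIT_WITHOUT_ROLLBACK }

/** Signals that the scheduler refused a submission because the queue is full. */
class QueueFullException extends RuntimeException {
    QueueFullException(String message) {
        super(message);
    }
}

final class Submitter {
    private final QueueFullPolicy policy;
    private final long waitMillis;
    private final int maxAttempts;
    private final Consumer<List<String>> killJobs; // stand-in for issuing "bkill <ids>"
    private final List<String> submitted = new ArrayList<>();

    Submitter(QueueFullPolicy policy, long waitMillis, int maxAttempts, Consumer<List<String>> killJobs) {
        this.policy = policy;
        this.waitMillis = waitMillis;
        this.maxAttempts = maxAttempts;
        this.killJobs = killJobs;
    }

    /** Runs submitOnce, which returns a job ID or throws QueueFullException. */
    String submit(Supplier<String> submitOnce) {
        for (int attempt = 1; ; attempt++) {
            try {
                String id = submitOnce.get();
                submitted.add(id);
                return id;
            } catch (QueueFullException e) {
                if (policy == QueueFullPolicy.BLOCK_AND_WAIT && attempt < maxAttempts) {
                    pause(); // block, then retry the same submission
                    continue;
                }
                if (policy != QueueFullPolicy.EXIT_WITHOUT_ROLLBACK) {
                    // ROLLBACK, or BLOCK_AND_WAIT after maxAttempts: kill the jobs already submitted
                    killJobs.accept(new ArrayList<>(submitted));
                    submitted.clear();
                }
                throw e; // the caller prints a short message instead of a stack trace
            }
        }
    }

    private void pause() {
        try {
            Thread.sleep(waitMillis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for queue capacity", ie);
        }
    }
}
```

Keeping the policy outside the parsing code means the same "queue full" signal can drive blocking, rollback, or a plain exit without changing how submissions are detected.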
Is it really a Roddy problem? Ultimately yes, but shouldn't the error first be caught by the LSF job manager? That is where it occurs.
I moved it to BE and am closing it here. BE needs to throw a proper exception, which Roddy then needs to catch.
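One way to split the responsibilities proposed here, sketched with hypothetical names (`JobSubmissionException`, `PendingJobLimitException`, `JobManagerSide`, `WorkflowSide`, and the exit codes are assumptions, not the real BE or Roddy API): the job-manager layer turns unrecognized bsub output into a typed exception that keeps the raw output, and the workflow layer catches it and reports a one-line message instead of a stack trace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not the actual BatchEuphoria or Roddy API.

/** Thrown by the job-manager layer when a submission cannot be confirmed. */
class JobSubmissionException extends Exception {
    final List<String> rawOutput; // kept so the caller can log what bsub actually printed

    JobSubmissionException(String message, List<String> rawOutput) {
        super(message);
        this.rawOutput = new ArrayList<>(rawOutput);
    }
}

/** A specific, recognizable cause: the user's pending-job threshold is reached. */
class PendingJobLimitException extends JobSubmissionException {
    PendingJobLimitException(List<String> rawOutput) {
        super("Pending job threshold reached", rawOutput);
    }
}

final class JobManagerSide {
    /** Returns the ID from a "Job <ID> ..." line, or throws a typed exception. */
    static String jobIdFrom(List<String> out) throws JobSubmissionException {
        for (String line : out) {
            int end = line.indexOf('>');
            if (line.startsWith("Job <") && end > 5) {
                return line.substring(5, end);
            }
        }
        for (String line : out) {
            if (line.contains("Pending job threshold reached")) {
                throw new PendingJobLimitException(out);
            }
        }
        throw new JobSubmissionException("Could not parse job ID", out);
    }
}

final class WorkflowSide {
    /** Workflow-level handling: a one-line message and an exit code, no stack trace. */
    static int submitOrReport(List<String> out, StringBuilder log) {
        try {
            log.append("Submitted job ").append(JobManagerSide.jobIdFrom(out));
            return 0;
        } catch (PendingJobLimitException e) {
            log.append("Submission queue full: ").append(e.getMessage());
            return 2;
        } catch (JobSubmissionException e) {
            log.append("Job submission failed: ").append(e.getMessage());
            return 1;
        }
    }
}
```

Because the specific exception subclasses the general one, the workflow layer can react to a full queue differently (for example by waiting or rolling back) while still handling any other submission failure cleanly.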