Closed dankwart-de closed 4 years ago
Maybe the submission actually waited, but only the first line was parsed (which contained the wait output). Changing the parser to parse the last line would then fix this.
The issue will be closed as soon as a test exists (with multi-line input joined by '\n').
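A minimal sketch of the parser change suggested above, not the actual BE implementation. The class and method names are hypothetical, and it assumes that bsub reports success with a line of the form `Job <ID> is submitted to ...`. Instead of trusting only the first line, it scans the output from the last line backwards, so a preceding wait notice does not hide the job ID.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the name is not taken from BatchEuphoria.
final class BsubOutputParser {
    // Assumed shape of the success line: "Job <12345> is submitted to queue <normal>."
    private static final Pattern SUBMITTED =
            Pattern.compile("^Job <(\\d+)> is submitted to .*");

    // Returns the job ID from the last matching line, or empty if no line matches.
    static Optional<String> findJobId(String rawOutput) {
        String[] lines = rawOutput.split("\\R");
        for (int i = lines.length - 1; i >= 0; i--) {
            Matcher m = SUBMITTED.matcher(lines[i].trim());
            if (m.matches()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }
}
```

A test along the lines described above would join the wait notice and the success line with '\n' and expect the job ID, and would expect an empty result when only the wait notice is present.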
The environment variable LSB_NTRIES sets the number of attempts made when the LSF server is not reachable or the maximum number of pending jobs is reached.
I would suggest setting this to 1 for all LSF commands.
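One way this could look on the Java side, as a sketch only: the helper below (hypothetical name, not part of BE) prepares an LSF command with LSB_NTRIES set to 1 in the child process environment. How BE actually launches its commands is not shown in this issue, so treat the use of `ProcessBuilder` as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; BE may start LSF commands differently.
final class LsfCommand {
    // Builds (but does not start) a process for the given LSF command
    // with LSB_NTRIES=1, so the command gives up after a single attempt.
    static ProcessBuilder withSingleAttempt(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(new ArrayList<>(command));
        pb.environment().put("LSB_NTRIES", "1");
        return pb;
    }
}
```

With this setting the command should fail quickly instead of blocking, which leaves the decision to retry with the caller.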
If more jobs are submitted than allowed for the user, Roddy dies with a stack trace. However, this is a BE problem: BE should throw a proper exception, which clients could then catch.
A workflow error occurred, try to rollback / abort submitted jobs.
bkill 14843 14844 14848 14849 14850 14852 14856
An unknown / unhandled exception occurred: 'Could not parse raw ID from: 'Group: Pending job threshold reached. Retrying in 60 seconds...''
de.dkfz.roddy.execution.jobs.cluster.lsf.LSFJobManager.parseJobID(LSFJobManager.groovy:611)
de.dkfz.roddy.execution.jobs.BatchEuphoriaJobManager.extractAndSetJobResultFromExecutionResult(BatchEuphoriaJobManager.groovy:143)
de.dkfz.roddy.execution.jobs.cluster.lsf.LSFJobManager.runJob(LSFJobManager.groovy:148)
de.dkfz.roddy.execution.jobs.cluster.lsf.LSFJobManager$runJob.call(Unknown Source)
de.dkfz.roddy.execution.jobs.Job.run(Job.groovy:534)
de.dkfz.roddy.knowledge.methods.GenericMethod.createAndRunSingleJob(GenericMethod.groovy:507)
de.dkfz.roddy.knowledge.methods.GenericMethod._callGenericToolOrToolArray(GenericMethod.groovy:255)
de.dkfz.roddy.knowledge.methods.GenericMethod.callGenericTool(GenericMethod.groovy:50)
de.dkfz.b080.co.files.CoverageTextFile.plot(CoverageTextFile.java:49)
de.dkfz.b080.co.files.CoverageTextFileGroup.plot(CoverageTextFileGroup.java:47)
de.dkfz.b080.co.qcworkflow.QCPipeline.execute(QCPipeline.groovy:90)
de.dkfz.roddy.core.ExecutionContext.execute(ExecutionContext.groovy:625)
de.dkfz.roddy.core.Analysis.executeRun(Analysis.java:397)
de.dkfz.roddy.core.Analysis.executeRun(Analysis.java:341)
de.dkfz.roddy.core.Analysis.rerun(Analysis.java:229)
de.dkfz.roddy.client.cliclient.RoddyCLIClient.rerun(RoddyCLIClient.groovy:513)
de.dkfz.roddy.client.cliclient.RoddyCLIClient.parseStartupMode(RoddyCLIClient.groovy:116)
de.dkfz.roddy.Roddy.parseRoddyStartupModeAndRun(Roddy.java:721)
de.dkfz.roddy.Roddy.startup(Roddy.java:289)
de.dkfz.roddy.Roddy.main(Roddy.java:216)
The cause is an interaction between Roddy and BE. It seems BE searches the output for a specific line, does not find it, and instead returns the wait notice above. When something like this happens on the command line during manual submission, bsub blocks.
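As a rough illustration of what a "proper exception" could look like, the sketch below raises a dedicated exception when the output contains the wait notice but no submission line. All names are hypothetical and the matched strings are assumptions: the notice text is taken from the log above, and the `Job <` prefix is assumed to be the start of the bsub success line.

```java
// Hypothetical exception type; not part of BatchEuphoria.
final class PendingJobThresholdException extends RuntimeException {
    PendingJobThresholdException(String notice) {
        super("Job submission rejected by LSF: " + notice);
    }
}

// Hypothetical check a job manager could run before parsing the job ID.
final class SubmissionCheck {
    // Text taken from the wait notice quoted in this issue.
    private static final String WAIT_NOTICE = "Pending job threshold reached";

    // Throws if the output holds a wait notice and no submission line.
    static void rejectWaitNotice(String rawOutput) {
        String notice = null;
        for (String line : rawOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("Job <")) {
                return; // the job was submitted after all
            }
            if (trimmed.contains(WAIT_NOTICE)) {
                notice = trimmed;
            }
        }
        if (notice != null) {
            throw new PendingJobThresholdException(notice);
        }
    }
}
```

A client such as Roddy could then catch `PendingJobThresholdException` and report the limit to the user instead of dying with the "Could not parse raw ID" stack trace.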