TheRoddyWMS / Roddy

The Roddy workflow development and management system.
http://roddy-documentation.readthedocs.io
MIT License

testrerun with bam2fastq broken #248

Closed: vinjana closed this issue 6 years ago.

vinjana commented 6 years ago

When the BamToFastqPlugin runs in testrerun or rerun mode, the read groups are listed from the BAMs a second time. Whatever the reason for that (probably a new workflow object is created), the workflow crashes in the second round with:

/icgc/dkfzlsdf/analysis/B080/kensche/RoddyProject_Roddy3.0/Roddy/roddy.sh testrerun bam2fastq.any@convert gms \
  --useconfig=/icgc/dkfzlsdf/analysis/B080/kensche/RoddyProject_Roddy3.0/testConfigs/applicationProperties-analysis-local-lsf.ini \
  --useRoddyVersion=develop \
  --configurationDirectories=/icgc/dkfzlsdf/analysis/B080/kensche/RoddyProject_Roddy3.0/testConfigs/BamToFastqPlugin/ \
  --usePluginVersion=BamToFastqPlugin:develop \
  --useiodir=/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid,/icgc/dkfzlsdf/analysis/B080/kensche/roddy-bam2fastq-test \
  '--cvalues=bamfile_list:/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data//gerald_D1VCPACXX_1.bam;/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data//gerald_D1VCPACXX_2.bam;/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data//gerald_D1VCPACXX_3.bam;/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data//gerald_D1VCPACXX_4.bam;/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data//gerald_D1VCPACXX_5.bam;/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data//gerald_D1VCPACXX_6.bam;/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data//gerald_D1VCPACXX_7.bam;/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data//gerald_D1VCPACXX_8.bam'
Required JRE/JDK: 1.8
Required Groovy:
Runtime Environment:
  /home/kensche/.sdkman/candidates/java/8.0.171-oracle/bin/java
  /home/kensche/.sdkman/candidates/java/8.0.171-oracle/bin/javac
  /home/kensche/.sdkman/candidates/groovy/2.4.15/bin/groovy
Using Java to start Roddy
Roddy version 3.0.7
Loading properties file /icgc/dkfzlsdf/analysis/B080/kensche/RoddyProject_Roddy3.0/testConfigs/applicationProperties-analysis-local-lsf.ini.
Roddy will try to use the DirectSynchronousExecutionJobManager job manager class to manage jobs.

Loaded plugin BamToFastqPlugin:develop (/icgc/dkfzlsdf/analysis/B080/kensche/RoddyProject_Roddy3.0/plugins_3.0/BamToFastqPlugin)
Loaded plugin PluginBase:1.2.1-0       (/icgc/dkfzlsdf/analysis/B080/kensche/RoddyProject_Roddy3.0/Roddy/dist/plugins/PluginBase_1.2.1)
Loaded plugin DefaultPlugin:develop    (/icgc/dkfzlsdf/analysis/B080/kensche/RoddyProject_Roddy3.0/Roddy/dist/plugins/DefaultPlugin)

Load configuration file /icgc/dkfzlsdf/analysis/B080/kensche/RoddyProject_Roddy3.0/testConfigs/BamToFastqPlugin/test1.xml
Load configuration file /icgc/dkfzlsdf/analysis/B080/kensche/RoddyProject_Roddy3.0/Roddy/dist/plugins/DefaultPlugin/resources/configurationFiles/default.xml
Load configuration file /icgc/dkfzlsdf/analysis/B080/kensche/RoddyProject_Roddy3.0/plugins_3.0/BamToFastqPlugin/resources/configurationFiles/bam2fastq.xml

Found 8 datasets in the in- and output directories.
Listing read groups in '/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data/gerald_D1VCPACXX_1.bam'. This may take a while.
Listing read groups in '/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data/gerald_D1VCPACXX_2.bam'. This may take a while.
Listing read groups in '/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data/gerald_D1VCPACXX_3.bam'. This may take a while.
Listing read groups in '/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data/gerald_D1VCPACXX_4.bam'. This may take a while.
Listing read groups in '/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data/gerald_D1VCPACXX_5.bam'. This may take a while.
Listing read groups in '/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data/gerald_D1VCPACXX_6.bam'. This may take a while.
Listing read groups in '/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data/gerald_D1VCPACXX_7.bam'. This may take a while.
Listing read groups in '/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data/gerald_D1VCPACXX_8.bam'. This may take a while.
Listing read groups in '/icgc/dkfzlsdf/analysis/B080/kensche/testData/view-by-pid/gms/data/gerald_D1VCPACXX_1.bam'. This may take a while.
An unknown / unhandled exception occurred: 'Job log file /icgc/dkfzlsdf/analysis/B080/kensche/roddy-bam2fastq-test/gms/roddyExecutionStore/exec_180425_111159639_kensche_convert/180425_111159639_directTool:bamListReadGroups.o25 does not exist'
de.dkfz.roddy.execution.io.ExecutionService.runDirect(ExecutionService.groovy:194)
de.dkfz.roddy.bam2fastq.BamToFastqWorkflow.listReadGroups(BamToFastqWorkflow.groovy:74)
de.dkfz.roddy.bam2fastq.BamToFastqWorkflow$_readAllReadGroups_closure3.doCall(BamToFastqWorkflow.groovy:133)
de.dkfz.roddy.bam2fastq.BamToFastqWorkflow$_readAllReadGroups_closure3.call(BamToFastqWorkflow.groovy)
org.codehaus.groovy.runtime.DefaultGroovyMethods.collectEntries(DefaultGroovyMethods.java:3632)
org.codehaus.groovy.runtime.DefaultGroovyMethods.collectEntries(DefaultGroovyMethods.java:3539)
org.codehaus.groovy.runtime.DefaultGroovyMethods.collectEntries(DefaultGroovyMethods.java:3564)
de.dkfz.roddy.bam2fastq.BamToFastqWorkflow.readAllReadGroups(BamToFastqWorkflow.groovy:132)
de.dkfz.roddy.bam2fastq.BamToFastqWorkflow.setupExecution(BamToFastqWorkflow.groovy:145)
de.dkfz.roddy.bam2fastq.BamToFastqWorkflow$setupExecution.callCurrent(Unknown Source)
de.dkfz.roddy.core.Workflow.setupExecution(Workflow.groovy:56)
de.dkfz.roddy.core.Analysis.prepareExecution(Analysis.java:361)
de.dkfz.roddy.core.Analysis.executeRun(Analysis.java:397)
de.dkfz.roddy.core.Analysis.executeRun(Analysis.java:341)
de.dkfz.roddy.core.Analysis.rerun(Analysis.java:234)
de.dkfz.roddy.client.cliclient.RoddyCLIClient.testrerun(RoddyCLIClient.groovy:537)
de.dkfz.roddy.client.cliclient.RoddyCLIClient.parseStartupMode(RoddyCLIClient.groovy:122)
de.dkfz.roddy.Roddy.parseRoddyStartupModeAndRun(Roddy.java:737)
de.dkfz.roddy.Roddy.startup(Roddy.java:287)
de.dkfz.roddy.Roddy.main(Roddy.java:214)

(Note: this is a development version from the development branch, in which callDirect() was renamed to runDirect(). Current commit: f343e5e.)

I could trace the problem further into de.dkfz.roddy.execution.io.ExecutionService#execute(de.dkfz.roddy.execution.jobs.Command, boolean). However, it is not clear to me what the variables configurationDisallowsJobSubmission, allJobsBlocked, pidIsBlocked, preventCalls, and isDummyCommand mean, and why testrerun in particular follows yet another special branch.

TODO: Clean up the semantics of all these variables and fix the problem with runDirect() in testrerun mode.
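The TODO above asks for clear semantics for the flags checked in ExecutionService#execute. A minimal sketch of such a single decision point follows, in Java. It is hypothetical: the flag names are taken from this issue, but the IS_DIRECT_CALL flag, the ExecutionDecider and Decision classes, and above all the order in which the rules are checked are assumptions, not Roddy's actual semantics. The point is that every condition is evaluated in one documented place, and each decision carries a reason that can be logged.

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Hypothetical "decider": one place that decides whether a command is executed.
 * Flag names mirror the variables in ExecutionService#execute; their meaning and
 * precedence here are ASSUMPTIONS, not Roddy's actual semantics.
 */
final class ExecutionDecider {

    enum Flag {
        CONFIGURATION_DISALLOWS_JOB_SUBMISSION,
        ALL_JOBS_BLOCKED,
        PID_IS_BLOCKED,
        PREVENT_CALLS,
        IS_DUMMY_COMMAND,
        IS_DIRECT_CALL // hypothetical: synchronous call (runDirect) whose output is needed during setup
    }

    /** Outcome plus a reason, so the log can explain why a command was or was not run. */
    static final class Decision {
        final boolean execute;
        final String reason;

        Decision(boolean execute, String reason) {
            this.execute = execute;
            this.reason = reason;
        }
    }

    static Decision decide(Set<Flag> flags) {
        // Rules are checked in a fixed order; the first matching rule wins.
        if (flags.contains(Flag.IS_DUMMY_COMMAND)) return new Decision(false, "dummy command");
        // Assumption: setup-time direct calls must run even when calls are otherwise prevented,
        // because their job log files are read afterwards and would not exist otherwise.
        if (flags.contains(Flag.IS_DIRECT_CALL)) return new Decision(true, "direct call needed for workflow setup");
        if (flags.contains(Flag.PREVENT_CALLS)) return new Decision(false, "calls prevented");
        if (flags.contains(Flag.CONFIGURATION_DISALLOWS_JOB_SUBMISSION)) return new Decision(false, "job submission disabled by configuration");
        if (flags.contains(Flag.ALL_JOBS_BLOCKED)) return new Decision(false, "all jobs blocked");
        if (flags.contains(Flag.PID_IS_BLOCKED)) return new Decision(false, "dataset (pid) blocked");
        return new Decision(true, "no blocking condition");
    }

    public static void main(String[] args) {
        // Assumed testrerun situation: calls are prevented, but listing read groups is a direct call.
        Decision d = decide(EnumSet.of(Flag.PREVENT_CALLS, Flag.IS_DIRECT_CALL));
        System.out.println(d.execute + ": " + d.reason); // prints "true: direct call needed for workflow setup"
    }
}
```

With this shape, the testrerun crash would come down to one explicit rule (here, the assumed rule that setup-time direct calls always run) instead of several scattered checks.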

vinjana commented 6 years ago

Apparently, this could be quick-fixed relatively easily by turning some variables into static, synchronized maps keyed by dataset, so that their contents persist across the QUERY_STATUS phase and the actual run phase. Whether that is a good solution, I do not know. The code above still needs refactoring: there are simply too many factors deciding whether to run or not. There should be some "deciderX" code.
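The quick fix described above can be sketched as follows, again in Java and under stated assumptions: the ReadGroupCache class, its methods, and the lister function (a stand-in for whatever actually invokes the bamListReadGroups tool) are hypothetical, and a dataset is identified by a plain string ID. The sketch uses ConcurrentHashMap rather than Collections.synchronizedMap because its computeIfAbsent is atomic, so each BAM's read groups are listed at most once per dataset even if a new workflow object is created for the second phase.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical static cache: dataset ID -> (BAM path -> read group IDs). */
final class ReadGroupCache {

    // Static, so it outlives individual workflow objects within the same JVM.
    private static final Map<String, Map<String, List<String>>> CACHE = new ConcurrentHashMap<>();

    private ReadGroupCache() {}

    /** Returns cached read groups, calling the (expensive) lister at most once per BAM and dataset. */
    static List<String> readGroups(String datasetId, String bamPath, Function<String, List<String>> lister) {
        Map<String, List<String>> perDataset = CACHE.computeIfAbsent(datasetId, id -> new ConcurrentHashMap<>());
        return perDataset.computeIfAbsent(bamPath, path -> Collections.unmodifiableList(lister.apply(path)));
    }

    /** Drops everything cached for one dataset, e.g. when a run has finished. */
    static void clear(String datasetId) {
        CACHE.remove(datasetId);
    }

    public static void main(String[] args) {
        AtomicInteger listings = new AtomicInteger();
        // Stand-in for the real listing; the returned read group name is made up.
        Function<String, List<String>> fakeLister = path -> {
            listings.incrementAndGet();
            return Arrays.asList("readGroup1");
        };
        readGroups("gms", "gerald_D1VCPACXX_1.bam", fakeLister); // QUERY_STATUS phase
        readGroups("gms", "gerald_D1VCPACXX_1.bam", fakeLister); // run phase, new workflow object
        System.out.println("listings: " + listings.get()); // prints "listings: 1"
    }
}
```

The JDK documentation asks that computeIfAbsent functions on a ConcurrentHashMap be short, since other updates to the same map may block while they run. Keeping one inner map per dataset limits that contention, but a slow listing still blocks concurrent updates for that dataset.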

dankwart-de commented 6 years ago

Done