broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments.
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Integrity constraint violation (cromwell 0.19) #640

Closed: LeeTL1220 closed this issue 8 years ago

LeeTL1220 commented 8 years ago

I'm trying to run a WDL script with the Cromwell 0.19 jar. I've been getting this error periodically, but as I make changes to the WDL it disappears or comes back. Unfortunately, it is back now and I cannot seem to get around it.

This is not the final version of the WDL and JSON, so there are some unused parameters and other bits that need cleaning up ... but nothing that should yield this message, I believe. Previously, I was able to make the error go away by converting a Boolean input to a String.
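
To make that workaround concrete, here is a minimal sketch of the kind of change meant by "converting a Boolean to a String" (the task name EchoFlag and its command are hypothetical and invented for illustration; the real inputs appear in the full WDL below):

task EchoFlag {
    # Boolean disable_reference_validation   # earlier Boolean form (assumed) that coincided with the error
    String disable_reference_validation      # String form that made the error go away
    command {
        echo "--disableSequenceDictionaryValidation ${disable_reference_validation}"
    }
    output {
        String flag_line = read_string(stdout())
    }
}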

Error message

[2016-04-05 16:30:29,530] [info] Updating WorkflowManager state. New Data: (836da883-b156-4864-91f3-8bbf32e7b12c,Actor[akka://cromwell-system/user/WorkflowManagerActor/WorkflowActor-836da883-b156-4864-91f3-8bbf32e7b12c#-804883739])
[2016-04-05 16:30:29,581] [info] WorkflowActor [836da883]: Start(Some(Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor#-776202726])) message received
integrity constraint violation: unique constraint or index violation; UK_SYM_WORKFLOW_EXECUTION_ID_SCOPE_NAME_ITERATION_IO table: SYMBOL
[2016-04-05 16:30:29,770] [error] SingleWorkflowRunnerActor received Failure message: integrity constraint violation: unique constraint or index violation; UK_SYM_WORKFLOW_EXECUTION_ID_SCOPE_NAME_ITERATION_IO table: SYMBOL
java.sql.SQLIntegrityConstraintViolationException: integrity constraint violation: unique constraint or index violation; UK_SYM_WORKFLOW_EXECUTION_ID_SCOPE_NAME_ITERATION_IO table: SYMBOL
    at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
    at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
    at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(Unknown Source)
    at org.hsqldb.jdbc.JDBCPreparedStatement.executeUpdate(Unknown Source)
    at com.zaxxer.hikari.proxy.PreparedStatementProxy.executeUpdate(PreparedStatementProxy.java:61)
    at com.zaxxer.hikari.proxy.PreparedStatementJavassistProxy.executeUpdate(PreparedStatementJavassistProxy.java)
    at slick.driver.JdbcActionComponent$InsertActionComposerImpl$MultiInsertAction$$anonfun$run$8$$anonfun$apply$1.apply(JdbcActionComponent.scala:520)
    at slick.driver.JdbcActionComponent$InsertActionComposerImpl$MultiInsertAction$$anonfun$run$8$$anonfun$apply$1.apply(JdbcActionComponent.scala:517)
    at slick.jdbc.JdbcBackend$SessionDef$class.withPreparedInsertStatement(JdbcBackend.scala:354)
    at slick.jdbc.JdbcBackend$BaseSession.withPreparedInsertStatement(JdbcBackend.scala:407)
    at slick.driver.JdbcActionComponent$ReturningInsertActionComposerImpl.preparedInsert(JdbcActionComponent.scala:636)
    at slick.driver.JdbcActionComponent$InsertActionComposerImpl$MultiInsertAction$$anonfun$run$8.apply(JdbcActionComponent.scala:517)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at slick.driver.JdbcActionComponent$InsertActionComposerImpl$MultiInsertAction.run(JdbcActionComponent.scala:522)
    at slick.driver.JdbcActionComponent$SimpleJdbcDriverAction.run(JdbcActionComponent.scala:32)
    at slick.driver.JdbcActionComponent$SimpleJdbcDriverAction.run(JdbcActionComponent.scala:29)
    at slick.backend.DatabaseComponent$DatabaseDef$$anon$2.liftedTree1$1(DatabaseComponent.scala:237)
    at slick.backend.DatabaseComponent$DatabaseDef$$anon$2.run(DatabaseComponent.scala:237)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.hsqldb.HsqlException: integrity constraint violation: unique constraint or index violation; UK_SYM_WORKFLOW_EXECUTION_ID_SCOPE_NAME_ITERATION_IO table: SYMBOL
    at org.hsqldb.error.Error.error(Unknown Source)
    at org.hsqldb.Constraint.getException(Unknown Source)
    at org.hsqldb.index.IndexAVLMemory.insert(Unknown Source)
    at org.hsqldb.persist.RowStoreAVL.indexRow(Unknown Source)
    at org.hsqldb.TransactionManagerMVCC.addInsertAction(Unknown Source)
    at org.hsqldb.Session.addInsertAction(Unknown Source)
    at org.hsqldb.Table.insertSingleRow(Unknown Source)
    at org.hsqldb.StatementDML.insertRowSet(Unknown Source)
    at org.hsqldb.StatementInsert.getResult(Unknown Source)
    at org.hsqldb.StatementDMQL.execute(Unknown Source)
    at org.hsqldb.Session.executeCompiledStatement(Unknown Source)
    at org.hsqldb.Session.execute(Unknown Source)
    ... 24 common frames omitted

WDL

#
# Workflow for a single pair of case-control samples.
#
# Notes:
#
# - file names will use the entity ID specified, but inside the file, the bam SM tag will typically be used.
#
# - THIS SCRIPT SHOULD BE CONSIDERED "BETA" QUALITY
#
###########

workflow case_gatk_acnv_workflow {
    String wf_entity_id
    File target_bed
    File ref_fasta
    File ref_fasta_dict
    File ref_fasta_fai
    File common_snp_list
    File tumor_bam
    File tumor_bam_idx
    File normal_bam
    File normal_bam_idx
    File jar_file
    File PoN
    String is_disable_reference_validation

  call PadTargets {
    input:
        target_bed=target_bed,
        jar_file=jar_file
  }

  call CalculateTargetCoverage {
    input:
        entity_id=wf_entity_id,
        padded_target_bed=PadTargets.padded_target_bed,
        input_bam=tumor_bam,
        ref_fasta=ref_fasta,
        ref_fasta_fai=ref_fasta_fai,
        ref_fasta_dict=ref_fasta_dict,
        jar_file=jar_file,
        disable_reference_validation=is_disable_reference_validation
  }

  call NormalizeSomaticReadCounts {
    input:
        entity_id=wf_entity_id,
        coverage_file=CalculateTargetCoverage.gatk_cnv_coverage_file,
        padded_target_bed=PadTargets.padded_target_bed,
        pon=PoN,
        jar_file=jar_file
  }

  call PerformSegmentation {
    input:
        entity_id=wf_entity_id,
        jar_file=jar_file,
        tn_file=NormalizeSomaticReadCounts.tn_file,
        mem=2
  }

  call Caller {
    input:
        entity_id=wf_entity_id,
        jar_file=jar_file,
        tn_file=NormalizeSomaticReadCounts.tn_file,
        seg_file=PerformSegmentation.seg_file,
        mem=2
  }
}

# Pad the target file.  This was found to help sensitivity and specificity.  This step should only be altered
#  by advanced users.  Note that if you change the padding, you need a PoN that also reflects the change.
task PadTargets {
    File target_bed
    Int pd = 250
    Int mem = 1
    File jar_file
    command {
        java -Xmx${mem}g -Djava.library.path=/usr/lib/jni/ -jar ${jar_file} PadTargets  --targets ${target_bed} --output targets.padded.bed --padding ${pd}  --help false --version false --verbosity INFO --QUIET false
    }
    output {
        File padded_target_bed = "targets.padded.bed"
    }
    #runtime {
    #    docker: "gatk-protected/a1"
    #}
}

# Calculate the target coverage
task CalculateTargetCoverage {
    String entity_id
    File padded_target_bed
    String transform = "PCOV"
    String grouping = "SAMPLE"
    Boolean keepduplicatereads = true
    Boolean disable_all_read_filters = false
    File input_bam
    File ref_fasta
    File ref_fasta_fai
    File ref_fasta_dict
    Int mem = 4
    File jar_file
    String disable_reference_validation = "false"

    command {
        java -Xmx${mem}g -Djava.library.path=/usr/lib/jni/ -jar ${jar_file} CalculateTargetCoverage --output ${entity_id}.coverage.tsv --groupBy ${grouping} --transform ${transform} --targets ${padded_target_bed} --targetInformationColumns FULL --keepduplicatereads ${keepduplicatereads} --input ${input_bam} --reference ${ref_fasta}  --disable_all_read_filters ${disable_all_read_filters} --interval_set_rule UNION --interval_padding 0 --secondsBetweenProgressUpdates 10.0 --disableSequenceDictionaryValidation ${disable_reference_validation} --createOutputBamIndex true --help false --version false --verbosity INFO --QUIET false
    }
    output {
        File gatk_cnv_coverage_file = "${entity_id}.coverage.tsv"
    }

    #runtime {
    #    docker: "gatk-protected/a1"
    #}
}

task NormalizeSomaticReadCounts {
    String entity_id
    File coverage_file
    File padded_target_bed
    File pon
    Int mem = 2
    File jar_file
    command {
        java -Xmx${mem}g -Djava.library.path=/usr/lib/jni/ -jar ${jar_file} NormalizeSomaticReadCounts  --input ${coverage_file} \
        --targets ${padded_target_bed} --panelOfNormals ${pon} --factorNormalizedOutput ${entity_id}.fnt.tsv --tangentNormalized ${entity_id}.tn.tsv \
        --betaHatsOutput ${entity_id}.betaHats.tsv --preTangentNormalized  ${entity_id}.preTN.tsv  --help false --version false --verbosity INFO --QUIET false
    }

    output {
        File tn_file = "${entity_id}.tn.tsv"
        File pre_tn_file = "${entity_id}.preTN.tsv"
        File betahats_file = "${entity_id}.betaHats.tsv"
    }
    #runtime {
    #    docker: "gatk-protected/a1"
    #}
}

task PerformSegmentation {
    Int mem = 2
    String entity_id
    File jar_file
    File tn_file

    command {
        java -Xmx${mem}g -Djava.library.path=/usr/lib/jni/ -jar ${jar_file} PerformSegmentation  --targets ${tn_file} \
        --output ${entity_id}.seg --log2Input true  --alpha 0.01 --nperm 10000 --pmethod HYBRID --minWidth 2 --kmax 25 \
        --nmin 200 --eta 0.05 --trim 0.025 --undoSplits NONE --undoPrune 0.05 --undoSD 3 --help false --version false \
        --verbosity INFO --QUIET false
    }

    output {
        File seg_file = "${entity_id}.seg"
    }
}

task Caller {
    Int mem = 2
    String entity_id
    File jar_file
    File tn_file
    File seg_file

    command {
        java -Xmx${mem}g -Djava.library.path=/usr/lib/jni/ -jar ${jar_file} CallSegments  --targets ${tn_file} \
         --segments ${seg_file} --output ${entity_id}.called --threshold 2.0  --legacy false --experimental false \
          --help false --version false --verbosity INFO --QUIET false
    }

    output {
        File called_file="${entity_id}.called"
    }
}

JSON

{
  "case_gatk_acnv_workflow.normal_bam_idx": "/home/lichtens/broad_oncotator_configs/hcc_purity/SM-74NEG.bai",
  "case_gatk_acnv_workflow.CalculateTargetCoverage.grouping": "SAMPLE",
  "case_gatk_acnv_workflow.common_snp_list": "allchr.1kg.phase3.v5a.snp.maf10.biallelic.recode.fixed.prune5.interval_list",
  "case_gatk_acnv_workflow.ref_fasta_fai": "hg19mini.fasta.fai",
  "case_gatk_acnv_workflow.target_bed": "create-pon-all-targets.targets.bed",
  "case_gatk_acnv_workflow.wf_entity_id": "HCC1143T",
  "case_gatk_acnv_workflow.jar_file": "gatk-protected.jar",
  "case_gatk_acnv_workflow.PadTargets.mem": "1",
  "case_gatk_acnv_workflow.CalculateTargetCoverage.disable_all_read_filters": "false",
  "case_gatk_acnv_workflow.CalculateTargetCoverage.mem": "4",
  "case_gatk_acnv_workflow.tumor_bam": "HCC1143_chr3_1K_11K.tiny.bam",
  "case_gatk_acnv_workflow.ref_fasta_dict": "hg19mini.dict",
  "case_gatk_acnv_workflow.tumor_bam_idx": "HCC1143_chr3_1K_11K.tiny.bam.bai",
  "case_gatk_acnv_workflow.CalculateTargetCoverage.keepduplicatereads": "true",
  "case_gatk_acnv_workflow.normal_bam": "/home/lichtens/broad_oncotator_configs/hcc_purity/SM-74NEG.bam",
  "case_gatk_acnv_workflow.PoN": "/home/lichtens/broad_oncotator_configs/hcc_purity/ice_rcs_eval.v1.pd250.spark.pon",
  "case_gatk_acnv_workflow.CalculateTargetCoverage.transform": "PCOV",
  "case_gatk_acnv_workflow.ref_fasta": "hg19mini.fasta",
  "case_gatk_acnv_workflow.NormalizeSomaticReadCounts.mem": "4",
  "case_gatk_acnv_workflow.PadTargets.pd": "250",
  "case_gatk_acnv_workflow.is_disable_reference_validation": "true"
}
LeeTL1220 commented 8 years ago

Hey, the WDL does not validate: it does not like my specification of default values in the tasks. Not sure if this is an issue or not.

Once I removed the default values from the tasks and specified the values in the workflow instead, the constraint violation seems to have gone away.
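
For clarity, a minimal sketch of what that change looks like, using a trimmed-down PadTargets as an example (the command line is abbreviated and the declarations are illustrative, not a copy of my final WDL):

# Before: defaults declared inside the task, e.g. "Int pd = 250" and "Int mem = 1" (this is what failed to validate for me)
# After: no defaults in the task; the workflow supplies the values in the call
task PadTargets {
    File target_bed
    Int pd
    Int mem
    File jar_file
    command {
        java -Xmx${mem}g -jar ${jar_file} PadTargets --targets ${target_bed} --output targets.padded.bed --padding ${pd}
    }
    output {
        File padded_target_bed = "targets.padded.bed"
    }
}

workflow case_gatk_acnv_workflow {
    File target_bed
    File jar_file
    call PadTargets {
        input:
            target_bed=target_bed,
            jar_file=jar_file,
            pd=250,
            mem=1
    }
}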

@scottfrazer @kshakir

LeeTL1220 commented 8 years ago

(Feel free to close this issue)

kshakir commented 8 years ago

I am curious about the error, but will leave it to @ruchim to decide about closing or investigating further.

mcovarr commented 8 years ago

Given the preceding comments and the fact that Cromwell has been rewritten, I'm going to close this.