broadinstitute / wdltool

BSD 3-Clause "New" or "Revised" License
18 stars 4 forks source link

wdltool doesn't throw error if input isn't defined #8

Closed tmdefreitas closed 6 years ago

tmdefreitas commented 8 years ago

In the following WDL, GSEA_v_1_0_fwer_p_val_threshold was not declared as an input, but the validator didn't raise an error. Cromwell choked when trying to run the task.

Is there a reason wdltool shouldn't throw an error here?

task tool_gsea_mrnaseq_subtypes {
  String outputprefix
  String pheno_from_aggregate_molecular_subtype_clusters
  String pheno_name
  File tcga_pheno_FileName
  File tcga_exp_FileName
  File gs_db
  String GSEA_v_1_0_reshuffling_type
  String GSEA_v_1_0_nperm
  String GSEA_v_1_0_weighted_score_type
  String GSEA_v_1_0_nom_p_val_threshold
  String GSEA_v_1_0_topgs
  String GSEA_v_1_0_adjust_FDR_q_val
  String GSEA_v_1_0_gs_size_threshold_min
  String GSEA_v_1_0_gs_size_threshold_max
  String GSEA_v_1_0_reverse_sign
  String GSEA_v_1_0_perm_type

    command {
    /R/RunR.sh -f main /src/Pathway_GSEA.R --libdir/src --disease_type${outputprefix} --pheno.from.Aggregate_Molecular_Subtype_Clusters${pheno_from_aggregate_molecular_subtype_clusters} --pheno.name${pheno_name} --tcga.pheno.FileName${tcga_pheno_FileName} --tcga.exp.FileName${tcga_exp_FileName} --gs.db${gs_db} --GSEA.v.1.0.reshuffling.type${GSEA_v_1_0_reshuffling_type} --GSEA.v.1.0.nperm${GSEA_v_1_0_nperm} --GSEA.v.1.0.weighted.score.type${GSEA_v_1_0_weighted_score_type} --GSEA.v.1.0.nom.p.val.threshold${GSEA_v_1_0_nom_p_val_threshold} --GSEA.v.1.0.fwer.p.val.threshold${GSEA_v_1_0_fwer_p_val_threshold} --GSEA.v.1.0.fdr.q.val.threshold${GSEA_v_1_0_fdr_q_val_threshold} --GSEA.v.1.0.topgs${GSEA_v_1_0_topgs} --GSEA.v.1.0.adjust.FDR.q.val${GSEA_v_1_0_adjust_FDR_q_val} --GSEA.v.1.0.gs.size.threshold.min${GSEA_v_1_0_gs_size_threshold_min} --GSEA.v.1.0.gs.size.threshold.max${GSEA_v_1_0_gs_size_threshold_max} --GSEA.v.1.0.reverse.sign${GSEA_v_1_0_reverse_sign} --GSEA.v.1.0.perm.type${GSEA_v_1_0_perm_type}

  zip -r ${outputprefix}.pathway_gsea_mrnaseq_subtypes.zip . 
    }

    output {
    File zip_results="${outputprefix}.pathway_gsea_mrnaseq_subtypes.zip" 
    }

    runtime {
        docker : "broadgdac/tool_gsea_mrnaseq_subtypes:22"
    }

    meta {
    author : "Juok Cho"
        email : "jcho@broadinstitute.org"
    }

}

workflow tool_gsea_mrnaseq_subtypes_workflow {
    call tool_gsea_mrnaseq_subtypes
}

Thanks, Tim

geoffjentry commented 8 years ago

When talking to @knoblett she said that this appears to be something which previously was resolved - perhaps there's been a regression, or it's slightly different somehow

kbergin commented 8 years ago

I also observed something similar.

One of my tasks was failing because I had an input designated as an input in the call, and in the task, but it actually wasn't an input for the workflow nor in the json. I feel like validate or some error handling should have caught that? Instead it just didn't make it through JES and said it 'failed to localize inputs'. But Brad pointed out that I think I used swagger to validate not wdltools, so this could be entirely my fault, as he said those may not be in sync / up to date wise? I wasn't aware.

Anyways here was my situation in case it's at all helpful, the missing var is File gender_mask_bed:

workflow GenomeStripBamWorkflow {
    String sample_name
    String bam_name
    String analysis_directory
    File bam
    File ref_fasta
    File ref_fasta_index
    File ref_dict
    File ref_genome_sizes
    File ploidy_map
    File copy_number_mask
    File copy_number_mask_index
    File read_depth_mask
    File ref_profile
    File genome_mask
    File genome_mask_index
    File configs

##This is the call that had the issue, there are some before it that it depends on, 
##but not for the missing input (gender_mask_bed)
    call CallSampleGender as CallSampleGender {
        input:
            analysis_directory = analysis_directory,
            ref_fasta = ref_fasta,
            ref_fasta_index = ref_fasta_index,
            ref_dict = ref_dict,
            genome_mask = genome_mask,
            genome_mask_index = genome_mask_index,
            ploidy_map = ploidy_map,
            header_bam = ExtractBamSubset.header_bam,
            header_bam_index = IndexHeaders.header_index,
            gender_mask_bed = gender_mask_bed,
            read_count_index = Index.read_count_index
    }
}

And here's the task, which does have the missing variable File gender_mask_bed

task CallSampleGender {
  String analysis_directory
  File ref_fasta
  File ref_fasta_index
  File ref_dict
  File genome_mask
  File genome_mask_index
  File ploidy_map
  File header_bam
  File header_bam_index
  File gender_mask_bed
  File read_count_index

  command {
     mkdir ${analysis_directory}
     java -Xmx4000m \
     -classpath /usr/gitc/svtoolkit2.00/lib/SVToolkit.jar:/usr/gitc/svtoolkit2.00/lib/gatk/GenomeAnalysisTK.jar \
     org.broadinstitute.sv.apps.CallSampleGender \
     -R ${ref_fasta} \
     -genomeMaskFile ${genome_mask} \
     -ploidyMapFile ${ploidy_map} \
     -md ${analysis_directory} \
     -I ${header_bam} \
     -bedFile ${gender_mask_bed} \
     -O sample_gender.report.txt
  }
    runtime {
      docker: "kbergin/kbergin_test"
      memory: "4 GB"
    }
    output{
        File sample_gender_report = "sample_gender.report.txt"
    }
}
knoblett commented 8 years ago

In running a highlight command today, I noticed that the wdltool caught an undeclared input error. Has this bug been fixed, or is wdltool still not picking up undeclared inputs within tasks only? (my example is missing a declared input in the workflow's call to the task, due to a typo.)

screen shot 2016-04-25 at 12 55 02 pm

I am using the Cromwell 0.18 release and wdltool 0.1 release.

mcovarr commented 6 years ago

Issue moved to broadinstitute/cromwell #2874 via ZenHub