PacificBiosciences / HiFi-human-WGS-WDL

BSD 3-Clause Clear License

paraphase step output parsing error #113

Closed apaul7 closed 7 months ago

apaul7 commented 7 months ago

Describe the bug: The parser fails to locate the output files after the Paraphase step when the sample name contains a period (.). The Paraphase tool itself completes successfully. From what I can tell, Paraphase names its output files based on the basename of the input BAM.

With a sample name of sample.1, the input BAM for this step was sample.1.m84081_240112_181707_s1.hifi_reads.bc2031.GRCh38.aligned.haplotagged.bam. The Paraphase output BAM is sample.1.paraphase/sample_realigned_tagged.bam, but the WDL looks for sample.1.paraphase/sample.1_realigned_tagged.bam. The other outputs have the same issue.
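The mismatch described above is consistent with an output prefix that is cut from the BAM basename at the first period. The following is a minimal Python sketch of that assumed behavior, not code copied from Paraphase; the function name `derive_prefix` is hypothetical and exists only for illustration.

```python
import os


def derive_prefix(bam_path: str) -> str:
    """Hypothetical helper: keep the BAM basename up to the first '.'.

    This is an assumption about how the prefix is derived, inferred from
    the file names reported in this issue.
    """
    return os.path.basename(bam_path).split(".")[0]


# File name and sample ID as reported in the issue.
bam = "sample.1.m84081_240112_181707_s1.hifi_reads.bc2031.GRCh38.aligned.haplotagged.bam"
sample_id = "sample.1"

prefix = derive_prefix(bam)

# Path that would be written under the assumed prefix rule.
produced = f"{sample_id}.paraphase/{prefix}_realigned_tagged.bam"
# Path the WDL output block builds from sample_id.
expected = f"{sample_id}.paraphase/{sample_id}_realigned_tagged.bam"

print(produced)
print(expected)
print(produced == expected)
```

Under this assumption, any sample name containing a period yields a prefix shorter than the sample ID, so every output path built from `sample_id` misses the file that was actually written.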


Expected behavior: The WDL step should find the output files and continue with the rest of the workflow.

Additional context: After editing the output declarations as shown in the diff below, I was able to get the whole workflow to complete successfully. I'm sure there's a better way, but I don't write WDL.

diff --git a/workflows/sample_analysis/sample_analysis.wdl b/workflows/sample_analysis/sample_analysis.wdl
index 1f4f1a8..ec3fc27 100644
--- a/workflows/sample_analysis/sample_analysis.wdl
+++ b/workflows/sample_analysis/sample_analysis.wdl
@@ -641,10 +641,10 @@ task paraphase {
        >>>
        output {
-               File output_json = "~{out_directory}/~{sample_id}.json"
-               File realigned_bam = "~{out_directory}/~{sample_id}_realigned_tagged.bam"
-               File realigned_bam_index = "~{out_directory}/~{sample_id}_realigned_tagged.bam.bai"
-               Array[File] paraphase_vcfs = glob("~{out_directory}/~{sample_id}_vcfs/*.vcf")
+               File output_json = glob("~{out_directory}/*.json")[0]
+               File realigned_bam = glob("~{out_directory}/*_realigned_tagged.bam")[0]
+               File realigned_bam_index = glob("~{out_directory}/*_realigned_tagged.bam.bai")[0]
+               Array[File] paraphase_vcfs = glob("~{out_directory}/*_vcfs/*.vcf")
        }

        runtime {

paraphase code here and here

Thanks for making this great WDL workflow!

williamrowell commented 7 months ago

Hi Alex,

Thanks for filing this issue. Regarding the Paraphase code, I moved the issue upstream to Paraphase. On the workflow side, I typically recommend that sample names contain only alphanumeric characters, dashes, and underscores. I'll update the documentation in the next release to reflect this.

Thanks!
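The naming recommendation above (alphanumeric characters, dashes, and underscores only) could be checked before launching a run. This is a minimal Python sketch; the helper `is_safe_sample_name` and the exact pattern are assumptions for illustration and are not part of the workflow.

```python
import re

# Assumed pattern: one or more ASCII letters, digits, underscores, or dashes.
SAMPLE_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_safe_sample_name(name: str) -> bool:
    """Hypothetical check: True if the whole name matches the allowed set."""
    return SAMPLE_NAME_RE.fullmatch(name) is not None


for name in ["sample-1", "sample_1", "sample.1"]:
    print(name, is_safe_sample_name(name))
```

A name such as sample.1 fails this check, which would surface the problem before the Paraphase step rather than at output collection.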

apaul7 commented 7 months ago

Thanks! I almost filed an issue there. I figured Paraphase was running as expected and the mistake was just in the output gathering.