broadinstitute / malaria

cutadapters.wdl crashes if it encounters an empty fastq.gz file #1

Open jorgeamaya opened 2 weeks ago

jorgeamaya commented 2 weeks ago

cutadapters.wdl crashes when it encounters an empty fastq.gz file, which brings down the whole pipeline. Handling for empty fastq files must be added.

Example of an error message below:

Output will be written into the directory: /cromwell_root/Results/
Using user-specified basename (>>SN23KDG106_S37_L001_R1_001<<) instead of deriving the filename from the input file(s)
Input file '/cromwell_root/fc-dab15d04-6d9f-4b07-9b00-045a60cdb4bf/cigass_mad4hatter/raw_files/run4/SN23KDG106_S37_L001_R1_001.fastq.gz' seems to be completely empty. Consider respecifying!

2024/11/08 03:58:45 Starting delocalization.
2024/11/08 03:58:46 Delocalization script execution started...
2024/11/08 03:58:46 Delocalizing output /cromwell_root/memory_retry_rc -> gs://fc-dab15d04-6d9f-4b07-9b00-045a60cdb4bf/submissions/05dd0c74-7a1a-4ba3-a033-a4b09f417173/ampseq/ec4f70ff-4c46-4b34-bda1-a553106ab4a9/call-t_002_cutadapters/shard-14/attempt-2/memory_retry_rc
2024/11/08 03:58:48 Delocalizing output /cromwell_root/rc -> gs://fc-dab15d04-6d9f-4b07-9b00-045a60cdb4bf/submissions/05dd0c74-7a1a-4ba3-a033-a4b09f417173/ampseq/ec4f70ff-4c46-4b34-bda1-a553106ab4a9/call-t_002_cutadapters/shard-14/attempt-2/rc
2024/11/08 03:58:50 Delocalizing output /cromwell_root/stdout -> gs://fc-dab15d04-6d9f-4b07-9b00-045a60cdb4bf/submissions/05dd0c74-7a1a-4ba3-a033-a4b09f417173/ampseq/ec4f70ff-4c46-4b34-bda1-a553106ab4a9/call-t_002_cutadapters/shard-14/attempt-2/stdout
2024/11/08 03:58:51 Delocalizing output /cromwell_root/stderr -> gs://fc-dab15d04-6d9f-4b07-9b00-045a60cdb4bf/submissions/05dd0c74-7a1a-4ba3-a033-a4b09f417173/ampseq/ec4f70ff-4c46-4b34-bda1-a553106ab4a9/call-t_002_cutadapters/shard-14/attempt-2/stderr
2024/11/08 03:58:52 Delocalizing output /cromwell_root/Results/SN23KDG106_S37_L001_R1_001_val_2.fq.gz -> gs://fc-dab15d04-6d9f-4b07-9b00-045a60cdb4bf/submissions/05dd0c74-7a1a-4ba3-a033-a4b09f417173/ampseq/ec4f70ff-4c46-4b34-bda1-a553106ab4a9/call-t_002_cutadapters/shard-14/attempt-2/Results/SN23KDG106_S37_L001_R1_001_val_2.fq.gz
Required file output '/cromwell_root/Results/SN23KDG106_S37_L001_R1_001_val_2.fq.gz' does not exist.

gmboowa commented 1 week ago

version 1.0

# Task to check if the fastq file is empty
task CheckFileSize {
    input {
        File fastq_file
    }

    command <<<
        # Count decompressed bytes; inside <<< >>> WDL placeholders are written ~{...}
        if [[ $(zcat "~{fastq_file}" | wc -c) -eq 0 ]]; then
            echo "EMPTY"
        else
            echo "NONEMPTY"
        fi
    >>>

    output {
        String file_status = read_string(stdout())
    }
}

# Task to run cutadapt on non-empty fastq files
task RunCutadapt {
    input {
        File fastq_file
    }

    command <<<
        # Run cutadapt (replace this with your cutadapt command)
        cutadapt -o trimmed.fastq.gz "~{fastq_file}"
    >>>

    output {
        File trimmed_fastq = "trimmed.fastq.gz"
    }
}

# Tasks are declared at the top level; the workflow only calls them
workflow CutAdaptersWorkflow {
    input {
        File fastq_file
    }

    call CheckFileSize { input: fastq_file = fastq_file }

    # Conditional call to RunCutadapt only if the file is not empty
    if (CheckFileSize.file_status == "NONEMPTY") {
        call RunCutadapt { input: fastq_file = fastq_file }
    }

    output {
        # Optional: undefined when RunCutadapt was skipped for an empty input
        File? trimmed_fastq = RunCutadapt.trimmed_fastq
    }
}
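
The emptiness check could also be done in Python, for example from a helper script called inside the task command. The following is only a minimal sketch, not something taken from the pipeline: the function name fastq_gz_is_empty is hypothetical. It treats both a 0-byte file and a gzip file with no decompressed payload as empty, and it reads at most one byte so that large inputs are not fully decompressed.

```python
import gzip
import os


def fastq_gz_is_empty(path: str) -> bool:
    """Return True if `path` yields no decompressed data (hypothetical helper)."""
    # A 0-byte file has no gzip header at all, so check the size first.
    if os.path.getsize(path) == 0:
        return True
    # A gzip file can carry a header and trailer but no payload;
    # reading a single decompressed byte is enough to tell the cases apart.
    with gzip.open(path, "rb") as handle:
        return handle.read(1) == b""
```

A caller could skip trimming, or write placeholder outputs, whenever this returns True. Which of the two is appropriate depends on what the downstream tasks expect.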