CDCgov / phoenix

🔥🐦🔥PHoeNIx: A short-read pipeline for healthcare-associated and antimicrobial resistant pathogens
Apache License 2.0

[BUG] - Resume flag restarts pipeline at SPADES, even if pipeline has completed downstream steps #129

Open slsevilla opened 6 months ago

slsevilla commented 6 months ago

Describe the bug
If I have a pipeline error and pass Nextflow the -resume flag, the pipeline restarts at SPADES even if downstream modules have already completed. I think this is because of the way SPADES is being run: since the input is given path(full_outdir), any change to the output directory means this module has to be re-run.
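For context, a minimal sketch of why an input like path(full_outdir) interacts badly with -resume (the input set below is illustrative, not the exact PHOENIX declaration): Nextflow hashes every declared input when deciding whether a cached task can be reused, so staging an output directory whose contents keep changing will keep invalidating the cache.

    nextflow.enable.dsl = 2

    process SPADES {
        input:
        // Illustrative input set only (not the exact PHOENIX declaration):
        // Nextflow hashes every input when deciding whether a cached task can
        // be reused on -resume, so staging full_outdir as an input means any
        // change under the output directory can force this task to re-run.
        tuple val(meta), path(reads), path(unpaired_reads), path(full_outdir)

        output:
        tuple val(meta), path("scaffolds.fasta"), emit: scaffolds

        script:
        """
        spades.py -1 ${reads[0]} -2 ${reads[1]} -o ./
        """
    }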

Impact
When there is an error in one of the last processes, the majority (and most time-consuming) part of the pipeline has to be re-run, even though those steps have already run to completion.

To Reproduce

Expected behavior
I would expect the -resume flag to continue the pipeline from the point at which it failed.

Additional context
I did try to see if I could come up with a quick solution that didn't involve rewriting the SPADES module. If you remove the path(outdir) from the input and edit the code to avoid using the full path variable (I basically stopped the fail log from being created), it runs with a warning, which makes sense given the tuple is now incomplete:

WARN: Input tuple does not match input set cardinality declared by process `PHOENIX_SLIM:PHOENIX_EXTERNAL_SLIM:SPADES_WF:SPADES` -- offending value: [[id:v2], [/output/dir/work/82/e8d8d3abb9c77bd86eb3baad10465c/v2_1.trim.fastq.gz, /output/dir/work/82/e8d8d3abb9c77bd86eb3baad10465c/v2_2.trim.fastq.gz], /output/dir/work/c5/aef45b22ff908136d1fb98e291ce76/v2.singles.fastq.gz, /output/dir/work/fb/b3d5f52649fc64c19d8e2aea70f4e0/v2.kraken2_trimd.top_kraken_hit.txt, /output/dir/work/89/5d37276a9cc8e20e0ac46dd90b15d5/v2_raw_read_counts.txt, /output/dir/work/c1/cf4bb3c83f69750599b08228b27da5/v2_trimmed_read_counts.txt, /output/dir/work/a4/48219be006a02dcd541fb808ab5cb5/v2.kraken2_trimd.summary.txt, /output/dir/]

This isn't ideal, though, since I can see the output directory is important for error logging when SPADES doesn't create the expected outputs. Hoping you might have another, cleaner solution, or maybe a different perspective on why this isn't working as expected for me?
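For completeness, a minimal sketch of what keeping the declaration and the incoming channel in sync might look like if the directory were dropped entirely (the channel names here are illustrative, not the actual PHOENIX code); without a matching change on the channel side, the cardinality warning above is exactly what Nextflow reports:

    // Illustrative only: if path(full_outdir) is removed from the SPADES input
    // declaration, the channel feeding the process also needs its trailing
    // directory element dropped so the tuple shapes match again.
    spades_input_ch
        .map { it[0..-2] }                  // drop the last element (the output dir)
        .set { spades_input_no_outdir_ch }

    SPADES(spades_input_no_outdir_ch)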

jvhagey commented 6 months ago

Hi @slsevilla, thanks for your interest in PHX! We have noticed this but haven't totally dug into it, so thanks for taking the first troubleshooting pass. My hunch is that a hash gets created for the cache when the *.synopsis and *_summaryline_failure.tsv files are initially created, and then when we delete them it makes Nextflow think the process didn't complete fully. My initial thoughts on getting this SPAdes error information a different way either involve a fairly big code rewrite and/or have their own downsides. For example, we could stop running the downstream *.synopsis generation per sample and just give it a directory, but that has the downside of no longer running in parallel and could slow things way down. We will see what we can come up with, but as it's not totally breaking anything, this will end up lower on the priority list tbh.
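One idea that might be worth exploring (a hedged sketch only, not something tested against PHOENIX): Nextflow allows outputs to be declared optional, so if the failure artifacts were marked optional: true, removing them after a successful assembly might not leave the cached task looking incomplete on -resume. The emit names below are hypothetical:

    output:
    // Hypothetical declarations: optional outputs are allowed to be absent
    // without the task being treated as failed.
    tuple val(meta), path("*.synopsis"),                emit: synopsis_failure,    optional: true
    tuple val(meta), path("*_summaryline_failure.tsv"), emit: summaryline_failure, optional: true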

slsevilla commented 5 months ago

I was able to fix this by doing the following:

For anyone looking for the quick and dirty fix, this is it:

# preemptively create _summary_line.csv and .synopsis files in case spades fails (no contigs or scaffolds created), so we can still collect upstream stats.
    ${ica}pipeline_stats_writer_trimd.sh -a ${fastp_raw_qc} -b ${fastp_total_qc} -c ${reads[0]} -d ${reads[1]} -e ${kraken2_trimd_report} -f ${k2_bh_summary} -g ${krona_trimd}

    #get version information
    bspades_version=\$(${ica}beforeSpades.sh -V)
    pipestats_version=\$(${ica}pipeline_stats_writer_trimd.sh -V)
    aspades_version=\$(${ica}afterSpades.sh -V)

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        spades: \$(spades.py --version 2>&1 | sed 's/^.*SPAdes genome assembler v//; s/ .*\$//')
        spades_container: ${container}
        \${bspades_version}
        \${aspades_version}
        \${pipestats_version}
    END_VERSIONS

    # Set default to be that spades fails and doesn't create scaffolds or contigs
    spades_complete=run_failure,no_scaffolds,no_contigs
    echo \$spades_complete | tr -d "\\n" > ${prefix}_spades_outcome.csv

    if [[ -z \$(zcat $unpaired_reads) ]]; then
        spades.py \\
            $args \\
            --threads $task.cpus \\
            --memory $maxmem \\
            $input_reads \\
            --phred-offset $phred_offset \\
            -o ./

    else
        spades.py \\
            $args \\
            --threads $task.cpus \\
            --memory $maxmem \\
            $single_reads \\
            $input_reads \\
            --phred-offset $phred_offset \\
            -o ./
    fi

    mv spades.log ${prefix}.spades.log

    # Overwrite default that spades failed
    # Lets downstream process know that spades completed ok - see spades_failure.nf subworkflow
    spades_complete=run_completed
    echo \$spades_complete | tr -d "\\n" > ${prefix}_spades_outcome.csv

    # Create a summaryline file that will be deleted later if spades is successful; if not, this line shows up in the final Phoenix_output_summary file.
    # Create the '*_spades_outcome.csv' file to state whether spades failed and whether contigs or scaffolds were created. See the spades_failure.nf subworkflow.
    # This file determines whether the downstream processes GENERATE_PIPELINE_STATS_FAILURE and CREATE_SUMMARY_LINE_FAILURE will run (if spades creates contigs but not scaffolds).
    ${ica}afterSpades.sh

This allows me to pick the run back up after SPAdes if the pipeline stops for some reason.
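With that change in place, re-running with -resume should reuse the cached SPAdes task; the entry point, profile, and paths below are placeholders rather than a prescribed PHOENIX command:

    # Placeholder invocation; adjust the entry point, profile, and paths to your setup.
    nextflow run cdcgov/phoenix -entry PHOENIX -profile singularity \
        --input samplesheet.csv --kraken2db /path/to/kraken2_db \
        -resume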

I think (for us) a key to the NF workflows is being able to pick up where something fails or stops, so the downside of the synopsis file failing because of a SPAdes failure is something I'll have to handle when/if it happens (it hasn't yet). I hadn't implemented the 'fairy' rules until this last update, and it's happening there too. That's another huge rewrite to consider...