CDCgov / phoenix

🔥🐦🔥PHoeNIx: A short-read pipeline for healthcare-associated and antimicrobial resistant pathogens
Apache License 2.0

SPAdes fails #82

Closed: johnsonj161 closed this issue 1 year ago

johnsonj161 commented 1 year ago

SPAdes fails when running the current version of the pipeline (-r main or -r v1.0.0). The command used is below:

nextflow run cdcgov/phoenix -r v1.0.0 -profile singularity -entry PHOENIX --input manifest.csv --kraken2db /home/databases/ --outdir $PWD/test

The outputs of the .command files are shown below:

==> work/38/e248803652d1a3b48d4ffe5d359b45/.command.begin <==

==> work/38/e248803652d1a3b48d4ffe5d359b45/.command.err <==
cp: cannot create regular file '/home/test/isolate1/': No such file or directory
cp: cannot create regular file '/home/test/isolate1': No such file or directory

==> work/38/e248803652d1a3b48d4ffe5d359b45/.command.log <==
Option -k triggered, argument = isolate1.trimd_summary.txt
Option -n triggered, argument = isolate1
Option -d triggered, argument = /home/test
cp: cannot create regular file '/home/test/isolate1/': No such file or directory
cp: cannot create regular file '/home/test/isolate1': No such file or directory
find: ‘*.spades.log’: No such file or directory

==> work/38/e248803652d1a3b48d4ffe5d359b45/.command.out <==
Option -k triggered, argument = isolate1.trimd_summary.txt
Option -n triggered, argument = isolate1
Option -d triggered, argument = /home/test

==> work/38/e248803652d1a3b48d4ffe5d359b45/.command.run <==
#!/bin/bash
# NEXTFLOW TASK: PHOENIX:PHOENIX_EXTERNAL:SPADES_WF:SPADES (isolate1)
set -e
set -u
NXF_DEBUG=${NXF_DEBUG:=0}; [[ $NXF_DEBUG > 1 ]] && set -x
NXF_ENTRY=${1:-nxf_main}

nxf_tree() {
    local pid=$1

    declare -a ALL_CHILDREN
    while read P PP;do
        ALL_CHILDREN[$PP]+=" $P"
    done < <(ps -e -o pid= -o ppid=)

    pstat() {
        local x_pid=$1
        local STATUS=$(2> /dev/null < /proc/$1/status egrep 'Vm|ctxt')

        if [ $? = 0 ]; then
        local  x_vsz=$(echo "$STATUS" | grep VmSize | awk '{print $2}' || echo -n '0')
        local  x_rss=$(echo "$STATUS" | grep VmRSS | awk '{print $2}' || echo -n '0')
        local x_peak=$(echo "$STATUS" | egrep 'VmPeak|VmHWM' | sed 's/^.*:\s*//' | sed 's/[\sa-zA-Z]*$//' | tr '\n' ' ' || echo -n '0 0')
        local x_pmem=$(awk -v rss=$x_rss -v mem_tot=$mem_tot 'BEGIN {printf "%.0f", rss/mem_tot*100*10}' || echo -n '0')
        local vol_ctxt=$(echo "$STATUS" | grep '\bvoluntary_ctxt_switches' | awk '{print $2}' || echo -n '0')
        local inv_ctxt=$(echo "$STATUS" | grep '\bnonvoluntary_ctxt_switches' | awk '{print $2}' || echo -n '0')
        cpu_stat[x_pid]="$x_pid $x_pmem $x_vsz $x_rss $x_peak $vol_ctxt $inv_ctxt"
        fi
    }

    pwalk() {
        pstat $1
        for i in ${ALL_CHILDREN[$1]:=}; do pwalk $i; done
    }

    pwalk $1
}

nxf_stat() {
    cpu_stat=()
    nxf_tree $1

    declare -a sum=(0 0 0 0 0 0 0 0)
    local pid
    local i
    for pid in "${!cpu_stat[@]}"; do
        local row=(${cpu_stat[pid]})
        [ $NXF_DEBUG = 1 ] && echo "++ stat mem=${row[*]}"
        for i in "${!row[@]}"; do
        if [ $i != 0 ]; then
            sum[i]=$((sum[i]+row[i]))
        fi
        done
    done

    [ $NXF_DEBUG = 1 ] && echo -e "++ stat SUM=${sum[*]}"

    for i in {1..7}; do
        if [ ${sum[i]} -lt ${cpu_peak[i]} ]; then
            sum[i]=${cpu_peak[i]}
        else
            cpu_peak[i]=${sum[i]}
        fi
    done

    [ $NXF_DEBUG = 1 ] && echo -e "++ stat PEAK=${sum[*]}\n"
    nxf_stat_ret=(${sum[*]})
}

nxf_mem_watch() {
    set -o pipefail
    local pid=$1
    local trace_file=.command.trace
    local count=0;
    declare -a cpu_stat=(0 0 0 0 0 0 0 0)
    declare -a cpu_peak=(0 0 0 0 0 0 0 0)
    local mem_tot=$(< /proc/meminfo grep MemTotal | awk '{print $2}')
    local timeout
    local DONE
    local STOP=''

    [ $NXF_DEBUG = 1 ] && nxf_sleep 0.2 && ps fx

    while true; do
        nxf_stat $pid
        if [ $count -lt 10 ]; then timeout=1;
        elif [ $count -lt 120 ]; then timeout=5;
        else timeout=30;
        fi
        read -t $timeout -r DONE || true
        [[ $DONE ]] && break
        if [ ! -e /proc/$pid ]; then
            [ ! $STOP ] && STOP=$(nxf_date)
            [ $(($(nxf_date)-STOP)) -gt 10000 ] && break
        fi
        count=$((count+1))
    done

    echo "%mem=${nxf_stat_ret[1]}"      >> $trace_file
    echo "vmem=${nxf_stat_ret[2]}"      >> $trace_file

==> work/38/e248803652d1a3b48d4ffe5d359b45/.command.sh <==
#!/bin/bash -euo pipefail
bash /home/.nextflow/assets/cdcgov/phoenix/bin/pipeline_stats_writer_trimd.sh -a isolate1_raw_read_counts.txt -b isolate1_trimmed_read_counts.txt -c isolate1_1.trim.fastq.gz -d isolate1_2.trim.fastq.gz -e isolate1.kraken2_trimd.report.txt -f isolate1.trimd_summary.txt -g isolate1_trimd.html
sh /home/.nextflow/assets/cdcgov/phoenix/bin/beforeSpades.sh -k isolate1.trimd_summary.txt -n isolate1 -d /home/test

cat <<-END_VERSIONS > versions.yml
"PHOENIX:PHOENIX_EXTERNAL:SPADES_WF:SPADES":
    spades: $(spades.py --version 2>&1 | sed 's/^.*SPAdes genome assembler v//; s/ .*$//')
END_VERSIONS

spades_complete=run_failure,no_scaffolds,no_contigs
echo $spades_complete | tr -d "\n" > isolate1_spades_outcome.csv

spades.py \
     \
    --threads 8 \
    --memory 10 \
    -s isolate1.singles.fastq.gz \
    -1 isolate1_1.trim.fastq.gz -2 isolate1_2.trim.fastq.gz \
    --phred-offset 33\
    -o ./

mv spades.log isolate1.spades.log
spades_complete=run_completed
echo $spades_complete | tr -d "\n" > isolate1_spades_outcome.csv

rm /home/test/isolate1/isolate1_summaryline_failure.tsv
rm /home/test/isolate1/isolate1.synopsis

==> work/38/e248803652d1a3b48d4ffe5d359b45/.command.trace <==
jvhagey commented 1 year ago

@johnsonj161 what kind of system are you working on (HPC, local laptop, or cloud)? The error that kills this run is the one in .command.err: cp: cannot create regular file '/home/test/isolate1/': No such file or directory. Basically it can't find that folder. The v1.1.0 release that we are hoping to have out sometime in February should include a fix for this.
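For context, the underlying cp behavior is easy to reproduce in isolation: cp will not create missing parent directories, so copying into a directory that does not exist yet fails exactly as in the log. This is a minimal sketch with made-up paths, not the pipeline's actual script:

```shell
# Minimal reproduction of the failure mode (paths are illustrative).
demo=$(mktemp -d)
echo "data" > "$demo/summary.txt"

# Fails: the destination directory does not exist, so cp cannot
# create a regular file inside it (same error as in .command.err).
if cp "$demo/summary.txt" "$demo/isolate1/" 2>/dev/null; then
    echo "unexpected: copy succeeded"
else
    echo "cp failed: destination directory is missing"
fi

# Creating the directory first makes the same copy succeed.
mkdir -p "$demo/isolate1"
cp "$demo/summary.txt" "$demo/isolate1/"
ls "$demo/isolate1"
```

This is why the error reads as "can't find that folder": the output directory path the helper script builds never got created on disk.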

johnsonj161 commented 1 year ago

I am using a local Linux machine. I ran the test dataset and got the same issue. I checked, and the 'missing' directory appears to be present. Do you have any idea why I would be getting this error if the directory is present? And if so, is there a temporary fix prior to the v1.1.0 release? I appreciate your help!

jvhagey commented 1 year ago

It has to do with how full paths are read by Nextflow. Try adding a trailing slash to your --outdir. If you don't mind using a dev version for a month, I would use nextflow run cdcgov/phoenix -r v1.0.1 -profile singularity -entry PHOENIX --input manifest.csv --kraken2db /home/databases/ --outdir $PWD/test. Let me know if that newer version works for you; it will be a good test to confirm the fixes are in fact fixes. Note that -r v1.0.1 points at the branch with the correction, but when I release that branch I will tag it v1.1.0, since the changes warrant the version bump.
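Spelled out, the trailing-slash workaround for v1.0.0 is the same command from the original report with only --outdir changed. This is an untested sketch of the invocation; it requires Nextflow and Singularity to actually run:

```shell
# Workaround for v1.0.0: pass --outdir with a trailing slash so the
# full output path resolves correctly when per-sample folders are
# appended to it. Command otherwise identical to the original report.
nextflow run cdcgov/phoenix -r v1.0.0 -profile singularity \
    -entry PHOENIX \
    --input manifest.csv \
    --kraken2db /home/databases/ \
    --outdir $PWD/test/
```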

johnsonj161 commented 1 year ago

v1.0.1 works! I will keep my eye out for v1.1.0. Thank you!

jvhagey commented 1 year ago

Yay! I would recommend cloning that branch, since we are going to keep pushing updates to it and Nextflow will eventually complain about the changed revision if you keep using nextflow run cdcgov/phoenix. To be clear: run git clone -b v1.0.1 https://github.com/CDCgov/phoenix.git, which will create a phoenix folder in the directory where you ran the command. Then you can run it like before with nextflow run phoenix/main.nf -profile singularity -entry PHOENIX --input manifest.csv --kraken2db /home/databases/ --outdir $PWD/test.
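Put together, the clone-and-run steps described above look like this (a sketch of the commands from the thread; running the pipeline itself requires Nextflow, Singularity, and the Kraken2 database path shown):

```shell
# Pin the v1.0.1 branch locally so later pushes to that branch don't
# trip Nextflow's check for a changed remote revision.
git clone -b v1.0.1 https://github.com/CDCgov/phoenix.git

# Run the cloned copy directly instead of pulling from GitHub.
nextflow run phoenix/main.nf -profile singularity -entry PHOENIX \
    --input manifest.csv \
    --kraken2db /home/databases/ \
    --outdir $PWD/test
```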