cbg-ethz / V-pipe

V-pipe is a pipeline designed for analysing NGS data of short viral genomes
https://cbg-ethz.github.io/V-pipe/
Apache License 2.0
129 stars 43 forks source link

Activating snv in sars tutorial leads to error #146

Open schlaepferp opened 1 year ago

schlaepferp commented 1 year ago

I'm not sure if this is intentional (read: I miss something that I should provide somewhere), or if this is a bug. I tried the same thing with other SARS reads, same behavior.

What I did:

I set up the SARS tutorial. It runs through fine.

I afterwards modified the config.yaml and changed the line snv: false to snf: true

Result: It breaks the code.

Expected behavior: Either an error message telling me what is missing, or running through smoothly.

Current dump

Activating conda environment: .snakemake/conda/2db3c1462cfa3088aba362563d569612_ usage: shorah [options] shotgun [-h] [-v] -b BAM -f REF [-a FLOAT] [-r chrm:start-stop] [-R INT] [-x INT] [-S FLOAT] [-I] [-p FLOAT] [-of {csv,vcf} [{csv,vcf} ...]] [-c INT] [-w INT] [-s INT] [-k] [-t INT] shorah [options] shotgun: error: argument -r/--region: expected one argument usage: shorah [options] shotgun [-h] [-v] -b BAM -f REF [-a FLOAT] [-r chrm:start-stop] [-R INT] [-x INT] [-S FLOAT] [-I] [-p FLOAT] [-of {csv,vcf} [{csv,vcf} ...]] [-c INT] [-w INT] [-s INT] [-k] [-t INT] shorah [options] shotgun: error: argument -r/--region: expected one argument [Tue Jun 13 14:44:29 2023] Error in rule snv: jobid: 20 input: /home/usr1/vpipe_test_dir/V-pipe/workflow/../resources/sars-cov-2/NC_045512.2.fasta, results/SRR10903401/20200102/alignments/REF_aln.bam, results/SRR10903401/20200102/variants/coverage_intervals.tsv output: results/SRR10903401/20200102/variants/SNVs/snvs.csv, results/SRR10903401/20200102/variants/SNVs/snvs.vcf log: results/SRR10903401/20200102/variants/SNVs/shorah.out.log, results/SRR10903401/20200102/variants/SNVs/shorah.err.log (check log file(s) for error details) conda-env: /home/usr1/vpipe_testdir/work/.snakemake/conda/2db3c1462cfa3088aba362563d569612 shell:

    let "WINDOW_SHIFTS=(250 * 4/5 + 3) / 3"
    let "WINDOW_LEN=WINDOW_SHIFTS * 3"

    echo "Windows are shifted by: ${WINDOW_SHIFTS} bp" > results/SRR10903401/20200102/variants/SNVs/shorah.out.log
    echo "The window length is: ${WINDOW_LEN} bp" >> results/SRR10903401/20200102/variants/SNVs/shorah.out.log

    # Get absolute path for input files
    CWD=${PWD}
    BAM=${PWD}/results/SRR10903401/20200102/alignments/REF_aln.bam
    REF=/home/usr1/vpipe_test_dir/V-pipe/workflow/../resources/sars-cov-2/NC_045512.2.fasta; [[ ${REF} =~ ^/ ]] || REF=${PWD}/${REF}
    OUTFILE=${PWD}/results/SRR10903401/20200102/variants/SNVs/shorah.out.log
    ERRFILE=${PWD}/results/SRR10903401/20200102/variants/SNVs/shorah.err.log
    WORK_DIR=${PWD}/results/SRR10903401/20200102/variants/SNVs

    if [[ -n "" ]]; then
        # put input files in localscratch
        rsync -aq "${BAM}" "${REF}" "/"
        BAM="/${BAM##*/}"
        REF="/${REF##*/}"
    fi

    # Run ShoRAH in each of the predetermined regions (regions with sufficient coverage)
    LINE_COUNTER=0
    FILES=( )
    FILES_VCF=( )
    while read -r region || [[ -n ${region} ]]
    do
        echo "Running ShoRAH on region: ${region}" >> $OUTFILE
        (( ++LINE_COUNTER ))
        # Create directory for running ShoRAH in a corresponding region (if doesn't exist)
        DIR=${WORK_DIR}/REGION_${LINE_COUNTER}
        if [[ ! -d "${DIR}" ]]; then
            echo "Creating directory ${DIR}" >> $OUTFILE
            mkdir -p ${DIR}
        else
            # Results from previous runs
            if [[ false == "true" ]]; then
                DIR_DST=${WORK_DIR}/old
                echo "Moving results from a previous run to ${DIR_DST}" >> $OUTFILE
                rm -rf ${DIR_DST}/REGION_${LINE_COUNTER}
                mkdir -p ${DIR_DST}
                mv -f ${DIR} ${DIR_DST}
                mkdir -p ${DIR}
            fi
        fi
        # Change to the directory where ShoRAH is to be executed
        if [[ -z "" ]]; then
            cd ${DIR}
        else
            # special case: go in local scratch instead
            rsync -aq "${DIR}" "/"
            mkdir -p "/REGION_${LINE_COUNTER}"
            cd "/REGION_${LINE_COUNTER}"
        fi

        # NOTE: Execution command for ShoRAH2 valid from v1.99.0 and above
        shorah shotgun -t 1 -a 0.1 -w ${WINDOW_LEN} -x 100000  -p 0.9 -c 0 -r ${region} -R 42 -b ${BAM} -f ${REF} >> $OUTFILE 2> >(tee -a $ERRFILE >&2)
        if [[ -n "" ]]; then
            # copyback from localscratch
            rsync -auq "/REGION_${LINE_COUNTER}" "${WORK_DIR}"
            cd ${DIR}
        fi
        if [[ -s ${DIR}/reads.fas && -f ${DIR}/snv/SNVs_0.010000_final.csv ]]; then
            # Non empty reads: run had enough data and should have produced SNVs
            bcftools view ${DIR}/snv/SNVs_0.010000_final.vcf -Oz -o ${DIR}/snv/SNVs_0.010000_final.vcf.gz
            bcftools index ${DIR}/snv/SNVs_0.010000_final.vcf.gz
            FILES+=("${DIR}/snv/SNVs_0.010000_final.csv")
            FILES_VCF+=("${DIR}/snv/SNVs_0.010000_final.vcf.gz")
        elif (( 50 == 0 && LINE_COUNTER == 1 )) && [[ -f ${DIR}/reads.fas && ( ! -s ${DIR}/reads.fas ) ]]; then
            # if we have disabled coverage intervales entirely, the first and only line might have no reads
            # (e.g.: in negative controls )

            echo "No reads while coverage intervals disabled (possible negative control sample)" 2> >(tee -a $ERRFILE >&2)
            cd ${CWD}
            (( --LINE_COUNTER )) || true # Strict mode : (( 0 )) = fail
            break
        else
            echo "ERROR: unsuccesful execution of ShoRAH" 2> >(tee -a $ERRFILE >&2)
            exit 1
        fi

        # Change back to working directory
        cd ${CWD}
    done < results/SRR10903401/20200102/variants/coverage_intervals.tsv

    # Aggregate results from different regions
    if (( ${#FILES[@]} )); then
        echo "Intermediate csv files: ${FILES[*]}" >> results/SRR10903401/20200102/variants/SNVs/shorah.out.log
        echo "Intermediate vcf files: ${FILES_VCF[*]}" >> results/SRR10903401/20200102/variants/SNVs/shorah.out.log
        (head -n 1 "${FILES[0]}"; tail -q -n +2 "${FILES[@]}" | sort -t, -nk2) > results/SRR10903401/20200102/variants/SNVs/snvs.csv
        bcftools concat -o ${WORK_DIR}/snvs_tmp.vcf "${FILES_VCF[@]}"
        bcftools sort ${WORK_DIR}/snvs_tmp.vcf  -o results/SRR10903401/20200102/variants/SNVs/snvs.vcf
        rm -f ${WORK_DIR}/snvs_tmp.vcf
    elif (( LINE_COUNTER )); then
        echo "ERROR: unsuccesful execution of ShoRAH" 2> >(tee -a results/SRR10903401/20200102/variants/SNVs/shorah.err.log >&2)
        exit 1
    else
        echo "No alignment region reports sufficient coverage" >> results/SRR10903401/20200102/variants/SNVs/shorah.out.log
        touch results/SRR10903401/20200102/variants/SNVs/snvs.csv
        touch results/SRR10903401/20200102/variants/SNVs/snvs.vcf
    fi

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Tue Jun 13 14:44:29 2023] Error in rule snv: jobid: 18 input: /home/usr1/vpipe_test_dir/V-pipe/workflow/../resources/sars-cov-2/NC_045512.2.fasta, results/SRR10903402/20200102/alignments/REF_aln.bam, results/SRR10903402/20200102/variants/coverage_intervals.tsv output: results/SRR10903402/20200102/variants/SNVs/snvs.csv, results/SRR10903402/20200102/variants/SNVs/snvs.vcf log: results/SRR10903402/20200102/variants/SNVs/shorah.out.log, results/SRR10903402/20200102/variants/SNVs/shorah.err.log (check log file(s) for error details) conda-env: /home/usr1/vpipe_testdir/work/.snakemake/conda/2db3c1462cfa3088aba362563d569612 shell:

    let "WINDOW_SHIFTS=(250 * 4/5 + 3) / 3"
    let "WINDOW_LEN=WINDOW_SHIFTS * 3"

    echo "Windows are shifted by: ${WINDOW_SHIFTS} bp" > results/SRR10903402/20200102/variants/SNVs/shorah.out.log
    echo "The window length is: ${WINDOW_LEN} bp" >> results/SRR10903402/20200102/variants/SNVs/shorah.out.log

    # Get absolute path for input files
    CWD=${PWD}
    BAM=${PWD}/results/SRR10903402/20200102/alignments/REF_aln.bam
    REF=/home/usr1/vpipe_test_dir/V-pipe/workflow/../resources/sars-cov-2/NC_045512.2.fasta; [[ ${REF} =~ ^/ ]] || REF=${PWD}/${REF}
    OUTFILE=${PWD}/results/SRR10903402/20200102/variants/SNVs/shorah.out.log
    ERRFILE=${PWD}/results/SRR10903402/20200102/variants/SNVs/shorah.err.log
    WORK_DIR=${PWD}/results/SRR10903402/20200102/variants/SNVs

    if [[ -n "" ]]; then
        # put input files in localscratch
        rsync -aq "${BAM}" "${REF}" "/"
        BAM="/${BAM##*/}"
        REF="/${REF##*/}"
    fi

    # Run ShoRAH in each of the predetermined regions (regions with sufficient coverage)
    LINE_COUNTER=0
    FILES=( )
    FILES_VCF=( )
    while read -r region || [[ -n ${region} ]]
    do
        echo "Running ShoRAH on region: ${region}" >> $OUTFILE
        (( ++LINE_COUNTER ))
        # Create directory for running ShoRAH in a corresponding region (if doesn't exist)
        DIR=${WORK_DIR}/REGION_${LINE_COUNTER}
        if [[ ! -d "${DIR}" ]]; then
            echo "Creating directory ${DIR}" >> $OUTFILE
            mkdir -p ${DIR}
        else
            # Results from previous runs
            if [[ false == "true" ]]; then
                DIR_DST=${WORK_DIR}/old
                echo "Moving results from a previous run to ${DIR_DST}" >> $OUTFILE
                rm -rf ${DIR_DST}/REGION_${LINE_COUNTER}
                mkdir -p ${DIR_DST}
                mv -f ${DIR} ${DIR_DST}
                mkdir -p ${DIR}
            fi
        fi
        # Change to the directory where ShoRAH is to be executed
        if [[ -z "" ]]; then
            cd ${DIR}
        else
            # special case: go in local scratch instead
            rsync -aq "${DIR}" "/"
            mkdir -p "/REGION_${LINE_COUNTER}"
            cd "/REGION_${LINE_COUNTER}"
        fi

        # NOTE: Execution command for ShoRAH2 valid from v1.99.0 and above
        shorah shotgun -t 1 -a 0.1 -w ${WINDOW_LEN} -x 100000  -p 0.9 -c 0 -r ${region} -R 42 -b ${BAM} -f ${REF} >> $OUTFILE 2> >(tee -a $ERRFILE >&2)
        if [[ -n "" ]]; then
            # copyback from localscratch
            rsync -auq "/REGION_${LINE_COUNTER}" "${WORK_DIR}"
            cd ${DIR}
        fi
        if [[ -s ${DIR}/reads.fas && -f ${DIR}/snv/SNVs_0.010000_final.csv ]]; then
            # Non empty reads: run had enough data and should have produced SNVs
            bcftools view ${DIR}/snv/SNVs_0.010000_final.vcf -Oz -o ${DIR}/snv/SNVs_0.010000_final.vcf.gz
            bcftools index ${DIR}/snv/SNVs_0.010000_final.vcf.gz
            FILES+=("${DIR}/snv/SNVs_0.010000_final.csv")
            FILES_VCF+=("${DIR}/snv/SNVs_0.010000_final.vcf.gz")
        elif (( 50 == 0 && LINE_COUNTER == 1 )) && [[ -f ${DIR}/reads.fas && ( ! -s ${DIR}/reads.fas ) ]]; then
            # if we have disabled coverage intervales entirely, the first and only line might have no reads
            # (e.g.: in negative controls )

            echo "No reads while coverage intervals disabled (possible negative control sample)" 2> >(tee -a $ERRFILE >&2)
            cd ${CWD}
            (( --LINE_COUNTER )) || true # Strict mode : (( 0 )) = fail
            break
        else
            echo "ERROR: unsuccesful execution of ShoRAH" 2> >(tee -a $ERRFILE >&2)
            exit 1
        fi

        # Change back to working directory
        cd ${CWD}
    done < results/SRR10903402/20200102/variants/coverage_intervals.tsv

    # Aggregate results from different regions
    if (( ${#FILES[@]} )); then
        echo "Intermediate csv files: ${FILES[*]}" >> results/SRR10903402/20200102/variants/SNVs/shorah.out.log
        echo "Intermediate vcf files: ${FILES_VCF[*]}" >> results/SRR10903402/20200102/variants/SNVs/shorah.out.log
        (head -n 1 "${FILES[0]}"; tail -q -n +2 "${FILES[@]}" | sort -t, -nk2) > results/SRR10903402/20200102/variants/SNVs/snvs.csv
        bcftools concat -o ${WORK_DIR}/snvs_tmp.vcf "${FILES_VCF[@]}"
        bcftools sort ${WORK_DIR}/snvs_tmp.vcf  -o results/SRR10903402/20200102/variants/SNVs/snvs.vcf
        rm -f ${WORK_DIR}/snvs_tmp.vcf
    elif (( LINE_COUNTER )); then
        echo "ERROR: unsuccesful execution of ShoRAH" 2> >(tee -a results/SRR10903402/20200102/variants/SNVs/shorah.err.log >&2)
        exit 1
    else
        echo "No alignment region reports sufficient coverage" >> results/SRR10903402/20200102/variants/SNVs/shorah.out.log
        touch results/SRR10903402/20200102/variants/SNVs/snvs.csv
        touch results/SRR10903402/20200102/variants/SNVs/snvs.vcf
    fi

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: .snakemake/log/2023-06-13T144427.023150.snakemake.log

DrYak commented 1 year ago

Something is going wrong here.

Could you please provide the file: results/SRR10903401/20200102/variants/coverage_intervals.tsv ?

I will try to reproduce your errors to see where it come from. (Normally SNV should work: their are tested as part of our CI/CD).

GoncheDanesh commented 7 months ago

Hi,

I get the same error when running V-pipe with snv: true in the config file. Did you find why ?

Thank you, Best