cbg-ethz / V-pipe

V-pipe is a pipeline designed for analysing NGS data of short viral genomes
https://cbg-ethz.github.io/V-pipe/
Apache License 2.0

How to run analysis for single-end reads #61

Open zuber-bioinfo opened 4 years ago

zuber-bioinfo commented 4 years ago

I have successfully run the test data and the Wuhan data, which are paired-end, but I am not able to run single-end data, as there is no specific guide or manual for it. According to the publication, V-pipe supports single-end data. I tried renaming the file to read_R1.fastq, but without success:

(base) zuber@gbrc-hpc-42:/opt/data/env-V-pipe/ENV/work$ ./vpipe --cores 40
VPIPE_BASEDIR = /opt/data/env-V-pipe/ENV/V-pipe
AssertionError in line 369 of /opt/data/env-V-pipe/ENV/V-pipe/rules/common.smk:
ERROR: Line '3' does not contain at least two entries!
  File "/opt/data/env-V-pipe/ENV/V-pipe/vpipe.snake", line 11, in
  File "/opt/data/env-V-pipe/ENV/V-pipe/rules/common.smk", line 369, in
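
The assertion above complains that line 3 of some input table has fewer than two entries. As a rough way to locate such a line before launching the pipeline, the following Python sketch scans a tab-separated table and reports every line with fewer than two non-empty fields. It assumes the table in question is the tab-separated samples file with one sample per line; the function name `find_short_lines` and the example rows are hypothetical and not part of V-pipe.

```python
def find_short_lines(text, min_fields=2):
    """Return (line_number, fields) for every line with fewer than min_fields entries.

    Assumption: the table is tab-separated, one sample per line.
    Blank lines and lines separated by spaces instead of tabs are reported too,
    so that they can be inspected by hand.
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Keep only non-empty fields so that trailing tabs do not count as entries.
        fields = [field for field in line.split("\t") if field.strip()]
        if len(fields) < min_fields:
            problems.append((lineno, fields))
    return problems


# Hypothetical table: the third line has a single entry only.
example = "SRR10903401\t20200102\nSRR10903402\t20200102\nread_R1\n"

for lineno, fields in find_short_lines(example):
    print(f"line {lineno}: found {len(fields)} field(s): {fields}")
```

To check a real file, read it with `open(path).read()` and pass the contents to `find_short_lines`.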

DrYak commented 4 years ago

Hello !

Sorry, I should have been a little bit more explicit in the tutorial & webinar.

Paired-end vs. single-end is controlled by the paired option in the input section of the configuration file, e.g.:

[input]
paired=false
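
To double-check that the option is actually picked up from the file, a minimal Python sketch can parse the INI-style snippet shown above with the standard library. The helper name `read_paired_flag` is hypothetical, and the sketch deliberately does not assume what the pipeline does when the option is missing: it returns `None` in that case.

```python
import configparser


def read_paired_flag(config_text):
    """Return True/False for [input] paired, or None if the option is not set.

    Assumption: the configuration is INI-style, as in the snippet above.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback=None covers both a missing section and a missing option.
    return parser.getboolean("input", "paired", fallback=None)


config_text = """
[input]
paired=false
"""

print(read_paired_flag(config_text))
```

A typo in the section or option name would show up here as `None` rather than `False`.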

zuber-bioinfo commented 4 years ago

Thank you very much, it has started now.

isara88 commented 3 months ago

Hello, I also have a problem running the analysis for single-end reads, and adding "paired=false" to the input section did not solve it for me. I successfully ran the sars-cov-2 tutorial, but when I remove the files SRR10903401_R2.fastq and SRR10903402_R2.fastq and add "paired=false" to the config file, I get the following error:

...
Input and filter stats:
    Input sequences: 476,632
    Input bases: 71,758,102
    Input mean length: 150.55
    Good sequences: 0 (0.00%)
    Bad sequences: 476,632 (100.00%)
    Bad bases: 71,758,102
    Bad mean length: 150.55
    Sequences filtered by specified parameters:
    trim_qual_left: 78998
    min_len: 397634
Waiting for unlocks
samples/SRR10903401/20200102/preprocessed_data/R_1.fastq unlocked
samples/SRR10903401/20200102/preprocessed_data/R_2.fastq unlocked
mv: cannot stat 'samples/SRR10903401/20200102/preprocessed_data/R.fastq': No such file or directory
[Wed Jul 31 12:48:01 2024]
Error in rule preprocessing_se:
    jobid: 0
    input: samples/SRR10903401/20200102/extracted_data/R1.fastq
    output: samples/SRR10903401/20200102/preprocessed_data/R1.fastq.gz
    log: samples/SRR10903401/20200102/preprocessed_data/prinseq.out.log, samples/SRR10903401/20200102/preprocessed_data/prinseq.err.log (check log file(s) for error details)
    conda-env: /home///*/Vpipe_runs/covidtutorial2/.snakemake/conda/a406b36923bbd24b06c0c04b02ca5899
    shell:

        echo "The length cutoff is: 200" > samples/SRR10903401/20200102/preprocessed_data/prinseq.out.log

        prinseq-lite.pl -fastq samples/SRR10903401/20200102/extracted_data/R1.fastq -ns_max_n 4 -min_qual_mean 30 -trim_qual_left 30 -trim_qual_right 30 -trim_qual_window 10 -out_format 3 -out_good samples/SRR10903401/20200102/preprocessed_data/R -out_bad null -min_len 200 -ns_max_n 4 -min_qual_mean 30 -trim_qual_left 30 -trim_qual_right 30 -trim_qual_window 10 -log samples/SRR10903401/20200102/preprocessed_data/prinseq.out.log 2> >(tee samples/SRR10903401/20200102/preprocessed_data/prinseq.err.log >&2)

        # make sure that the lock held prinseq has been effectively released and propagated
        # on some network shares this could otherwise lead to confusion or corruption
        if [[ "$OSTYPE" =~ ^linux ]]; then
            echo "Waiting for unlocks" >&2
            for U in samples/SRR10903401/20200102/preprocessed_data/R_{1,2}.fastq; do
                flock -x -o ${U} -c "echo ${U} unlocked >&2"
            done
        fi

        mv samples/SRR10903401/20200102/preprocessed_data/R{,1}.fastq

        gzip samples/SRR10903401/20200102/preprocessed_data/R1.fastq

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Use of uninitialized value $fhmappings in unlink at /home////Vpipe_runs/covidtutorial2/.snakemake/conda/a406b36923bbd24b06c0c04b02ca5899/bin/prinseq-lite.pl line 1833.
Input and filter stats:
    Input sequences: 676,694
    Input bases: 101,889,418
    Input mean length: 150.57
    Good sequences: 0 (0.00%)
    Bad sequences: 676,694 (100.00%)
    Bad bases: 101,889,418
    Bad mean length: 150.57
    Sequences filtered by specified parameters:
    trim_qual_left: 92421
    min_len: 584273
Waiting for unlocks
samples/SRR10903402/20200102/preprocessed_data/R_1.fastq unlocked
samples/SRR10903402/20200102/preprocessed_data/R_2.fastq unlocked
mv: cannot stat 'samples/SRR10903402/20200102/preprocessed_data/R.fastq': No such file or directory
[Wed Jul 31 12:48:10 2024]
Error in rule preprocessing_se:
    jobid: 0
    input: samples/SRR10903402/20200102/extracted_data/R1.fastq
    output: samples/SRR10903402/20200102/preprocessed_data/R1.fastq.gz
    log: samples/SRR10903402/20200102/preprocessed_data/prinseq.out.log, samples/SRR10903402/20200102/preprocessed_data/prinseq.err.log (check log file(s) for error details)
    conda-env: /home////Vpipe_runs/covidtutorial2/.snakemake/conda/a406b36923bbd24b06c0c04b02ca5899
    shell:

        echo "The length cutoff is: 200" > samples/SRR10903402/20200102/preprocessed_data/prinseq.out.log

        prinseq-lite.pl -fastq samples/SRR10903402/20200102/extracted_data/R1.fastq -ns_max_n 4 -min_qual_mean 30 -trim_qual_left 30 -trim_qual_right 30 -trim_qual_window 10 -out_format 3 -out_good samples/SRR10903402/20200102/preprocessed_data/R -out_bad null -min_len 200 -ns_max_n 4 -min_qual_mean 30 -trim_qual_left 30 -trim_qual_right 30 -trim_qual_window 10 -log samples/SRR10903402/20200102/preprocessed_data/prinseq.out.log 2> >(tee samples/SRR10903402/20200102/preprocessed_data/prinseq.err.log >&2)

        # make sure that the lock held prinseq has been effectively released and propagated
        # on some network shares this could otherwise lead to confusion or corruption
        if [[ "$OSTYPE" =~ ^linux ]]; then
            echo "Waiting for unlocks" >&2
            for U in samples/SRR10903402/20200102/preprocessed_data/R_{1,2}.fastq; do
                flock -x -o ${U} -c "echo ${U} unlocked >&2"
            done
        fi

        mv samples/SRR10903402/20200102/preprocessed_data/R{,1}.fastq

        gzip samples/SRR10903402/20200102/preprocessed_data/R1.fastq

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-07-31T124530.578375.snakemake.log
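
Reading the log above: prinseq reports 0 good sequences for both samples, with most reads rejected by min_len, and the length cutoff printed by the rule (200) is larger than the reported mean read length (about 150). With no reads passing, no R.fastq is written, which would explain why the following mv fails. It may therefore be worth checking the read length configured for these samples. The Python sketch below pulls the relevant numbers out of such a log and flags this situation; the function name `check_prinseq_stats` is hypothetical, and the regular expressions only assume the log wording shown in this thread.

```python
import re


def check_prinseq_stats(log_text, min_len):
    """Return (good_sequences, mean_length, cutoff_exceeds_mean) or None.

    Assumption: the log contains 'Input mean length:' and 'Good sequences:'
    lines formatted as in the output quoted above.
    """
    mean_match = re.search(r"Input mean length:\s*([\d.]+)", log_text)
    good_match = re.search(r"Good sequences:\s*([\d,]+)", log_text)
    if mean_match is None or good_match is None:
        return None
    mean_length = float(mean_match.group(1))
    # Counts are printed with thousands separators, e.g. 476,632.
    good_sequences = int(good_match.group(1).replace(",", ""))
    return good_sequences, mean_length, min_len > mean_length


# Values copied from the first sample in the log above.
log_text = """
Input sequences: 476,632
Input mean length: 150.55
Good sequences: 0 (0.00%)
"""

good, mean_length, too_strict = check_prinseq_stats(log_text, min_len=200)
if good == 0 and too_strict:
    print(f"No reads passed: cutoff 200 exceeds mean read length {mean_length}")
```

This only flags the symptom; it does not show how the cutoff of 200 was derived, which is not visible in this thread.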