cbg-ethz / V-pipe

V-pipe is a pipeline designed for analysing NGS data of short viral genomes
https://cbg-ethz.github.io/V-pipe/
Apache License 2.0
129 stars 43 forks source link

adding viloca to the workflow #150

Closed LaraFuhrmann closed 1 month ago

github-actions[bot] commented 5 months ago

🦙 MegaLinter status: ✅ SUCCESS

Descriptor Linter Files Fixed Errors Elapsed time
✅ BASH shellcheck 11 0 0.51s
✅ DOCKERFILE hadolint 1 0 0.32s
✅ JUPYTER jupyfmt 11 8 0 6.36s
✅ MARKDOWN markdownlint 15 2 0 1.29s
✅ PERL perlcritic 1 0 1.27s
✅ PYTHON black 52 0 0 1.92s
✅ SNAKEMAKE snakefmt 25 0 0 9.83s

See detailed report in MegaLinter reports

_MegaLinter is graciously provided by OX Security_

LaraFuhrmann commented 5 months ago

Tested VILOCA on the hiv-tutorial data CAP188 4 using the following config files to test uniform-tiling strategy and amplicon tiling strategy.

general:
    virus_base_config: ""
    aligner: bwa
    snv_caller: viloca

input:
    reference: "{VPIPE_BASEDIR}/../resources/hiv/HXB2.fasta"
    metainfo_file: "{VPIPE_BASEDIR}/../resources/hiv/metainfo.yaml"
    gff_directory: "{VPIPE_BASEDIR}/../resources/hiv/gffs/"
    # NOTE: this input datadir isn't standard
    datadir: resources/samples/
    read_length: 301
    samples_file: samples.tsv
    paired: true

viloca:
    consensus: false

output:
    snv: true
    local: false
    global: false
    visualization: false
    QA: false
general:
    virus_base_config: ""
    aligner: bwa
    snv_caller: viloca

input:
    reference: "{VPIPE_BASEDIR}/../resources/hiv/HXB2.fasta"
    metainfo_file: "{VPIPE_BASEDIR}/../resources/hiv/metainfo.yaml"
    gff_directory: "{VPIPE_BASEDIR}/../resources/hiv/gffs/"
    # NOTE: this input datadir isn't standard
    datadir: resources/samples/
    read_length: 301
    samples_file: samples.tsv
    paired: true

viloca:
    consensus: false
    insert_bedfile: resources/samples/insert.bed

output:
    snv: true
    local: false
    global: false
    visualization: false
    QA: false
LaraFuhrmann commented 1 month ago

Now we are adding the smallgenomeutilities scritp to merge the paried-end reads into one read. This is since viloca can then be applied to the merged reads with longer local regions / windows.

An example config file could look like this:

general:
    virus_base_config: ""
    aligner: bwa
    snv_caller: viloca

input:
    reference: "{VPIPE_BASEDIR}/../resources/hiv/HXB2.fasta"
    metainfo_file: "{VPIPE_BASEDIR}/../resources/hiv/metainfo.yaml"
    gff_directory: "{VPIPE_BASEDIR}/../resources/hiv/gffs/"
    # NOTE: this input datadir isn't standard
    datadir: resources/samples/
    read_length: 301
    samples_file: samples.tsv
    paired: true

viloca:
    merge_paired_end_reads: true    
    consensus: false

output:
    snv: true
    local: false
    global: false
    visualization: false
    QA: false
LaraFuhrmann commented 1 month ago

We are also adding the paired_end_read_merger from smallgenomeutilities to the workflow.

The workflow as tested with a ww SARS-CoV-2 sample using the following config file: (note we cheated by providing already an aligned bam file to V-pipe here, but the paired-end-read-merger and viloca were executed.)

general:
    virus_base_config: ""
    aligner: bwa
    snv_caller: viloca

input:
    reference: "{VPIPE_BASEDIR}/../resources/hiv/HXB2.fasta"
    metainfo_file: "{VPIPE_BASEDIR}/../resources/hiv/metainfo.yaml"
    gff_directory: "{VPIPE_BASEDIR}/../resources/hiv/gffs/"
    # NOTE: this input datadir isn't standard
    datadir: resources/samples/
    read_length: 301
    samples_file: samples.tsv
    paired: true

applications:
    paired_end_read_merger: /Users/lfuhrmann/Downloads/smallgenomeutilities-dev/scripts/paired_end_read_merger

viloca:
    merge_paired_end_reads: true
    consensus: false

output:
    snv: true
    local: false
    global: false
    visualization: false
    QA: false

There is a bug in the paired_end_read_merger as VILOCA is reporting ValueError: Soft clipping only possible on the edges of a read.

LaraFuhrmann commented 1 month ago

For completeness, VILOCA also runs when we don't choose to the read merging.

general:
    virus_base_config: ""
    aligner: bwa
    snv_caller: viloca

input:
    reference: "{VPIPE_BASEDIR}/../resources/hiv/HXB2.fasta"
    metainfo_file: "{VPIPE_BASEDIR}/../resources/hiv/metainfo.yaml"
    gff_directory: "{VPIPE_BASEDIR}/../resources/hiv/gffs/"
    # NOTE: this input datadir isn't standard
    datadir: resources/samples/
    read_length: 301
    samples_file: samples.tsv
    paired: true

applications:
    paired_end_read_merger: /Users/lfuhrmann/Downloads/smallgenomeutilities-dev/scripts/paired_end_read_merger

viloca:
    merge_paired_end_reads: false
    consensus: false

output:
    snv: true
    local: false
    global: false
    visualization: false
    QA: false