BU-ISCIII / viralrecon

Assembly and intrahost/low-frequency variant calling for viral samples
https://nf-co.re/viralrecon
MIT License

Create an issue in ivar - indels #16

Closed: Alema91 closed this issue 2 years ago

Alema91 commented 2 years ago

Create an issue to examine ivar performance on indels.

Alema91 commented 2 years ago

Describe the bug

We have observed that ivar variants can generate false-positive variant calls for SARS-CoV-2 genomes that contain insertions or deletions. Here is an example from a private genome that contains the 6-bp ORF8 deletion:

CHROM        POS    REF      ALT  GENE  EFFECT                         HGVS_C              HGVS_P              DP     REF_DP  ALT_DP  AF    sample  software  lineage
NC_045512.2  28247  AGATTTC  A    ORF8  conservative_inframe_deletion  c.355_360delGATTTC  p.Asp119_Phe120del  76166  48839   64275   0.84  218025  ivar      AY.33
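In the VCF-style REF/ALT notation used in these tables, the leading A at 28247 is a retained anchor base, so the six deleted bases (GATTTC, matching c.355_360delGATTTC) occupy positions 28248 to 28253. The shell/awk sketch below makes that arithmetic explicit; `in_deletion` is a hypothetical helper, not part of ivar or viralrecon, and it assumes a simple deletion whose ALT is a prefix of REF.

```shell
#!/bin/sh
# Hypothetical helper: for a simple VCF-style deletion (ALT is a prefix of REF),
# report the deleted reference span and whether position $4 falls inside it.
# Usage: in_deletion POS REF ALT QUERY_POS
in_deletion() {
  awk -v pos="$1" -v ref="$2" -v alt="$3" -v q="$4" 'BEGIN {
    start = pos + length(alt)       # first deleted base (anchor bases are kept)
    end = pos + length(ref) - 1     # last deleted base
    where = (q >= start && q <= end) ? "inside" : "outside"
    printf "deleted %d-%d: %s\n", start, end, where
  }'
}

# The ORF8 deletion from the ivar table, queried at the position of the suspicious call
in_deletion 28247 AGATTTC A 28253
```

With these arguments it prints `deleted 28248-28253: inside`: position 28253, where ivar variants reports the C>A call below, is the last base removed by the deletion.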

Position 28247 has a well-supported deletion (nearly 70,000x coverage), yet ivar variants calls a variant inside that deletion (IGV image and variant table):

[IGV screenshot of the ORF8 deletion at position 28247 (issue_img1)]

CHROM        POS    REF  ALT  GENE  EFFECT            HGVS_C    HGVS_P       DP    REF_DP  ALT_DP  AF    sample  software  lineage
NC_045512.2  28253  C    A    ORF8  missense_variant  c.360C>A  p.Phe120Leu  3954  61      3851    0.97  218025  ivar      AY.33

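With an AF of 0.97, this variant would be included in the consensus under our quality criteria (variants with AF > 0.75). However, this AF is overestimated: the depth at this position is miscalculated because of the deletion, apparently because the reads carrying the deletion are left out of it (ivar reports DP = 3954 here, against more than 60,000 reads carrying the deletion).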
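The size of the overestimate can be checked with the counts already reported. The sketch below uses a hypothetical `af_compare` helper and assumes, for illustration only, that every read carrying the 28247 deletion (ivar ALT_DP = 64275) also spans position 28253 and is missing from ivar's DP there. Under that assumption the denominator becomes 3954 + 64275 = 68229, close to the DP of 69561 that VarScan reports at 28253.

```shell
#!/bin/sh
# Hypothetical helper: compare the ALT allele frequency with and without
# deletion reads in the depth.
# Arguments: ALT_DP, DP (as reported by ivar), number of reads carrying the deletion.
# The 0.75 cutoff is the consensus criterion quoted in this issue.
af_compare() {
  awk -v alt="$1" -v dp="$2" -v del="$3" 'BEGIN {
    cutoff = 0.75
    af_ivar = alt / dp                 # deletion reads left out of the depth
    af_gaps = alt / (dp + del)         # deletion reads counted in the depth
    s1 = (af_ivar > cutoff) ? "above" : "below"
    s2 = (af_gaps > cutoff) ? "above" : "below"
    printf "ivar AF: %.3f (%s cutoff)\n", af_ivar, s1
    printf "AF with deletion reads: %.3f (%s cutoff)\n", af_gaps, s2
  }'
}

# Counts from the ivar rows above; using the deletion ALT_DP (64275) as the
# number of reads spanning 28253 is an ASSUMPTION for illustration.
af_compare 3851 3954 64275
```

With these numbers it prints `ivar AF: 0.974 (above cutoff)` and `AF with deletion reads: 0.056 (below cutoff)`, so under this assumption the call would no longer pass the 0.75 consensus criterion.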

CHROM        POS    REF      ALT  GENE  EFFECT                                              HGVS_C              HGVS_P                    DP     REF_DP  ALT_DP  AF     sample  software  lineage
NC_045512.2  28247  AGATTTC  A    ORF8  conservative_inframe_deletion                       c.355_360delGATTTC  p.Asp119_Phe120del        75039  10741   63575   0.847  218025  VarScan   AY.33
NC_045512.2  28248  GATTTCA  G    ORF8  disruptive_inframe_deletion                         c.356_361delATTTCA  p.Asp119_Ile121delinsVal  64673  126     189     0.003  218025  VarScan   AY.33
NC_045512.2  28249  ATTTC    A    ORF8  frameshift_variant                                  c.357_360delTTTC    p.Asp119fs                64614  98      79      0.001  218025  VarScan   AY.33
NC_045512.2  28251  TTCATC   T    ORF8  frameshift_variant&stop_lost&splice_region_variant  c.360_364delCATCT   p.Phe120fs                69573  5295    48      0.001  218025  VarScan   AY.33
NC_045512.2  28252  TC       T    ORF8  frameshift_variant                                  c.360delC           p.Phe120fs                69316  4895    151     0.002  218025  VarScan   AY.33
NC_045512.2  28253  C        A    ORF8  missense_variant                                    c.360C>A            p.Phe120Leu               69561  15      5009    0.072  218025  VarScan   AY.33

Other variant callers, such as VarScan, report this variant with an AF << 0.25 (0.072 in the table above) because they compute the depth at that position including the reads that carry the deletion (DP = 69561, against 3954 in ivar). As a result, the AF differs from the one reported by ivar variants.
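The two depth definitions can be compared directly on `samtools mpileup` output, where a read whose deletion covers the current position is written as `*` (newer samtools releases can optionally mark reverse-strand deletions with `#`). The sketch below is a hypothetical `alt_af` helper, not part of ivar, VarScan or viralrecon: it parses one pileup line and computes the ALT frequency with and without those placeholders in the depth. It handles the common base-column symbols (`^` plus mapping quality, `$`, `+N`/`-N` indel sequences) but ignores base-quality filtering, so it will not reproduce either caller's exact counts.

```shell
#!/bin/sh
# Hypothetical helper (not part of ivar, VarScan or viralrecon).
# Reads ONE samtools mpileup line (single BAM) on stdin and, for the ALT base
# given as $1, prints its frequency with deletion reads left out of the depth
# (af_no_gaps) and with them counted (af_with_gaps).
alt_af() {
  awk -v alt="$1" '
  {
    s = $5; n = length(s); i = 1        # column 5 holds the read bases
    ref = 0; altn = 0; other = 0; del = 0
    while (i <= n) {
      c = substr(s, i, 1)
      if (c == "^") { i += 2; continue }      # read start plus its mapping-quality char
      if (c == "$") { i += 1; continue }      # read end marker
      if (c == "+" || c == "-") {             # indel after this base: skip length and sequence
        j = i + 1; len = ""
        while (substr(s, j, 1) ~ /[0-9]/) { len = len substr(s, j, 1); j++ }
        i = j + len
        continue
      }
      if (c == "*" || c == "#") del++         # read has a deletion over this position
      else if (c == "." || c == ",") ref++    # matches the reference base
      else if (toupper(c) == toupper(alt)) altn++
      else other++                            # other bases or reference skips
      i++
    }
    nogap = ref + altn + other
    if (nogap == 0) { print "no non-deletion bases"; next }
    printf "af_no_gaps=%.3f af_with_gaps=%.3f\n", altn / nogap, altn / (nogap + del)
  }'
}

# Synthetic pileup line: 3 ALT (A), 3 reference, 4 deletion placeholders
printf 'NC_045512.2\t28253\tC\t10\t^]AAA,-2tc..****$\tIIIIIIIIII\n' | alt_af A
```

For this synthetic line it prints `af_no_gaps=0.500 af_with_gaps=0.300`, a small-scale version of the gap the report describes between ivar (0.97) and VarScan (0.072) at 28253.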

Expected behavior

We suspect that ivar variants overestimates the AF of this variant because of how it calculates depth, and that this can cause problems when calling variants within or around indels (insertions and deletions).

This issue may be related to #79, #83, #85, #103

To Reproduce

Run ivar variants with these parameters:

samtools mpileup \
        -a \
        --count-orphans \
        --no-BAQ \
        --ignore-overlaps \
        --max-depth 20 \
        --fasta-ref fasta \
        --min-BQ <min-BQ> \
        <sample.bam> | ivar variants -q 30 -t 0.25 -m 10 -r fasta -g gff -p sample

Alema91 commented 2 years ago

We need a better understanding of the problem, and data to reproduce the issue.

Alema91 commented 2 years ago

Done #126