connor-lab / ncov2019-artic-nf

A Nextflow pipeline for running the ARTIC network's fieldbioinformatics tools (https://github.com/artic-network/fieldbioinformatics), with a focus on ncov2019
GNU Affero General Public License v3.0

variants with low SupportFraction change the sequence #103

Open paraslonic opened 3 years ago

paraslonic commented 3 years ago

Hi! I am working with nanopore data and found that a low-frequency indel changes the final consensus sequence, creating a frameshift:

MN908947.3 24776 . G GA 1042.0 PASS TotalReads=134;SupportFraction=0.560602;SupportFractionByStrand=0.534448,0.610637;BaseCalledReadsWithVariant=29;BaseCalledFraction=0.180124;AlleleCount=1;StrandSupport=41,18,47,28;StrandFisherTest=3;SOR=0.440537;RefContext=CACAAGAAAAG;Pool=nCoV-2019_2 GT 1

This extra A is inserted into the final sequence, making it CTGCAGAAGAAAAAGAA and leading to a frameshift in the spike protein.
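To make the concern concrete, here is a minimal Python sketch that parses the record above and flags it. It is not part of the pipeline or the ARTIC tools; the function names (`parse_record`, `is_frameshift`, `is_suspect`) and the 0.5 cutoff on `BaseCalledFraction` are assumptions chosen only for illustration.

```python
# The VCF record quoted above, with whitespace-separated columns.
RECORD = (
    "MN908947.3 24776 . G GA 1042.0 PASS "
    "TotalReads=134;SupportFraction=0.560602;"
    "SupportFractionByStrand=0.534448,0.610637;"
    "BaseCalledReadsWithVariant=29;BaseCalledFraction=0.180124;"
    "AlleleCount=1;StrandSupport=41,18,47,28;StrandFisherTest=3;"
    "SOR=0.440537;RefContext=CACAAGAAAAG;Pool=nCoV-2019_2 GT 1"
)


def parse_record(line):
    """Split one VCF data line into its fixed columns and an INFO dict."""
    chrom, pos, _id, ref, alt, qual, flt, info = line.split()[:8]
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")  # flag-only entries get ""
        fields[key] = value
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "qual": float(qual), "filter": flt, "info": fields}


def is_frameshift(ref, alt):
    """True if the indel length is not a multiple of 3.

    Does not check whether the position lies inside a coding region.
    """
    return (len(alt) - len(ref)) % 3 != 0


def is_suspect(rec, min_basecalled_fraction=0.5):
    """Flag frameshifting indels seen in few basecalled reads.

    The 0.5 default is a hypothetical threshold, not a recommended value.
    """
    fraction = float(rec["info"]["BaseCalledFraction"])
    return is_frameshift(rec["ref"], rec["alt"]) and fraction < min_basecalled_fraction


rec = parse_record(RECORD)
print(rec["pos"], rec["ref"], rec["alt"], is_suspect(rec))
```

Under these assumptions the record is flagged: G to GA is a one-base insertion, and only about 18% of basecalled reads carry it even though the variant is marked PASS.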

IGV shows that this is not a confident variant but rather a sequencing error. [image]

How can I make the consensus more accurate?

Best regards, Aleks Manolov