W-L / ProblematicSites_SARS-CoV2

site 21987 #7

Closed hsnguyen closed 2 years ago

hsnguyen commented 3 years ago

This is one of the defining sites for Delta (G142D), but ARTIC v3 seems to have some issues calling this SNP. It'd be great if you could look into this to see if we can add it to the VCF file. Thanks

charlesfoster commented 3 years ago

+1

We find that position 21987 becomes problematic when constructing a tree of Delta variant samples. A G>A mutation at 21987 is supposedly defining for the Delta variant, but labs using the ARTIC v3 primers often end up with the reference 'G' allele in their consensus genomes, whereas labs using other primer sets (e.g., the 'Midnight' 1200-base amplicons) call the variant 'A'. Our working theory is that either the artic minion pipeline does not properly clip the ARTIC 73_LEFT primer, or there is a dropout region in ARTIC amplicon 72 that causes the defining SNP to be lost.

Since the variant in question doesn't fall in the primer binding region of other popular primer sets, the same problem is not seen with those primers:

ARTIC v3:
MN908947.3  21961   21990   nCoV-2019_73_LEFT
MN908947.3  22324   22346   nCoV-2019_73_RIGHT

ARTIC v4:
MN908947.3  21865   21889   SARS-CoV-2_73_LEFT
MN908947.3  22247   22274   SARS-CoV-2_73_RIGHT

Midnight:
MN908947.3  21532   21562   nCoV-2019_22_LEFT
MN908947.3  22590   22612   nCoV-2019_22_RIGHT

JS Eden:
MN908947.3  21357   21386   nCoV-2019_11_LEFT
MN908947.3  22324   22346   nCoV-2019_11_RIGHT
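
For anyone who wants to check the overlap themselves, here is a minimal Python sketch using only the coordinates listed above. It assumes the intervals are BED-style (0-based start, end-exclusive), which is an assumption on my part; the function name `primers_covering` is made up for illustration.

```python
# Primer coordinates copied from the listing above.
# Assumption: BED-style intervals (0-based start, end-exclusive).
PRIMERS = {
    "ARTIC v3": [
        (21961, 21990, "nCoV-2019_73_LEFT"),
        (22324, 22346, "nCoV-2019_73_RIGHT"),
    ],
    "ARTIC v4": [
        (21865, 21889, "SARS-CoV-2_73_LEFT"),
        (22247, 22274, "SARS-CoV-2_73_RIGHT"),
    ],
    "Midnight": [
        (21532, 21562, "nCoV-2019_22_LEFT"),
        (22590, 22612, "nCoV-2019_22_RIGHT"),
    ],
    "JS Eden": [
        (21357, 21386, "nCoV-2019_11_LEFT"),
        (23822, 23847, "nCoV-2019_11_RIGHT"),
    ],
}


def primers_covering(pos_1based, primers=PRIMERS):
    """Return (scheme, primer name) pairs whose interval contains the position."""
    pos0 = pos_1based - 1  # convert the 1-based genome position to 0-based
    hits = []
    for scheme, intervals in primers.items():
        for start, end, name in intervals:
            if start <= pos0 < end:
                hits.append((scheme, name))
    return hits


print(primers_covering(21987))
# → [('ARTIC v3', 'nCoV-2019_73_LEFT')]
```

Of the primers listed, only the ARTIC v3 73_LEFT primer contains position 21987, and that holds whether the coordinates are read as 0-based or 1-based.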

Given the enduring popularity of the ARTIC v3 primers, it seems prudent to add position 21987 to the problematic sites list. If the position is left unmasked, we see artificial clustering in phylogenetic trees that has the potential to mislead phylogenetic or genomic epidemiological inference. We have seen this problem in Australia, but others have also seen the problem overseas. For example, the Pango team masks it in some trees: https://github.com/cov-lineages/pango-designation/issues/95

Thanks!

conorwalker commented 3 years ago

Hi @hsnguyen and @charlesfoster - thanks for letting us know, I've now added this site to the VCF(s) with a mask recommendation. I've listed you both as the submitters of this position, hope that's OK!
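
In case it helps anyone applying the recommendation, below is a rough Python sketch of masking flagged positions in a consensus sequence. It assumes a standard tab-separated VCF layout in which the FILTER column (7th field) carries the value `mask`; the toy records and the helper names `mask_positions_from_vcf` and `mask_sequence` are hypothetical and are not taken from the actual file.

```python
def mask_positions_from_vcf(vcf_lines):
    """Collect 1-based positions whose FILTER column equals 'mask'."""
    positions = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        pos, filter_value = int(fields[1]), fields[6]
        if filter_value == "mask":
            positions.add(pos)
    return positions


def mask_sequence(seq, positions):
    """Replace the bases at the given 1-based positions with 'N'."""
    chars = list(seq)
    for pos in positions:
        if 1 <= pos <= len(chars):
            chars[pos - 1] = "N"
    return "".join(chars)


# Toy data for illustration only (hypothetical records, not the real VCF).
toy_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "toy\t3\t.\tG\t.\t.\tmask\t.",
    "toy\t5\t.\tC\t.\t.\tcaution\t.",
]

masked = mask_sequence("ACGTCA", mask_positions_from_vcf(toy_vcf))
print(masked)
# → ACNTCA
```

Only the record flagged `mask` is replaced; the `caution` record is left untouched in this sketch.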

charlesfoster commented 3 years ago

Great, thanks @conorwalker!

andersgs commented 3 years ago

I am linking two issues from cov-lineages that seem relevant:

https://github.com/cov-lineages/pango-designation/issues/117#issue-925220359

https://github.com/cov-lineages/pango-designation/issues/134#issuecomment-879240791

Pointed out to me by @AngieHinrichs