cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

AY.3 appearing in COG-UK lineage calls without ORF1ab:I3731V for PLEARN-v1.2.36 #142

Closed: zach-hensel closed this issue 3 years ago

zach-hensel commented 3 years ago

Blatantly copied title & text from #140

wget https://cog-uk.s3.climb.ac.uk/phylogenetics/latest/cog_metadata.csv
grep AY.3 cog_metadata.csv | wc -l
# 212
grep AY.3 cog_metadata.csv | grep I3731V | wc -l
# 1
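In `grep AY.3`, the `.` is a regex wildcard, and the pattern also matches any line where "AY.3" appears inside a longer name or in another column, so counts like these can include stray rows. A stricter count compares the lineage column exactly. The sketch below is hypothetical: `count_lineage` is not an existing tool. It assumes a plain comma-separated file with a header row and no quoted commas, and it takes the lineage column's header name as an argument. That the column is called `lineage` in `cog_metadata.csv` is an assumption, not something checked here.

```shell
# count_lineage FILE LINEAGE_COL LINEAGE [PATTERN]
# Hypothetical helper: counts rows whose LINEAGE_COL field equals LINEAGE exactly
# (no regex, so "AY.3" does not match "AYX3" or "AY.33").
# If PATTERN is given, only rows that also contain PATTERN as a fixed string are counted.
# Assumes a simple CSV: header row present, no quoted fields containing commas.
count_lineage() {
  awk -F',' -v col="$2" -v lin="$3" -v pat="${4-}" '
    NR == 1 {
      # Find the index of the requested column from the header row.
      for (i = 1; i <= NF; i++) {
        if ($i == col) c = i
      }
      next
    }
    c && $c == lin && (pat == "" || index($0, pat) > 0) { n++ }
    END { print n + 0 }
  ' "$1"
}
```

Usage, mirroring the two commands above (with the assumed column name): `count_lineage cog_metadata.csv lineage AY.3` and `count_lineage cog_metadata.csv lineage AY.3 I3731V`. If the column name is wrong, the function prints 0 rather than failing, so it is worth checking the header first.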
zach-hensel commented 3 years ago

Also, some sequences in this clade may not be getting called AY.3, judging by this UShER tree: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_2cea9_cccf80.json?c=pango_lineage&label=nuc%20mutations:A11456G,G29050A

I think this would be clearer if the tree were built with G142D masked.


theosanderson commented 3 years ago

> Also possibly some not being called looking at this UShER tree -

Pangolin gets run in a slightly ad-hoc way for the UShER metadata, ~so I think you may be seeing old samples here that haven't been re-run since AY.3 was designated (but I could be wrong).~

Edit: sense-checked some sequences and did get B.1.617.2 from the Pangolin GUI.

AngieHinrichs commented 3 years ago

You're not wrong, Theo! I do need to re-run pangolin/pangoLEARN on all sequences for the UShER/hgPhyloPlace metadata since the recent pangoLEARN and pangolin releases. But thanks for pointing out that there are multiple possible causes. :)

zach-hensel commented 3 years ago

Thank you both for the background info. I have absolutely no idea how this works, which is why I copied the other post. I saw that Japan was already tracking AY.3, and since it's not clear there's much, if any, transmission advantage (I'm still watching the fraction of Delta sequences in Missouri and Kansas), there may be a small risk of misclassifications having consequences, à la UK-Portugal and AY.1, which led to a week of noise when we maybe should have focused on the signal. Also possibly of note for AY.1 and K417N in Delta is @insapathogenomics' latest report: https://insaflu.insa.pt/covid19/relatorios/INSA_SARS_CoV_2_DIVERSIDADE_GENETICA_relatorio_situacao_2021-07-13.pdf

phwanka commented 3 years ago

On GISAID, as of 2021/08/04 there are still 11,088 samples assigned to AY.3 that lack ORF1ab:I3731V, while only 3,783 samples assigned to AY.3 actually contain the mutation.
[Figures: AY.3 without I3731V; AY.3 with I3731V]
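For scale, the two GISAID counts imply that only about a quarter of AY.3 calls at that point carried the mutation; the arithmetic, using only the numbers reported above:

```latex
\frac{3{,}783}{11{,}088 + 3{,}783} = \frac{3{,}783}{14{,}871} \approx 0.254,
\qquad
\frac{11{,}088}{14{,}871} \approx 0.746
```

That is, roughly three in four GISAID sequences labelled AY.3 lacked ORF1ab:I3731V.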

chrisruis commented 3 years ago

Thanks @zach-hensel and @phwanka. It looks like another B.1.617.2 subclade has convergently acquired one of the mutations that arose on the branch leading to AY.3, which is potentially why these sequences are being assigned AY.3 despite clustering elsewhere. We're currently working on designating lineages within B.1.617.2, which should hopefully resolve this. We hope to post an update soon.

zach-hensel commented 3 years ago

If possible, a patch that changes AY.3 calls to B.1.617.2 whenever A11456G (NSP6_I162V, ORF1a:I3731V) is missing would mostly fix the issue for now.
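The proposed patch amounts to a post-processing rule on lineage calls. Below is a minimal sketch under stated assumptions; it is not a pangolin feature and not the fix that was eventually adopted. It assumes a hypothetical tab-separated table that joins each sequence's lineage call with its nucleotide substitutions. The column names `lineage` and `substitutions`, and the comma-separated substitution format (e.g. `C241T,A11456G`), are assumptions. `relabel_ay3` is a hypothetical name.

```shell
# relabel_ay3 FILE
# Hypothetical sketch: in a tab-separated table whose header contains the columns
# "lineage" and "substitutions" (comma-separated nucleotide changes), rewrite any
# AY.3 call that lacks A11456G to B.1.617.2. All other rows pass through unchanged.
relabel_ay3() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 {
      # Locate the two columns by header name, then print the header as-is.
      for (i = 1; i <= NF; i++) {
        if ($i == "lineage") lc = i
        if ($i == "substitutions") sc = i
      }
      print
      next
    }
    # Wrap the list in commas so the whole token ",A11456G," must match.
    lc && sc && $lc == "AY.3" && index(("," $sc ","), ",A11456G,") == 0 {
      $lc = "B.1.617.2"
    }
    { print }
  ' "$1"
}
```

One design caveat: the rule treats "A11456G not listed" as "A11456G absent", so a sequence with no coverage (an N) at position 11456 would also be relabeled. A real patch would probably need to distinguish missing data from the reference base.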

Honestly, it's not so bad that a lot of transiently miscalled AY.3 led people to spend a week questioning why lineage counts can change before jumping to conclusions; this is one of the easier reasons to explain.

chrisruis commented 3 years ago

Thanks again @zach-hensel. Most of these sequences are now assigned to AY.11 in v1.2.56.