cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

Australian lineage is systematically misclassified as AY.30 #234

Closed corneliusroemer closed 3 years ago

corneliusroemer commented 3 years ago

When studying a recent global tree, I noticed that the current designations for AY.30 actually seem to comprise two monophyletic lineages that are very far apart within Delta. One cluster (the correct AY.30) is in the smaller bottom branch of Delta (Thailand etc.); the other is in the upper part and consists mostly of sequences from Australia.

image

I don't know how the Australian lineage ends up being so reliably misclassified as AY.30. Maybe there was/is pollution in the designated sequences?

This is an Usher tree with a subsample of sequences classified as AY.30. You can clearly see the two-way split. image

Here you can see sample Usher and pango designations: image image

I'd recommend adding these strains to B.1.617.2, or maybe even designating them as an (Australian) lineage of their own, to prevent this misclassification from happening.

Here is a sample of AY.30 assignments, split by whether they are actually AY.30 or not: AY30.txt non-AY30.txt

Usher link: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/singleSubtreeAuspice_genome_1ff47_b78de0.json?label=nuc%20mutations:C9319T,A13482G,G28690T,G29781T

chrisruis commented 3 years ago

Thanks @corneliusroemer. It looks like AY.30 and this clade have convergently acquired A13482G, which is causing sequences in this clade to be mis-assigned to AY.30.

Looking at the clade and the upstream branches, it looks like this region of the tree does warrant new lineages. Upstream of this Australia clade, there's an introduction(s) into the USA with onward transmission that we've designated AY.39. It's difficult to identify exactly where the introduction(s) into the USA occurred in the tree so we've started this lineage on a branch with G27604A which corresponds to a clear USA clade. AY.39 has 2301 sequence designations.

Within AY.39, there's an introduction(s) into Australia with onward transmission that we've designated AY.39.1. We've started this lineage on a branch with A13482G which is synonymous. This is the clade that has convergently acquired this mutation with AY.30. As the clades are quite distant there should be sufficient other mutations between them to enable them to be assigned correctly. AY.39.1 has 7453 sequence designations.
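To illustrate why a single shared mutation cannot separate the two clades, here is a minimal sketch in Python. It is not how pangolin or Usher assign lineages; it is a hypothetical rule-based check that uses only the two branch mutations named above (G27604A for AY.39, A13482G for AY.39.1). The function names and the returned labels are assumptions made for illustration.

```python
import re

# Branch mutations quoted in this thread.
AY39_BRANCH = "G27604A"         # branch on which AY.39 was started
SHARED_CONVERGENT = "A13482G"   # acquired independently by AY.30 and AY.39.1


def parse_mutation(mut):
    """Split a nucleotide substitution such as 'A13482G' into (ref, position, alt)."""
    m = re.fullmatch(r"([ACGT])(\d+)([ACGT])", mut)
    if m is None:
        raise ValueError(f"not a nucleotide substitution: {mut!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def disambiguate(mutations):
    """Hypothetical rule: use G27604A as context for interpreting A13482G.

    Returns a label string; the labels are illustrative, not official calls.
    """
    muts = set(mutations)
    for mut in muts:
        parse_mutation(mut)  # validate the input format only

    has_shared = SHARED_CONVERGENT in muts
    has_ay39 = AY39_BRANCH in muts

    if has_ay39 and has_shared:
        return "AY.39.1"
    if has_ay39:
        return "AY.39"
    if has_shared:
        # A13482G without the AY.39 branch mutation: consistent with AY.30,
        # but other AY.30 mutations would need to be checked to confirm.
        return "AY.30 candidate"
    return "unresolved"
```

For example, `disambiguate(["G27604A", "A13482G"])` gives `"AY.39.1"`, while `disambiguate(["A13482G"])` gives `"AY.30 candidate"`. A real classifier would weigh many more sites than these two.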

Within AY.39.1, there's a clade of New Zealand sequences that we've designated AY.39.1.1. We've started this lineage on G19563T which corresponds to Orf1ab:L6433F. AY.39.1.1 has 124 sequence designations.
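The nesting described above (AY.39 > AY.39.1 > AY.39.1.1) can be read directly off the lineage names. The following is a small sketch, assuming only the dotted naming convention; the helper names are hypothetical, and the alias table holds just the one entry relevant here (AY as shorthand for B.1.617.2) rather than the full alias key.

```python
# One-entry alias table for illustration; the full scheme has many more aliases.
ALIASES = {"AY": "B.1.617.2"}


def ancestors(lineage):
    """Return the parent lineages implied by a dotted name, nearest first."""
    parts = lineage.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def unalias(lineage):
    """Expand a leading alias (e.g. 'AY') into its full form, if it is known."""
    head, _, rest = lineage.partition(".")
    full = ALIASES.get(head, head)
    return f"{full}.{rest}" if rest else full


def is_descendant(child, parent):
    """True if `child` sits below `parent` according to the names alone."""
    return unalias(child).startswith(unalias(parent) + ".")
```

With these helpers, `ancestors("AY.39.1.1")` returns `["AY.39.1", "AY.39", "AY"]`, and `is_descendant("AY.39.1.1", "B.1.617.2")` is true, whereas `is_descendant("AY.39.1", "AY.30")` is false.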

These changes are in v1.2.84.