Closed by corneliusroemer 3 years ago.
Thanks @corneliusroemer. It looks like AY.30 and this clade have convergently acquired A13482G, which is causing sequences in this clade to be mis-assigned to AY.30.
Looking at the clade and the upstream branches, this region of the tree does appear to warrant new lineages. Upstream of this Australian clade, there are one or more introductions into the USA with onward transmission, which we've designated AY.39. It's difficult to identify exactly where in the tree the introduction(s) into the USA occurred, so we've started this lineage on a branch with G27604A, which corresponds to a clear USA clade. AY.39 has 2301 sequence designations.
Within AY.39, there are one or more introductions into Australia with onward transmission, which we've designated AY.39.1. We've started this lineage on a branch with A13482G, which is a synonymous mutation. This is the clade that has acquired this mutation convergently with AY.30. Because the two clades are quite distant, there should be enough other mutations separating them for sequences to be assigned correctly. AY.39.1 has 7453 sequence designations.
Within AY.39.1, there's a clade of New Zealand sequences that we've designated AY.39.1.1. We've started this lineage on G19563T, which corresponds to Orf1ab:L6433F. AY.39.1.1 has 124 sequence designations.
These changes are in v1.2.84.
While studying a recent global tree, I noticed that the current AY.30 designations actually seem to comprise two monophyletic lineages that are very far apart within Delta. One cluster (the correct AY.30) sits in the smaller, bottom branch of Delta (Thailand etc.); the other sits in the upper part and consists mostly of sequences from Australia.
I don't know how the Australian lineage comes to be so reliably misclassified as AY.30. Maybe there was (or still is) contamination in the designated sequences?
This is an UShER tree with a subsample of sequences classified as AY.30. You can clearly see the two-way split.
Here you can see sample UShER and Pango designations:
I'd recommend reassigning these strains to B.1.617.2, or perhaps even designating them as an (Australian) lineage of their own, to prevent this misclassification from happening.
Here is a sample of AY.30 assignments, split by whether they are actually AY.30 or not: AY30.txt, non-AY30.txt
UShER link: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/singleSubtreeAuspice_genome_1ff47_b78de0.json?label=nuc%20mutations:C9319T,A13482G,G28690T,G29781T