Closed FedeGueli closed 1 year ago
@thomasppeacock @InfrPopGen @corneliusroemer this new Chinese lineage will be tracked here, since #1565 is just a sublineage of this one, albeit a big one.
Recommending this to keep on top of lineage diversity in China, thanks Fede!
Thanks for submitting. We've added lineage BA.5.2.50 with 4 newly designated sequences, and 0 updated. Defining mutations C884T (ORF1a:R207C), C7728T (ORF1a:S2488F) (following C18647T (ORF1b:P1727L)).
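The nucleotide/amino-acid pairings in the note above can be checked by hand. A minimal sketch, assuming the Wuhan-Hu-1 reference coordinates commonly used for annotation (ORF1a starting at nucleotide 266, ORF1b at 13468); `ORF_START` and `codon_number` are illustrative names, not part of any pango tooling:

```python
# Assumed 1-based gene start coordinates on the Wuhan-Hu-1 reference.
ORF_START = {"ORF1a": 266, "ORF1b": 13468}


def codon_number(gene: str, nt_pos: int) -> int:
    """Return the 1-based codon index within `gene` that contains `nt_pos`."""
    offset = nt_pos - ORF_START[gene]
    if offset < 0:
        raise ValueError(f"position {nt_pos} lies upstream of {gene}")
    return offset // 3 + 1


# Positions taken from the designation note above.
for gene, pos in [("ORF1a", 884), ("ORF1a", 7728), ("ORF1b", 18647)]:
    print(gene, pos, codon_number(gene, pos))
```

Under those assumed coordinates the three positions fall in codons 207, 2488 and 1727, which agrees with R207C, S2488F and P1727L.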
thank you @InfrPopGen and @thomasppeacock !
It seems that UShER is no longer categorizing this branch correctly and is now labelling another branch as BA.5.2.50.
There are a lot of sequences without ORF1a:R207C on this branch. Previously those sequences were regarded as "C207R reversions"; now the algorithm has decided to flip the order, and therefore this branch is being "thrown out" of BA.5.2.50.
Yep, @aviczhl2 you are right, thanks for pointing this out! This is a problem with Usher rather than pango-designations, pinging @AngieHinrichs
The defining mutations of BA.5.2.50 above BA.5.2 are:
"C884T",
"C7728T",
"G12310A",
"C16616A",
"C18647T",
"C27012T",
"C27513T"
There seems to be dropout of ORF1ab:R207C, which causes the top bit with that mutation to appear as non-BA.5.2.50.
I've added the "USHeR not clean" label; if you spot similar issues with other lineages, let us know. Good spot :) @aviczhl2
How important is ORF1a:R207C to the definition of BA.5.2.50? If we omit ORF1a:R207C (C884T) from the definition of BA.5.2.50, and instead define it as
... ORF1b:P1727L (C18647T) > ORF1a:S2488F (C7728T)
that would include both branches. Here is a taxonium view of the branch at ORF1a:S2488F (C7728T) that currently has both the annotated BA.5.2.50 and the other branch that gets ORF1a:R207C (C884T) after several other mutations, with red circles around the four designated BA.5.2.50 sequences in lineages.csv and nodes colored by allele at 884 (orange=C, green=T):
I can't blame UShER for structuring it that way because the non-Shandong sequences seem to get C884T right after C7728T, while the Shandong sequences after C7728T include sequences without C884T but with
and then the Shandong sequences that do have C884T also have all three of the above mutations. The only way to force UShER to make C884T come first would be to permanently exclude the 18 Shandong sequences that don't have C884T. Do we want to do that? Alternatively I could add a "BA.5.2.50_alt" label to the Shandong C884T branch so that both branches would be included in the minimized tree for pangolin and would result in BA.5.2.50 being assigned.
... or, again, we could simply omit ORF1a:R207C (C884T) from the definition and let BA.5.2.50 start at ORF1a:S2488F (C7728T).
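To see why dropping C884T would capture both branches, the two candidate definitions can be compared on toy data. This is only a sketch: the mutation sets are simplified stand-ins for the two kinds of sequence described above (not real genomes), `matches` is a hypothetical helper, and a plain set comparison is not how UShER or pangolin actually assign lineages, since those work on the tree.

```python
def matches(sample_muts: set[str], definition: set[str]) -> bool:
    """True if the sample carries every mutation in the definition."""
    return definition <= sample_muts


# Mutations above BA.5.2 that both branches share, per the list in this thread.
BACKBONE = {"C27513T", "C27012T", "G12310A", "C16616A", "C18647T"}
WITH_884 = BACKBONE | {"C7728T", "C884T"}  # original definition
WITHOUT_884 = BACKBONE | {"C7728T"}        # proposed definition

# Toy samples: one carrying C884T, one lacking it.
samples = {
    "with_C884T": BACKBONE | {"C7728T", "C884T"},
    "without_C884T": BACKBONE | {"C7728T"},
}

for name, muts in samples.items():
    print(name, matches(muts, WITH_884), matches(muts, WITHOUT_884))
```

Only the definition without C884T accepts both toy samples, which is the effect of letting BA.5.2.50 start at C7728T.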
Could you please fix it this way, @AngieHinrichs? I see ORF1a:R207C has been removed from the defining mutations of BA.5.2.50 in lineage_notes.txt:
BA.5.2.50 Alias of B.1.1.529.5.2.50 China, ORF1a:S2488F after ORF1b:P1727L, issue #1542
Good point @aviczhl2! And I was about to ask Cornelius if he's OK with the change but he added more designated sequences in 3e600c1e to make it extra clear. OK, I will fix it.
Yes sorry for not reporting back here - I agree that it makes sense to remove that one mutation. I'm not sure I trust the Shandong sequences but in the end it's not a very important change.
EDITED
I will completely rewrite this issue after the mass upload of sequences from Shandong last night. Initially I proposed it separately in #1565, but then I realized that it has the same root as the smaller one proposed here.
Defining mutations: BA.5.2 > C27513T > C27012T > G12310A > Orf1b:T1050N (C16616A) > C18647T > Orf1a:R207C (C884T), Orf1a:S2488F (C7728T)
GISAID query/CoV-Spectrum query: C7728T, C18647T, C16616A
Tree:
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_2d1fe_ba22c0.json?c=userOrOld&label=id:node_7413453
Sequences: EPI_ISL_16390376, EPI_ISL_16458311, EPI_ISL_16458319, EPI_ISL_16494609, EPI_ISL_16576905, EPI_ISL_16604347, EPI_ISL_16604351-16604353, EPI_ISL_16604355-16604357, EPI_ISL_16604361-16604364, EPI_ISL_16604366-16604370, EPI_ISL_16604372-16604373, EPI_ISL_16604375, EPI_ISL_16604378-16604381, EPI_ISL_16604383-16604388, EPI_ISL_16604390-16604391, EPI_ISL_16604393, EPI_ISL_16604412-16604413, EPI_ISL_16604416, EPI_ISL_16604418, EPI_ISL_16604424, EPI_ISL_16604429, EPI_ISL_16604453, EPI_ISL_16604467, EPI_ISL_16604474, EPI_ISL_16604483, EPI_ISL_16604485, EPI_ISL_16604526, EPI_ISL_16604545, EPI_ISL_16604553, EPI_ISL_16604555, EPI_ISL_16604563, EPI_ISL_16604565-16604566, EPI_ISL_16604576, EPI_ISL_16604579, EPI_ISL_16604612, EPI_ISL_16604620, EPI_ISL_16604624, EPI_ISL_16604647-16604648, EPI_ISL_16604679, EPI_ISL_16604691, EPI_ISL_16604736, EPI_ISL_16604767-16604768, EPI_ISL_16604770-16604771, EPI_ISL_16604774, EPI_ISL_16604780-16604785, EPI_ISL_16604789, EPI_ISL_16604885, EPI_ISL_16604889, EPI_ISL_16604894
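The sequence list above uses a shorthand such as EPI_ISL_16604351-16604353. A small sketch for expanding it into individual accessions, assuming each range is inclusive and covers consecutive numeric IDs (that reading of the shorthand is an assumption); `expand_accessions` is a hypothetical helper name:

```python
import re


def expand_accessions(text: str) -> list[str]:
    """Expand 'EPI_ISL_A-B' ranges into individual EPI_ISL accessions."""
    out: list[str] = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        m = re.fullmatch(r"EPI_ISL_(\d+)(?:-(\d+))?", item)
        if m is None:
            raise ValueError(f"unrecognised accession: {item!r}")
        start = int(m.group(1))
        # A bare accession is treated as a range of length one.
        end = int(m.group(2)) if m.group(2) else start
        if end < start:
            raise ValueError(f"descending range: {item!r}")
        out.extend(f"EPI_ISL_{n}" for n in range(start, end + 1))
    return out


print(expand_accessions("EPI_ISL_16390376, EPI_ISL_16604351-16604353"))
```

The expanded list can then be pasted into a search box or compared against lineages.csv one accession per entry.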