cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

BA.5.2 + Orf1b:T1050N sublineage with Orf1a:T3284I, then Orf1a:R207C, Orf1a:S2488F and Orf1b:P1727L circulating in Shandong (78 seqs with Orf1b:S2339F) and Gansu, Hunan (3 seqs with Orf1a:T3284I) #1542

Closed FedeGueli closed 1 year ago

FedeGueli commented 1 year ago

EDITED

I will completely rewrite this issue after the mass upload of sequences from Shandong last night. Initially I proposed it separately in #1565, but then I realized that it has the same root as the small lineage proposed here.

Defining mutations: BA.5.2 > C27513T > C27012T > G12310A > Orf1b:T1050N (C16616A) > C18647T > Orf1a:R207C (C884T), Orf1a:S2488F (C7728T)

Gisaid query/covspectrum query: C7728T, C18647T, C16616A

Tree:

[Screenshots: 2023-01-21 09:40:44 and 2023-01-21 09:41:21]

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_2d1fe_ba22c0.json?c=userOrOld&label=id:node_7413453

Sequences: EPI_ISL_16390376, EPI_ISL_16458311, EPI_ISL_16458319, EPI_ISL_16494609, EPI_ISL_16576905, EPI_ISL_16604347, EPI_ISL_16604351-16604353, EPI_ISL_16604355-16604357, EPI_ISL_16604361-16604364, EPI_ISL_16604366-16604370, EPI_ISL_16604372-16604373, EPI_ISL_16604375, EPI_ISL_16604378-16604381, EPI_ISL_16604383-16604388, EPI_ISL_16604390-16604391, EPI_ISL_16604393, EPI_ISL_16604412-16604413, EPI_ISL_16604416, EPI_ISL_16604418, EPI_ISL_16604424, EPI_ISL_16604429, EPI_ISL_16604453, EPI_ISL_16604467, EPI_ISL_16604474, EPI_ISL_16604483, EPI_ISL_16604485, EPI_ISL_16604526, EPI_ISL_16604545, EPI_ISL_16604553, EPI_ISL_16604555, EPI_ISL_16604563, EPI_ISL_16604565-16604566, EPI_ISL_16604576, EPI_ISL_16604579, EPI_ISL_16604612, EPI_ISL_16604620, EPI_ISL_16604624, EPI_ISL_16604647-16604648, EPI_ISL_16604679, EPI_ISL_16604691, EPI_ISL_16604736, EPI_ISL_16604767-16604768, EPI_ISL_16604770-16604771, EPI_ISL_16604774, EPI_ISL_16604780-16604785, EPI_ISL_16604789, EPI_ISL_16604885, EPI_ISL_16604889, EPI_ISL_16604894
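The sequence list above mixes single accessions with compressed ranges such as EPI_ISL_16604351-16604353. A minimal Python sketch for expanding such a list into individual accessions follows. It assumes that a range denotes consecutive accession numbers, both ends inclusive; the helper name `expand_epi_isl` is hypothetical and not part of any GISAID tooling.

```python
import re


def expand_epi_isl(spec: str) -> list[str]:
    """Expand a comma-separated list such as
    'EPI_ISL_1, EPI_ISL_5-7' into individual accession IDs.

    Assumption: 'EPI_ISL_5-7' means EPI_ISL_5, EPI_ISL_6, EPI_ISL_7.
    """
    accessions = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        match = re.fullmatch(r"EPI_ISL_(\d+)(?:-(\d+))?", item)
        if match is None:
            raise ValueError(f"unrecognised accession: {item!r}")
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        if end < start:
            raise ValueError(f"descending range: {item!r}")
        accessions.extend(f"EPI_ISL_{n}" for n in range(start, end + 1))
    return accessions


# A short excerpt of the list in this issue.
example = "EPI_ISL_16390376, EPI_ISL_16604351-16604353, EPI_ISL_16604355-16604357"
for accession in expand_epi_isl(example):
    print(accession)
```

The expanded IDs can then be pasted one per line into a search box or a script that expects explicit accessions.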

FedeGueli commented 1 year ago

@thomasppeacock @InfrPopGen @corneliusroemer this new Chinese lineage will be tracked here, since #1565 is just a sublineage of this one, although a big one.

thomasppeacock commented 1 year ago

Recommending this to keep on top of lineage diversity in China, thanks Fede!

InfrPopGen commented 1 year ago

Thanks for submitting. We've added lineage BA.5.2.50 with 4 newly designated sequences, and 0 updated. Defining mutations C884T (ORF1a:R207C), C7728T (ORF1a:S2488F) (following C18647T (ORF1b:P1727L)).
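The nucleotide/amino-acid pairs quoted in this thread can be sanity-checked with simple arithmetic. The sketch below maps a nucleotide position to a codon number, assuming 1-based coordinates on the Wuhan-Hu-1 reference with ORF1a starting at 266 and ORF1b numbered from 13468. These start coordinates are an assumption about the annotation convention, not something stated in the issue, and `codon_number` is a hypothetical helper.

```python
# Assumed 1-based gene start coordinates on the Wuhan-Hu-1 reference.
ORF_STARTS = {"ORF1a": 266, "ORF1b": 13468}


def codon_number(nuc_pos: int, gene: str) -> int:
    """Return the 1-based codon index within `gene` that contains `nuc_pos`."""
    offset = nuc_pos - ORF_STARTS[gene]
    if offset < 0:
        raise ValueError(f"position {nuc_pos} lies before the start of {gene}")
    return offset // 3 + 1


# (nucleotide position, gene, codon number quoted in the thread)
quoted = [
    (884, "ORF1a", 207),     # C884T   -> ORF1a:R207C
    (7728, "ORF1a", 2488),   # C7728T  -> ORF1a:S2488F
    (16616, "ORF1b", 1050),  # C16616A -> ORF1b:T1050N
    (18647, "ORF1b", 1727),  # C18647T -> ORF1b:P1727L
]

for pos, gene, expected in quoted:
    print(f"{pos} -> {gene}:{codon_number(pos, gene)} (thread: {expected})")
```

Under those assumed start coordinates, all four positions land on the codon numbers used in the thread.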

FedeGueli commented 1 year ago

thank you @InfrPopGen and @thomasppeacock !

aviczhl2 commented 1 year ago
[Screenshot: 2023-02-08 10:26:46]

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice6_genome_2967e_2e7340.json?label=id:node_7556982

It seems that UShER is no longer categorizing this branch correctly and is categorizing another branch as BA.5.2.50.

There are a lot of sequences without ORF1a:R207C on this branch. Previously those sequences were regarded as "C207R reversions"; now the algorithm has decided to flip the order, so this branch is being "thrown out" of BA.5.2.50.

corneliusroemer commented 1 year ago

Yep, @aviczhl2 you are right, thanks for pointing this out! This is a problem with Usher rather than pango-designations, pinging @AngieHinrichs

The defining mutations of BA.5.2.50 above BA.5.2 are:

    "C884T",
    "C7728T",
    "G12310A",
    "C16616A",
    "C18647T",
    "C27012T",
    "C27513T"

There seems to be dropout of ORF1ab:R207C, which causes the top bit with that mutation to appear as non-BA.5.2.50.

corneliusroemer commented 1 year ago

I've added the "USHeR not clean" label. If you spot similar issues with other lineages, let us know. Good spot :) @aviczhl2

AngieHinrichs commented 1 year ago

How important is ORF1a:R207C to the definition of BA.5.2.50? If we omit ORF1a:R207C (C884T) from the definition of BA.5.2.50, and instead define it as

... ORF1b:P1727L (C18647T) > ORF1a:S2488F (C7728T)

that would include both branches. Here is a taxonium view of the branch at ORF1a:S2488F (C7728T) that currently has both the annotated BA.5.2.50 and the other branch that gets ORF1a:R207C (C884T) after several other mutations, with red circles around the four designated BA.5.2.50 sequences in lineages.csv and nodes colored by allele at 884 (orange=C, green=T):

[image: taxonium view]

I can't blame UShER for structuring it that way because the non-Shandong sequences seem to get C884T right after C7728T, while the Shandong sequences after C7728T include sequences without C884T but with

and then the Shandong sequences that do have C884T also have all three of the above mutations. The only way to force UShER to make C884T come first would be to permanently exclude the 18 Shandong sequences that don't have C884T. Do we want to do that? Alternatively I could add a "BA.5.2.50_alt" label to the Shandong C884T branch so that both branches would be included in the minimized tree for pangolin and would result in BA.5.2.50 being assigned.

... or, again, we could simply omit ORF1a:R207C (C884T) from the definition and let BA.5.2.50 start at ORF1a:S2488F (C7728T).
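The effect of dropping C884T from the definition can be illustrated with plain set containment. The sketch below is a deliberate simplification: real assignment by UShER and pangolin depends on placement in the tree, not on a flat mutation checklist, and the toy sequences are hypothetical, not real GISAID records. The mutation names themselves are taken from the thread.

```python
# Nucleotide mutations shared by both candidate definitions (from the thread).
SHARED = frozenset(
    {"C27513T", "C27012T", "G12310A", "C16616A", "C18647T", "C7728T"}
)

DEFINITION_WITH_884 = SHARED | {"C884T"}  # original definition
DEFINITION_WITHOUT_884 = SHARED           # proposed: start at C7728T

# Hypothetical toy sequences, for illustration only.
TOY_SEQUENCES = {
    "non_shandong_with_C884T": SHARED | {"C884T"},
    "shandong_with_C884T": SHARED | {"C884T"},
    "shandong_without_C884T": SHARED,
    "parent_before_C7728T": SHARED - {"C7728T"},
}


def members(definition):
    """Return sorted names of toy sequences carrying every mutation in `definition`."""
    return sorted(
        name for name, muts in TOY_SEQUENCES.items() if definition <= muts
    )


print("with C884T:   ", members(DEFINITION_WITH_884))
print("without C884T:", members(DEFINITION_WITHOUT_884))
```

In this toy setup the definition without C884T also captures the Shandong sequences that lack the mutation, while still excluding the parent that has not yet acquired C7728T.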

aviczhl2 commented 1 year ago

> ... or, again, we could simply omit ORF1a:R207C (C884T) from the definition and let BA.5.2.50 start at ORF1a:S2488F (C7728T).

Could you please fix it this way, @AngieHinrichs? I see ORF1a:R207C has already been removed from the defining mutations of BA.5.2.50 in lineage_notes.txt:

BA.5.2.50 Alias of B.1.1.529.5.2.50 China, ORF1a:S2488F after ORF1b:P1727L, issue #1542

AngieHinrichs commented 1 year ago

> BA.5.2.50 Alias of B.1.1.529.5.2.50 China, ORF1a:S2488F after ORF1b:P1727L, issue #1542

Good point @aviczhl2! And I was about to ask Cornelius if he's OK with the change but he added more designated sequences in 3e600c1e to make it extra clear. OK, I will fix it.

corneliusroemer commented 1 year ago

Yes, sorry for not reporting back here - I agree that it makes sense to remove that one mutation. I'm not sure I trust the Shandong sequences, but in the end it's not a very important change.