cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

XBB.1.16 Sublineage with S:P521T, ORF3a:I35K (82 seq, 16 countries, Jun 16) #2058

Closed ryhisner closed 1 year ago

ryhisner commented 1 year ago

Description

Sub-lineage of: XBB.1.16
Earliest sequence: 2023-3-23, India, Delhi (EPI_ISL_17466951)
Most recent sequence: 2023-6-5, Australia, Victoria (EPI_ISL_17803195)
Countries circulating: India (23), Australia (21), USA (9), Thailand (6), Canada (4), England (4), Spain (3), China (2), Netherlands (2), South Africa (2), Austria (1), Canary Islands (1), Finland (1), Italy (1), Mauritius (1), South Korea (1)
Number of Sequences: 82
GISAID AA Query: No good ones
GISAID Nucleotide Query: T25496A, A28447G, C29386T
CovSpectrum Query: Nextcladepangolineage:XBB.1.16* & T25496A
Substitutions on top of XBB.1.16:
Spike: P521T
ORF3a: I35K
Nucleotide: C23123A, T25496A, C29386T

Phylogenetic Order of Mutations: C23123A(S:P521T) and T25496A (ORF3a:I35K) occurred simultaneously as far as I can tell. Usher puts ORF3a:I35K first, but that's because a number of sequences have no coverage at S:P521. There are no sequences that have coverage at S:P521 and lack S:P521T but have ORF3a:I35K
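The coverage argument above can be checked mechanically. What follows is only a minimal sketch, assuming each sequence has already been reduced to a per-site call of `"mut"`, `"ref"`, or `"no_cov"`. The record layout, the field names, and the toy records are hypothetical illustrations, not real GISAID data.

```python
from collections import Counter

# Hypothetical per-sequence calls at the two sites of interest (toy data).
# "mut"    = the mutation is present
# "ref"    = the site is covered and shows the XBB.1.16 state
# "no_cov" = no usable coverage at the site (Ns or amplicon dropout)
toy_calls = [
    {"id": "seq1", "S:P521T": "mut", "ORF3a:I35K": "mut"},
    {"id": "seq2", "S:P521T": "no_cov", "ORF3a:I35K": "mut"},
    {"id": "seq3", "S:P521T": "mut", "ORF3a:I35K": "mut"},
    {"id": "seq4", "S:P521T": "no_cov", "ORF3a:I35K": "mut"},
]


def tally(calls):
    """Count sequences by their (S:P521T call, ORF3a:I35K call) pair."""
    return Counter((c["S:P521T"], c["ORF3a:I35K"]) for c in calls)


def i35k_first_supported(calls):
    """True only if some sequence is covered at S:521, lacks P521T, and has I35K."""
    return tally(calls)[("ref", "mut")] > 0


counts = tally(toy_calls)
print(counts)
print("I35K-first supported:", i35k_first_supported(toy_calls))
```

If the `("ref", "mut")` bucket is empty while the `("no_cov", "mut")` bucket is not, an "I35K first" placement is explained by missing coverage rather than by genuine intermediate sequences.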

USHER Tree
Absurdly, it seems that Usher separates this lineage into several branches. Again, this is very likely due to the many shoddy sequences, mostly from India, Thailand, and Ginkgo Bozoworks, which paid its top two executives over $728 million in 2021 yet is still incapable of submitting a decent sequence. I excluded all the garbage sequences to make the tree appear reasonable. Below are the two biggest Usher trees.
https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons/main/XBB.1.16_P521T_3aI35K_1.json?c=gt-ORF3a_35&label=id:node_6567902
https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons/main/XBB.1.16_P521T_3aI35K_2.json?c=gt-ORF3a_35&label=id:node_6564035

Evidence
S:P521T (along with S:P521S) is a known advantageous mutation in XBB lineages. ORF3a:I35K requires a T→A nucleotide mutation, which is the 3rd-rarest type, behind only C→G and T→G, which likely explains its rarity up to this point. This lineage exhibits a modest growth advantage against baseline XBB.1.16, though it's too early to attribute any certainty to that, so I've not included the figure. (Three sequences are not on CovSpectrum and never will be because they do not include a month or day in their metadata.)
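The pairing of nucleotide and amino-acid changes in the description (C23123A with S:P521T, T25496A with ORF3a:I35K) can be sanity-checked from gene coordinates alone. This is a small sketch, assuming the 1-based gene start positions of the Wuhan-Hu-1 reference (NC_045512.2); the helper name `codon_position` is made up for illustration.

```python
# 1-based start coordinates of the genes on the Wuhan-Hu-1 reference (NC_045512.2).
GENE_START = {"S": 21563, "ORF3a": 25393}


def codon_position(gene, nuc_pos):
    """Map a genome position to (codon number, position within codon), both 1-based."""
    offset = nuc_pos - GENE_START[gene]
    if offset < 0:
        raise ValueError(f"{nuc_pos} lies upstream of {gene}")
    return offset // 3 + 1, offset % 3 + 1


print(codon_position("S", 23123))      # (521, 1)
print(codon_position("ORF3a", 25496))  # (35, 2)
```

C23123A hits the first base of Spike codon 521, and a C→A change there turns a proline codon (CCN) into a threonine codon (ACN). T25496A hits the second base of ORF3a codon 35, where T→A yields lysine only from the isoleucine codon ATA (ATA→AAA), so that is the codon the substitution implies.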

With virtually no sequencing coming from most of India nowadays, it's impossible to tell if this lineage is thriving there or if it's been outcompeted. It might be wiser to exclude Indian sequences altogether from any analysis as the lack of sequencing there (after a period of more intense sequencing in April) distorts everything.

Genomes — Note: I've subtracted the three pooled samples from Ginkgo Bozoworks from the 85 sequences on GISAID.

Genomes EPI_ISL_17466374, EPI_ISL_17466951, EPI_ISL_17467117, EPI_ISL_17467288, EPI_ISL_17471287, EPI_ISL_17508159, EPI_ISL_17527007, EPI_ISL_17548386, EPI_ISL_17549994, EPI_ISL_17582999, EPI_ISL_17602963, EPI_ISL_17606264, EPI_ISL_17614840, EPI_ISL_17622263, EPI_ISL_17622264, EPI_ISL_17622265, EPI_ISL_17623924, EPI_ISL_17638005, EPI_ISL_17642137, EPI_ISL_17653115, EPI_ISL_17662708, EPI_ISL_17664089, EPI_ISL_17676285, EPI_ISL_17681989, EPI_ISL_17699397, EPI_ISL_17700450, EPI_ISL_17703969, EPI_ISL_17709785, EPI_ISL_17714190, EPI_ISL_17718803, EPI_ISL_17718991, EPI_ISL_17720764, EPI_ISL_17720774, EPI_ISL_17720778, EPI_ISL_17720779, EPI_ISL_17720780, EPI_ISL_17727525, EPI_ISL_17730151, EPI_ISL_17730155, EPI_ISL_17730157, EPI_ISL_17730158, EPI_ISL_17730214, EPI_ISL_17730255, EPI_ISL_17730263, EPI_ISL_17742038, EPI_ISL_17742049, EPI_ISL_17744355, EPI_ISL_17744356, EPI_ISL_17761186, EPI_ISL_17763323, EPI_ISL_17767023, EPI_ISL_17776415, EPI_ISL_17777519, EPI_ISL_17779615, EPI_ISL_17783411, EPI_ISL_17784128, EPI_ISL_17784154, EPI_ISL_17784185, EPI_ISL_17784385, EPI_ISL_17784445, EPI_ISL_17784450, EPI_ISL_17786749, EPI_ISL_17786881, EPI_ISL_17787357, EPI_ISL_17791753, EPI_ISL_17793817, EPI_ISL_17794728, EPI_ISL_17795969, EPI_ISL_17797690, EPI_ISL_17799115, EPI_ISL_17802365, EPI_ISL_17802452, EPI_ISL_17802651, EPI_ISL_17802672, EPI_ISL_17803000, EPI_ISL_17803131, EPI_ISL_17803195, EPI_ISL_17803372, EPI_ISL_17804071, EPI_ISL_17806699, EPI_ISL_17806775, EPI_ISL_17808947
FedeGueli commented 1 year ago

Already designated! XBB.1.16.11

ryhisner commented 1 year ago

Huh. I tried searching both I35K and ORF3a:I35K, and I still get no results. I'll try to find the actual designation and see why the search didn't work.

ryhisner commented 1 year ago

I think there are two different lineages here, one consisting of XBB.1.16 + S:P521T with no ORF3a:I35K and another that has both S:P521T and ORF3a:I35K. Doing a GISAID search, there seem to be 142 XBB.1.16 sequences with S:P521T but only 73 sequences with both S:P521T and ORF3a:I35K.

Looking at all 142 sequences in Nextclade, this seems right. About half of them have ORF3a:I35K and half do not, and they all have coverage in that area.

FedeGueli commented 1 year ago

Yes @ryhisner, that's how it is. @corneliusroemer designated the one with ORF3a:I35K, I think.

ryhisner commented 1 year ago

I'm not sure if that's the case. ORF3a:I35K isn't listed in the notes, and the designation from two days ago says there were 100 sequences, which is way too many. Two days ago there would've only been about 65 sequences with S:P521T and ORF3a:I35K.

FedeGueli commented 1 year ago

When it was designated, I took one of the sequences from the lineage notes that was used for the designation and got a tree for it, and it was this one. I'm sure I read I35K somewhere in the commit as well. Let me check.

FedeGueli commented 1 year ago

Here: Screenshot_20230616-212007

FedeGueli commented 1 year ago

But I had the same thought as you about the number of sequences.

FedeGueli commented 1 year ago

@corneliusroemer could you check here please?

HynnSpylor commented 1 year ago

I've been watching the XBB.1.16 + S:P521T cluster since early May, but it was too messy, so I gave up on proposing it. A simple query for S:P521T + S:E180V + T28297C returns 138 seqs, which Usher splits into more than 10 subtrees. The subtree with ORF3a:I35K is indeed the largest branch, but the other 2 branches, one with S:G72E then S:P521T (mainly in the USA and Europe) and one with P521T alone (mainly in India, Ireland and the UK), also contain more than 30 seqs.

The convergent cluster of XBB.1.16*+S:P521T should be carefully designated.

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice3_genome_34d95_d5b350.json?c=gt-nuc_25496&f_userOrOld=uploaded%20sample&label=id:node_6564585

(with a low-quality seq)
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice2_genome_34d95_d5b350.json?c=gt-nuc_25496&f_userOrOld=uploaded%20sample&label=id:node_6567017

FedeGueli commented 1 year ago

Although they are not showing a clear advantage.

ryhisner commented 1 year ago

The extremely variable rate of sequencing in India, along with the very poor quality of most sequences from there (often lacking coverage at S:521) makes detecting the relative growth rate of XBB.1.16 + P521T virtually impossible. Of the Indian sequences from the past 2-3 months, the best ones were mostly in March/April from Maharashtra. But many of the more recent ones are from places like Chhattisgarh, Odisha, and less proficient labs in Maharashtra. These are far more likely to lack coverage at S:521—whether it's indicated or not.

I think the ORF3a:I35K branch probably has a growth advantage over baseline XBB.1.16, but it's too difficult to say for certain with the issue of the Indian sequences fouling everything up.

If there was a way to do a CovSpectrum search excluding one or more countries—something like !country:India & ORF3a:I35K & ORF1b:D1754Y—I think we'd have a much better idea. Until then, we just have to wait until there are enough sequences in other countries to get a good assessment of the growth rate. Early indications from Australia seem to show ORF3a:I35K with a real advantage, but founder effects and such can't be ruled out at this point.
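Until such a query exists, the same exclusion can be approximated offline on downloaded metadata. The sketch below is not a CovSpectrum feature. It assumes each record has been parsed into a country name and a set of amino-acid mutations, and the record layout, field names, and toy records are all hypothetical.

```python
def matches(record, required_mutations, excluded_countries):
    """Keep a record that carries every required mutation and is not from an excluded country."""
    return (
        record["country"] not in excluded_countries
        and required_mutations <= record["mutations"]
    )


# Toy records standing in for parsed metadata (not real sequences).
toy_records = [
    {"id": "a", "country": "India", "mutations": {"ORF3a:I35K", "ORF1b:D1754Y"}},
    {"id": "b", "country": "Australia", "mutations": {"ORF3a:I35K", "ORF1b:D1754Y", "S:P521T"}},
    {"id": "c", "country": "USA", "mutations": {"ORF3a:I35K"}},
]

# Equivalent in spirit to: !country:India & ORF3a:I35K & ORF1b:D1754Y
hits = [
    r["id"]
    for r in toy_records
    if matches(r, {"ORF3a:I35K", "ORF1b:D1754Y"}, {"India"})
]
print(hits)  # ['b']
```

Using set containment (`<=`) for the mutation test means extra mutations on a sequence do not disqualify it, which matches how an `&` query behaves.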

FedeGueli commented 1 year ago

Restricting to very recent samples, which excludes the Indian sequences and points to the internationally spread ones, I agree with Ryan: it could be among the top positions in collection 42:
https://cov-spectrum.org/explore/World/AllSamples/Past1M/variants?aaMutations=Orf1b%3AD54N&aaMutations1=Orf3a%3AI35K&analysisMode=CompareToBaseline&

AngieHinrichs commented 1 year ago

As @HynnSpylor observed:

A simple query for S:P521T + S:E180V + T28297C returns 138 seqs, which Usher splits into more than 10 subtrees. The subtree with ORF3a:I35K is indeed the largest branch, but the other 2 branches, one with S:G72E then S:P521T (mainly in the USA and Europe) and one with P521T alone (mainly in India, Ireland and the UK), also contain more than 30 seqs.

The convergent cluster of XBB.1.16*+S:P521T should be carefully designated.

@corneliusroemer it looks like you designated sequences from all 3 notable branches in XBB.1.16 that have S:P521T as XBB.1.16.11:

The support for the second branch's ORF3a:I35K-first is not extremely strong but not totally bogus-looking either: two sequences from India in March, two sequences from England in June, and then the sequences with both ORF3a:I35K and S:P521T are from India, USA, Thailand, Australia, Europe, China, March to June.

The support for the third branch's S:G72E and ORF1a:F548S first is stronger: 14 sequences from India, USA, England & Singapore with S:G72E (but not ORF1a:F548S & S:P521T) from March & April; then 3 sequences from Luxembourg & Canada, April and May, with S:G72E and ORF1a:F548S (but not S:P521T); then the larger branch with all three mutations including S:P521T, April to June, Canada, USA & Europe including Luxembourg.

Are you sure all three branches should be the same lineage XBB.1.16.11? I'm pretty sure that the branch with S:G72E > ORF1a:F548S > S:P521T got those in the right order, and I think ORF3a:I35K > S:P521T is at least plausible.
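The reasoning about mutation order can be illustrated with a simple nestedness check: if every sequence carrying a later mutation also carries all earlier ones, then carrier counts give a plausible order of acquisition. This is only a heuristic sketch, not how Usher places sequences. The function name is made up, and the toy data mirrors the 14 and 3 counts mentioned above with a placeholder of 30 for the larger branch, whose exact size is not stated.

```python
from collections import Counter


def infer_order(seq_mutations, candidates):
    """Order candidate mutations by carrier count and test whether carriers are nested."""
    counts = Counter(m for muts in seq_mutations for m in muts if m in candidates)
    # Mutations carried by more sequences are assumed to have arisen earlier.
    order = sorted(candidates, key=lambda m: -counts[m])
    # Nested means: whoever has a mutation also has every mutation ranked before it.
    nested = all(
        set(order[:i]) <= muts
        for muts in seq_mutations
        for i, m in enumerate(order)
        if m in muts
    )
    return order, nested


toy = (
    [{"S:G72E"}] * 14                              # S:G72E only
    + [{"S:G72E", "ORF1a:F548S"}] * 3              # S:G72E and ORF1a:F548S
    + [{"S:G72E", "ORF1a:F548S", "S:P521T"}] * 30  # all three (placeholder count)
)

order, nested = infer_order(toy, ["S:P521T", "ORF1a:F548S", "S:G72E"])
print(order, nested)
```

The check assumes no reversions and no coverage dropouts at the three sites. Low-quality sequences would need to be excluded first, or they will break the nesting even when the true order is clean.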

FedeGueli commented 1 year ago

@ryhisner did you see that this has been absorbed by XBB.1.16.11 in the Usher tree? Maybe the lineage notes should be modified? @corneliusroemer @AngieHinrichs not a big worry anyway.