cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

BA.2.3 Sublineage with 10 highly convergent S1 mutations (5 seqs, 3xSingapore, 1xAustralia, 1xUSA) #1013

Closed ryhisner closed 2 years ago

ryhisner commented 2 years ago

Description
Sub-lineage of: BA.2.3
Earliest sequence: 2022-8-15, USA, California (EPI_ISL_14744358)
Most recent sequence: 2022-8-24, Singapore (EPI_ISL_14727166)
Countries circulating: Singapore (2 local cases), Australia (1), USA (1)
Number of Sequences: 4
Substitutions on top of BA.2.3:
Spike: M153T, N164K, H245N, G257D, K444R, N450D, L452M, N460K, E484R, R493Q (R)
ORF1a: T727I, A1049V, I1714T, M2169V, T2174I, T2648I, Q3922R
ORF1b: T1404M
Nucleotide: C1471T, C2445T, T5406C, A6770G, C6786T, C8208T, C10189T, A12030G, C17678T, C18252T, T22020C, T22054G, C22295A, G22332A, A22893G, A22910G, C22916A, T22942G, G23012A, C23013G, C28550A

USHER Tree
https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons/main/subtreeAuspice1_genome_7869_f60fb0%E2%80%94BA.2.3%E2%80%94Beast.json

image

Evidence
This is an extraordinary saltation, and four sequences have suddenly shown up in three countries very distant from one another. It contains numerous convergent mutations in spike, as well as a very rare 2-nucleotide mutation with S:A484R. As @Sinickle has pointed out to me, the path from A484 to R484 requires an intermediate stop at either G484 or T484, both very uncommon. Both cases in Singapore are described as local cases, suggesting it has been circulating there to some extent. This is also suggested by the additional mutations found in one Singapore sequence (S:N17S, ORF1a:A1049V, and nuc A126G). The earliest sequence is from the United States, quite far removed from the other three, suggesting this might already be geographically widespread.

Genomes EPI_ISL_14723265 EPI_ISL_14727166 EPI_ISL_14736013 EPI_ISL_14744358

FedeGueli commented 2 years ago

Thanks @ryhisner, great catch. With 10 S1 mutations, maybe @shay671, who works daily on this kind of saltation at ILVR, could be interested.

corneliusroemer commented 2 years ago

This is a very interesting lineage, thanks @ryhisner for proposing.

Here's an overview of the mutations relative to BA.2 per Nextclade

image

There's not much diversity here, at most 1-2 mutations suggestive of a very recent common ancestor of these 4 sequences. That's indicative of a high growth rate, together with the fact that the sample dates are very recent and this is being simultaneously detected in 4 countries.

Here's the metadata from GISAID:

image

As soon as we see any more uploads from this lineage we should designate it; this kind of lineage does not arise often.

I have edited the long title to make it conform more with the standard issue title, I hope that's ok.

corneliusroemer commented 2 years ago

I tried to find out what the most closely related sequences outside this cluster are. The best I could find are these two sequences:

hCoV-19/Malaysia/USM_Monash_JKNK_K22_4549/2022|EPI_ISL_14216841|2022-06-22
hCoV-19/Indonesia/JI-ITD-57295NT/2022|EPI_ISL_14667834|2022-08-18

They both share the S:493 reversion and the doublet S:H245N, S:G257D and are in BA.2.3 - but they are still very far from this cluster, so this is not a smoking gun re ancestry.

This is the query I used in case anyone is interested: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?variantQuery=nextcladePangoLineage%3ABA.2.3*+%26+%5B3-of%3A++C1471T%2C+C2445T%2C+T5406C%2C+A6770G%2C+C6786T%2C+C8208T%2C+C10189T%2C+A12030G%2C+C17678T%2C+C18252T%2C+T22020C%2C+T22054G%2C+C22295A%2C+G22332A%2C+A22893G%2C+A22910G%2C+C22916A%2C+T22942G%2C+G23012A%2C+C23013G%2C+C28550A%2C+23040A%5D&

Usher placement is different as visible in Ryan's proposal. In both scenarios, the focus seems to be South East Asia: Indonesia, Malaysia, Philippines, which is also where BA.2.3 is most common so this is not surprising.

For a GISAID query, I suggest the following for now, since Spike often has Ns, whereas ORF1a is much more conserved and covered:

NSP2_T547I,NSP3_I896T

shay671 commented 2 years ago

Regarding S:N460K and the reversion at 493: I looked at the Usher build (shared here is a build on fewer samples), and it seems that those 2 mutations are on prior branches, but they are the only mutations at those 2 steps. In addition, the small branches there have other S1 mutations after 460 and 493. So is it possible that they are part of this saltation?

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_1162e_6bd20.json?branchLabel=Spike%20mutations&c=pango_lineage_usher&label=nuc%20mutations:G23040A

(and @ryhisner - great catch!)

ryhisner commented 2 years ago

Thanks, @shay671. I had assumed N460K was part of the saltation since, when I uploaded the sequence, Usher put it at the base of a bunch of divergent BA.2's that span the globe and pretty clearly aren't related to one another. But the Usher build you link to only includes SE Asian sequences in the N460K branch, so that makes it less clear. I would guess that both N460K and the R493Q reversion are part of the saltation since we've seen them evolve independently in hundreds of other divergent sequences, but I don't know if there's any way to tell for sure. It just seems like the most parsimonious explanation.

thomasppeacock commented 2 years ago

Another local case from Singapore (EPI_ISL_14785643). I think it's worth designating this sooner rather than later.

corneliusroemer commented 2 years ago

I agree - I'll designate it together with the other ones that are on the list. Thanks Tom for the update!

corneliusroemer commented 2 years ago

Added this as BA.2.3.20, thanks @ryhisner

shay671 commented 2 years ago

If this divergence between the samples is real, that may imply greater spread under the radar. What do you think, @corneliusroemer @thomasppeacock @ryhisner @FedeGueli?

image

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_13a34_205140.json?c=pango_lineage_usher&label=nuc%20mutations:G23040A

FedeGueli commented 2 years ago

@shay671 the two most recent samples from Singapore are really distant from each other; to me this points toward importation from a very undersampled area.

corneliusroemer commented 2 years ago

And this is also somewhat good news: the more diversity there is, the longer this has been around and the less explosive it is.

FedeGueli commented 2 years ago

7 samples as of today; the latest are from Queensland, Australia and California, US, already spotted by @thomasppeacock in the last few days. Sorry to the Pango team for updating a closed issue; I have the feeling this could help in this phase, where it is still not 100% clear what will happen. Hypermutated or hypertransmissible lineages that emerge from an undersampled area, and are likely either very prevalent or very fast growing where they first emerged, could with a couple more mutations give rise to a dominant strain all of a sudden.

FedeGueli commented 2 years ago

9 samples as of today: South Korea (via Philippines) plus Texas (spotted by @ryhisner)

FedeGueli commented 2 years ago

New sequence from Australia: EPI_ISL_14915708

FedeGueli commented 2 years ago

5 more sequences: 3 Singapore, 2 Australia

So a total of 6 uploaded today: 2 West Australia, 1 NSW, 1 imported from the Philippines into Singapore, 1 local case in Singapore, 1 imported from an unknown country to Singapore

@corneliusroemer @thomasppeacock

FedeGueli commented 2 years ago

First sample from Europe: Denmark, collected on 06/09/22

johnklopfer commented 2 years ago

I suggested to ACROBio to synthesize recombinant spike protein for BA.2.3.20. They expect to stock it in 8 weeks. Feel free to let wet lab folks know.

FedeGueli commented 2 years ago

3 new sequences from Austria all from "Preselected sample"

agamedilab commented 2 years ago

image

+4 x Austria and +1 x Denmark today so far.

EPI_ISL_14948167-14948168, EPI_ISL_14948181, EPI_ISL_14948241, EPI_ISL_14951071

corneliusroemer commented 2 years ago

3 new Danish BA.2.3.20 just uploaded today - we're now at 23 sequences.

FedeGueli commented 2 years ago

Another one from Luxembourg: EPI_ISL_14976564 (spotted by @c19850727)

24 sequences on GISAID (plus one more from Austria not caught by the query published here)

UlrichElling commented 2 years ago

Another one today in Austria including the reversion of 493. Will be uploaded soon. The worldwide spread in absence of an obvious first place of appearance argues for a dark spot in surveillance, I think. Thus we might see more soon as we are only detecting the spillover so far. E484R is interesting, I remember a paper long time ago that found R to be the "best" in position E484. Moreover, codon GAA is mutated to GCA in BA.2, but in this lineage it is AGA. So it is a 2 nt jump from BA.2.
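The two-nucleotide jump mentioned here can be checked mechanically. The following is only a minimal sketch, assuming the standard genetic code and the codons quoted above (GCA in BA.2, AGA in this lineage). The helper names `hamming` and `single_step_intermediates` are made up for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def single_step_intermediates(start, end):
    """Codons that are one nucleotide away from both start and end."""
    return {
        codon: aa
        for codon, aa in CODON_TABLE.items()
        if hamming(start, codon) == 1 and hamming(codon, end) == 1
    }


print(hamming("GCA", "AGA"))                    # 2
print(single_step_intermediates("GCA", "AGA"))  # {'ACA': 'T', 'GGA': 'G'}
```

Under these assumptions the only single-step intermediates are threonine (ACA) and glycine (GGA), which matches the G484 or T484 stop described in the proposal.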

corneliusroemer commented 2 years ago

Unfortunately it's bad news if there's a blind spot - it means that this is not just a random "blip" but that this lineage has in fact already sustained growth for a while. That it gets detected only now means the growth is fast in the place of origin.

Similar situation as with BQ.1.1 which is suddenly popping up everywhere.

While growth advantages are still too uncertain to be used - a very simple metric to guide what to look for is: number of days between uploads of 1st, 5th, 10th, 25th, 50th, 100th sequence etc

Maybe we could use this in some of the issues. Of course this is a very rough metric with lots of caveats. @FedeGueli maybe you could add these kinds of metrics to posts - this could be even more useful than just saying "another x found here" - at least when the lineages are clearly growing. Pointing out new sequences is useful when we are below say 10-20 sequences.

Importantly, I'm looking at submission_date here - as it's not going to change with new data being uploaded. Of course collection date would be ideal - but that data would be censored.

One can easily compile these numbers by filtering on the "submission date to" field on GISAID

image

Looking at this metric, it appears that BA.2.75.2 is slightly faster than BQ.1 - but that BQ.1.1 could be competitive with BA.2.75.2. An alternative would be to note the dates when we reach 1, 5, 10, 25, 50, 100, 200, 400 sequences.

Here is sample data:

BA.2.3.20
1: 2022-08-30
5: 2022-09-02 (+3d)
10: 2022-09-12 (+10d)
25: 2022-09-14 (+2d)
rough doubling time: 3-10d

BQ.1
1: 2022-08-08
5: 2022-08-24 (+16d)
10: 2022-08-29 (+5d)
25: 2022-09-05 (+7d)
50: 2022-09-08 (+3d)
100: 2022-09-13 (+5d)
rough doubling time: 5-7d

BQ.1.1
1: 2022-08-31
5: 2022-09-09 (+9d)
10: 2022-09-12 (+3d)
25: 2022-09-14 (+2d)
rough doubling time: 3-7d

BA.2.75.2
1: 2022-08-06
5: 2022-08-16 (+10d)
10: 2022-08-18 (+2d)
25: 2022-08-20 (+2d)
50: 2022-08-24 (+4d)
100: 2022-08-30 (+6d)
200: 2022-09-06 (+7d)
400: 2022-09-14 (+8d)
rough doubling time: 6-8d

I suggest we track these on a Google Sheet that I will give write access for contributors: https://docs.google.com/spreadsheets/d/1sMCQyPfMG-pqd8Z0aoV6aJRHCc4vXusGl7pEJ68j10w/edit#gid=0
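The metric described above (days between the uploads of the 1st, 5th, 10th, 25th, ... sequence) could be computed along these lines. This is a rough sketch, not the spreadsheet's actual method. `milestone_intervals` is a hypothetical helper, and the per-sequence submission dates in the usage example are invented so that the milestones land on the BA.2.3.20 dates quoted above.

```python
from datetime import date


def milestone_intervals(submission_dates, milestones=(1, 5, 10, 25, 50, 100)):
    """Return (n, date the n-th sequence was submitted, days since previous milestone)."""
    ordered = sorted(submission_dates)
    rows = []
    previous = None
    for n in milestones:
        if n > len(ordered):
            break  # milestone not reached yet
        reached = ordered[n - 1]
        delta = None if previous is None else (reached - previous).days
        rows.append((n, reached, delta))
        previous = reached
    return rows


# Invented per-sequence dates, chosen only to reproduce the milestone dates above.
example = (
    [date(2022, 8, 30)]
    + [date(2022, 9, 2)] * 4
    + [date(2022, 9, 12)] * 5
    + [date(2022, 9, 14)] * 15
)

for n, reached, delta in milestone_intervals(example):
    suffix = "" if delta is None else f" (+{delta}d)"
    print(f"{n}: {reached.isoformat()}{suffix}")
```

Because it uses submission dates, the output for a given milestone does not change when later uploads arrive, which is the property the metric relies on.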

image

johnklopfer commented 2 years ago

E484R is interesting, I remember a paper long time ago that found R to be the "best" in position E484.

Here's the paper and the relevant graph. The discussion of E484R only appears in the preprint - not sure why they took it out of the published version.

image

https://www.biorxiv.org/content/10.1101/2021.01.06.425392v1.full.pdf

FedeGueli commented 2 years ago

@c19850727 spotted 4 more sequences from Australia. It is fast, unfortunately.

28 viruses on GISAID as of today with NSP2_T547I,NSP3_I896T

UlrichElling commented 2 years ago

image

I needed to dig deep in my memory now :-) but here is another one. According to Starr et al. 2020 from the Bloom lab, it has higher ACE2 binding affinity. And if we see 2 nt changed at position 484, it would surprise me a lot if it was neutral.

UlrichElling commented 2 years ago

The excel is a great initiative, thanks! I cannot quite wrap my head around it yet, but what about also collecting the number of countries where something is detected? If it spreads fast assuming a sort of constant mobility then it should be more clustered, so # of countries should grow slower than # cases relative to a slow variant. If however -as in this case- the number of countries initially grows overproportionally, then it sort of suggests a surveillance blind spot but later growth in countries should align to growth in numbers.

UlrichElling commented 2 years ago

PS: "country" unfortunately is a very wide range. Maybe regions/provinces in very big countries?

UlrichElling commented 2 years ago

BJ.1
search term: M_D3Y, Spike_G798D
01.08.22
08.08.22 | 7
08.08.22 | 0
30.08.22 | 22
09.09.22 | 10

johnklopfer commented 2 years ago

FWIW - for BA.2.3.20 it may be worth trying to trace back the E484R mutation.

I think I saw one or two in Indonesia this summer, and it's low-prob enough that it might be on the same tree. Did not have the tools to do it myself.

corneliusroemer commented 2 years ago

@UlrichElling Country is useful but it's more qualitative - it's not expected to converge to a nice doubling time. Countries are also heterogeneous entities and affected by varying levels of travel between them. So I feel for now it's best to focus on raw numbers until growth advantage is reliable.

FedeGueli commented 2 years ago

One more sequence from Denmark.

FedeGueli commented 2 years ago

FWIW - for BA.2.3.20 it may be worth trying to trace back the E484R mutation.

I think I saw one or two in Indonesia this summer, and it's low-prob enough that it might be on the same tree. Did not have the tools to do it myself.

there is also a French BA.2 in August with S:484R and S:444R

ryhisner commented 2 years ago

I think it's notable that the newest sequence from Denmark, along with one sequence from Singapore and one from Australia, have S:N17S. S:N17 holds a glycan and is located at the border of the NTD antigenic supersite. Text and figure below are from the Veesler Lab study "N-terminal domain antigenic mapping reveals a site of vulnerability for SARS-CoV-2." https://www.cell.com/cell/fulltext/S0092-8674(21)00356-1

image

image

c19850727 commented 2 years ago

In Singapore's latest batch of upload (2022-09-20), there are 19 BA.2.3.20 among a total of 377 genomes. Of these, 15 are local cases, 1 each was imported from Indonesia, the Philippines and Thailand, and 1 is of unknown origin.

EPI_ISL_15031190, EPI_ISL_15031196, EPI_ISL_15031198, EPI_ISL_15031203, EPI_ISL_15031205, EPI_ISL_15031207-15031208, EPI_ISL_15031882-15031885, EPI_ISL_15031888-15031894, EPI_ISL_15031898
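For anyone wanting to paste such a list into a query, the shorthand ranges need expanding first. Here is a small sketch that assumes a range like `15031207-15031208` means every consecutive accession number between the two ends, inclusive. `expand_accessions` is a hypothetical helper, not part of any GISAID tooling.

```python
def expand_accessions(text, prefix="EPI_ISL_"):
    """Expand a comma-separated accession list with 'A-B' ranges into full IDs."""
    numbers = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        body = token[len(prefix):] if token.startswith(prefix) else token
        if "-" in body:
            first, last = body.split("-")
            numbers.extend(range(int(first), int(last) + 1))
        else:
            numbers.append(int(body))
    return [f"{prefix}{n}" for n in numbers]


batch = (
    "EPI_ISL_15031190, EPI_ISL_15031196, EPI_ISL_15031198, EPI_ISL_15031203, "
    "EPI_ISL_15031205, EPI_ISL_15031207-15031208, EPI_ISL_15031882-15031885, "
    "EPI_ISL_15031888-15031894, EPI_ISL_15031898"
)
ids = expand_accessions(batch)
print(len(ids))  # 19
```

Under that reading the list expands to 19 accessions, which agrees with the count given in the comment.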

johnklopfer commented 2 years ago

In Singapore's latest batch of upload (2022-09-20), there are 19 BA.2.3.20 among a total of 377 genomes.

Does that bring the total above 50 sequences (to update @corneliusroemer spreadsheet)?

FedeGueli commented 2 years ago

Already in the main proposal, but adding it here: BA.2.3.20 has T22942G for S:460K (as BA.2.75), while BQ.1 has T22942A
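To see why two different nucleotide changes at 22942 can give the same S:460K, it helps to map the genome coordinate to a spike codon. This is a minimal sketch. It assumes the spike coding sequence spans nucleotides 21563 to 25384 of the Wuhan-Hu-1 reference, and `spike_codon` is an illustrative name.

```python
SPIKE_START = 21563  # assumed first nucleotide of the spike CDS (Wuhan-Hu-1)
SPIKE_END = 25384    # assumed last nucleotide of the spike CDS


def spike_codon(nuc_pos):
    """Map a genome coordinate to (spike codon number, position within codon)."""
    if not SPIKE_START <= nuc_pos <= SPIKE_END:
        raise ValueError(f"{nuc_pos} is outside the spike CDS")
    offset = nuc_pos - SPIKE_START
    return offset // 3 + 1, offset % 3 + 1


for pos in (22942, 23012, 23013, 23040):
    print(pos, spike_codon(pos))
```

Position 22942 comes out as the third base of codon 460, where either G or A turns the asparagine codon AAT into a lysine codon (AAG or AAA). Positions 23012 and 23013 are the first and second bases of codon 484, consistent with the two-nucleotide change discussed earlier in the thread.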

ryhisner commented 2 years ago

Two BA.2.3.20 sequences from California have an additional NTD spike mutation; one has K150I—EPI_ISL_15113737—and the other Y145H—EPI_ISL_15113297.

A recent Australian sequence also had an additional NTD mutation: S:K182N. EPI_ISL_15081765

Also worth noting that two of the six new sequences today have S:N17S. The N17S branch now has 17 sequences and will likely be worthy of designation soon.

FedeGueli commented 2 years ago

@ryhisner please propose S:17S in a separate issue; since this one is closed, I think it will not be monitored constantly.

Today I spotted a weird BA.2.3.20 lacking S:444R and with a rare S:E465D mutation: EPI_ISL_15125279

image

agamedilab commented 2 years ago

@FedeGueli I was just looking at that one as well ;)

image

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_d1d0_29c970.json?branchLabel=aa%20mutations&c=pango_lineage_usher

ryhisner commented 2 years ago

Note that the E465D sequence also has a F371S reversion. At least one study, by @theosanderson & others, found this mutation (along with the 373-375 mutations) to severely decrease infectivity in virus-like particle experiments. The sequence looks clean, so this could be important.

ryhisner commented 2 years ago

Two Australian sequences uploaded late yesterday have S:G485D.

image

EPI_ISL_15120082 EPI_ISL_15119919

Collection dates September 10 & 11.

ryhisner commented 2 years ago

Those two also have ORF1a:F3753V, an NSP6 mutation (NSP6_F184V).

image
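The ORF1a-to-NSP conversions used throughout this thread (for the GISAID queries, for example) follow from the start position of each NSP within ORF1a. The sketch below assumes the commonly used Wuhan-Hu-1 cleavage boundaries, which should be checked against the reference annotation before relying on them. `orf1a_to_nsp` is a hypothetical helper.

```python
# Assumed first ORF1a residue of each mature NSP (Wuhan-Hu-1 annotation).
NSP_STARTS = [
    ("NSP1", 1), ("NSP2", 181), ("NSP3", 819), ("NSP4", 2764),
    ("NSP5", 3264), ("NSP6", 3570), ("NSP7", 3860), ("NSP8", 3943),
    ("NSP9", 4141), ("NSP10", 4254), ("NSP11", 4393),
]
ORF1A_LENGTH = 4405  # assumed length of the ORF1a polyprotein


def orf1a_to_nsp(mutation):
    """Convert e.g. 'T727I' (ORF1a numbering) to 'NSP2_T547I' (GISAID style)."""
    ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
    if not 1 <= pos <= ORF1A_LENGTH:
        raise ValueError(f"position {pos} is outside ORF1a")
    name, start = NSP_STARTS[0]
    for candidate, candidate_start in NSP_STARTS:
        if candidate_start > pos:
            break
        name, start = candidate, candidate_start
    return f"{name}_{ref}{pos - start + 1}{alt}"


for mut in ("T727I", "I1714T", "F3753V", "T2087I", "S2103F"):
    print(f"ORF1a:{mut} -> {orf1a_to_nsp(mut)}")
```

With these boundaries the output reproduces the pairs quoted in the thread: NSP2_T547I and NSP3_I896T for the GISAID query, and NSP6_F184V for ORF1a:F3753V.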

cvejris commented 2 years ago

Note that the E465D sequence also has a F371S reversion. At least one study, by @theosanderson & others, found this mutation (along with the 373-375 mutations) to severely decrease infectivity in virus-like particle experiments. The sequence looks clean, so this could be important.

Interesting - S371F was mutated into 371L already in BA.1, hinting that this mutation is not super stable

ryhisner commented 2 years ago

One sequence from Denmark today has S:V350A, a rare mutation and the first BA.2.3.20 mutation on top of BA.2 to be in the RBD but not the RBM. EPI_ISL_15155944

image

cvejris commented 2 years ago

image

As revealed by the wonderful new functionality in CovSpectrum, it is clear that two non-Spike mutations are creeping up in frequency: ORF1a:T2087I and ORF1a:S2103F. These are T1269I and S1285F in NSP3, residing in the betaCoV-specific marker domain. S2103F is also present in BS.1. Furthermore, S2103F in BA.2.3.20 is cosmopolitan, while T2087I is Denmark-specific and occurs exclusively in the S2103F genetic background, i.e. it is a double mutant!

https://cov-spectrum.org/explore/World/AllSamples/Past6M/variants?aaMutations=ORF1a%3AT2087I%2CORF1a%3AS2103F&nextcladePangoLineage=Ba.2.3.20*&

I suggest these should be monitored and possibly designated.

FedeGueli commented 1 year ago

S:Q675R found now in two sequences according to gisaid query: Spike_Q675R,NSP2_T547I,NSP3_I896T

ryhisner commented 1 year ago

Today the 3rd BA.2.3.20 sequence with S:G485D turned up. The first two were in Australia, and this one is from California. With a spike mutation in such an important RBM region, adjacent to a mutation not seen in any previous lineage (A484R), and now having shown up in two different continents, I think this one should be monitored closely. If it grows, it ought to be considered for quick designation. EPI_ISL_15242813

ryhisner commented 1 year ago

A unique three-sequence cluster was uploaded yesterday from Connecticut, USA, and has S:H69Y and N:P279Q, neither of which has been seen in any previous BA.2.3.20 sequences. There have only been 11 sequences of BA.2.3.20 in the eastern United States, so these make up a considerable fraction there.
EPI_ISL_15235107 EPI_ISL_15235149 EPI_ISL_15235150

ryhisner commented 1 year ago

BA.2.3.20 from California uploaded today that has S:R444M. It is part of the S:N17S branch. EPI_ISL_15256979

Two notable pairs of sequences uploaded today from Australia with new spike mutations:
• Two with S:N185T EPI_ISL_15250240, EPI_ISL_15250537
• Two with S:D53G EPI_ISL_15250240, EPI_ISL_15250537