cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

BA.2 sublineage with S:K147E, W152R, F157L, I210V, G257S, D339H, G446S, N460K, R493Q (73 seq as of 2022-06-29, mainly India) #773

Closed silcn closed 2 years ago

silcn commented 2 years ago

Proposal for a sublineage of BA.2
Earliest sequence: 2022-06-02 (India)
Countries detected: India (5 seq, from 3 states)

Defining mutations:
S:K147E, W152R, F157L, I210V, G257S, D339H (mutated from G339D), G446S, N460K, R493Q (reversion)
ORF1a:S1221L, P1640S, N4060S
ORF1b:G662S
E:T11A

Don't think much needs to be said to explain why I'm proposing this. Very recent, long branch with 9 new spike mutations, detection in multiple states that aren't all close together (Maharashtra, Karnataka, Jammu and Kashmir). I expect quite a few people have been monitoring it :)

Usher tree is a bit messy because of some poor quality sequences, particularly the one from Jammu and Kashmir which has multiple artefactual reversions. As a result this lineage is placed on a branch with a couple of Indian sequences with reversions at S:954 (also probably erroneous - apparent reversions at this site seem to crop up a lot in Indian sequences). Despite what the tree shows, the evidence is currently consistent with all 9 S mutations having appeared on the same long branch. As usual, the 2nt mutation at S:339 is mislabelled.

India_G339H

https://nextstrain.org/fetch/github.com/silcn/subtreeAuspice1/raw/main/auspice/subtreeAuspice1_genome_42b02_161030.json?branchLabel=Spike%20mutations&c=gt-S_446&label=nuc%20mutations:C3796T,C3927T,C4586T,C5183T,A12444G,G22577C,G22898A,T22942G,G23040A,A26275G

Genomes: EPI_ISL_13302209 EPI_ISL_13302252 EPI_ISL_13373059 EPI_ISL_13373170 EPI_ISL_13375776

Edit: cov-spectrum query https://cov-spectrum.org/explore/World/AllSamples/Past6M/variants?variantQuery=%5B6-of%3A+S%3A147E%2C+S%3A152R%2C+S%3A157L%2C+S%3A210V%2C+S%3A257S%2C+S%3A339H%2C+S%3A446S%2C+S%3A460K%2C+ORF1a%3A1221L%2C+ORF1a%3A1640S%2C+ORF1a%3A4060S%5D& Missing some sequences that only have a month of collection
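For anyone who wants to read the query without squinting at the percent-encoding, here is a minimal sketch that decodes the `variantQuery` value from the link above and splits it into the threshold and the mutation list. The helper name `parse_n_of` is made up for illustration, and the parsing assumes the simple `[N-of: a, b, c]` shape seen here, not the full cov-spectrum query grammar.

```python
from urllib.parse import unquote_plus

# variantQuery value copied from the cov-spectrum link above
ENCODED = (
    "%5B6-of%3A+S%3A147E%2C+S%3A152R%2C+S%3A157L%2C+S%3A210V%2C+S%3A257S"
    "%2C+S%3A339H%2C+S%3A446S%2C+S%3A460K%2C+ORF1a%3A1221L%2C+ORF1a%3A1640S"
    "%2C+ORF1a%3A4060S%5D"
)


def parse_n_of(encoded):
    """Decode a '[N-of: a, b, c]' query into (N, [a, b, c]).

    Hypothetical helper: only handles the flat form used in this issue.
    """
    text = unquote_plus(encoded).strip()
    inner = text[1:-1]                      # drop the surrounding [ ]
    head, _, body = inner.partition(":")    # '6-of' | ' S:147E, S:152R, ...'
    threshold = int(head.split("-")[0])     # '6-of' -> 6
    mutations = [m.strip() for m in body.split(",")]
    return threshold, mutations


threshold, mutations = parse_n_of(ENCODED)
print(f"at least {threshold} of {len(mutations)} mutations:")
for m in mutations:
    print("  ", m)
```

Note that S:493Q is not part of the query; a reversion to the reference residue is probably not a useful thing to filter on.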

ryhisner commented 2 years ago

Man, we must be on the same wavelength, Silcn, because I noticed one sequence of this yesterday. I wasn't quite sure what to make of it—whether it was real or not, coming from India, as I don't really have the expertise to judge such things. Excellent spot, mate.

silcn commented 2 years ago

Now found in Germany: EPI_ISL_13378378, EPI_ISL_13378924
And also in Canada: EPI_ISL_13392500
EPI_ISL_13389935 is another Indian sequence, with NNNs covering the locations of all the S mutations, but it shares all the non-S mutations.

silcn commented 2 years ago

More Indian sequences: EPI_ISL_13409385, EPI_ISL_13409444, EPI_ISL_13409465

FedeGueli commented 2 years ago

Hi @silcn, this morning I found a cluster in Germany with similar mutations (2 out of 8): S:R346K, S:H1101Y, S:460K, S:K147E and the 493 reversion:
EPI_ISL_13380484 EPI_ISL_13344896 EPI_ISL_13393217 EPI_ISL_13387937 EPI_ISL_13387015 EPI_ISL_13385893 EPI_ISL_13382835 EPI_ISL_13269776 EPI_ISL_13269123 EPI_ISL_13382561 EPI_ISL_13382506

I can't find them on Usher, but maybe you can check whether they are related or not.

silcn commented 2 years ago

@FedeGueli good spot! Those aren't related - they look like a BA.2/BA.1/BA.2 double-breakpoint recombinant with the first breakpoint between 12880 and 15240, and the second somewhere between 21641 and 21846, i.e. S:27 and S:95 (S:69/70del is present but not S:A67V so not clear whether that bit comes from BA.1 or BA.2). The BA.2 bit in the spike has those extra mutations as you say.
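The step from genome coordinates to spike residue numbers above can be sketched as follows. This assumes the S gene starts at nucleotide 21563 of the Wuhan-Hu-1 reference (NC_045512.2); that start coordinate is not stated in the thread, but it reproduces the S:27 and S:95 figures given for the second breakpoint. The function name `spike_codon` is just for illustration.

```python
SPIKE_START = 21563  # assumed: first nucleotide of the S gene in NC_045512.2


def spike_codon(nt_pos):
    """Return the 1-based spike codon number containing a genome position."""
    if nt_pos < SPIKE_START:
        raise ValueError("position lies upstream of the spike gene")
    return (nt_pos - SPIKE_START) // 3 + 1


# Second breakpoint window quoted above
for pos in (21641, 21846):
    print(pos, "-> S:" + str(spike_codon(pos)))

# Spike nucleotide mutations from the tree link in the opening post
for nt in ("G22577C", "G22898A", "T22942G", "G23040A"):
    print(nt, "-> S:" + str(spike_codon(int(nt[1:-1]))))
```

The same arithmetic places the four nucleotide changes from the tree link at spike codons 339, 446, 460 and 493, in line with the defining mutations listed in the proposal.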

thomasppeacock commented 2 years ago

Although this is still fairly limited in absolute sequence number, the divergent mutation profile, the wide geographical spread, and the speed with which new sequences have emerged make me think this should get designated pretty soon to facilitate its monitoring.

c19850727 commented 2 years ago

@silcn most of these sequences have T5386G, and 2 of them also have A11537G though. I mean those brought up by @FedeGueli

silcn commented 2 years ago

@c19850727 hmmm, so they do. Doesn't seem too implausible for that to be real if a chronic infection was involved, as suggested by the S mutations? Multiple labs are involved so I doubt it's contamination.

JosetteSchoenma commented 2 years ago

I ran the German ones through sc2rf. I think the 2 breakpoints are between 13195 and 15240 and between 21618 and 21762. Same regions as @silcn suggested, just narrowed the window a bit. But there are loads of private mutations, and all have an extra BA.1 T5386G and 2 have an extra BA.1 A11537G. Maybe something to keep an eye on and to open up its own issue?
Screenshot_20220624-105045_Twitter.jpg Screenshot_20220624-105125_Twitter.jpg
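If both sets of breakpoint windows are taken at face value, the region consistent with both is simply their overlap. A minimal sketch of that interval arithmetic, using only the coordinates quoted in this thread (the `intersect` helper is illustrative, and this says nothing about which estimate is closer to the truth):

```python
def intersect(a, b):
    """Overlap of two closed intervals (start, end), or None if disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


# (first estimate, second estimate) for each breakpoint, as quoted in the thread
windows = {
    "breakpoint 1": ((12880, 15240), (13195, 15240)),
    "breakpoint 2": ((21641, 21846), (21618, 21762)),
}

combined = {name: intersect(a, b) for name, (a, b) in windows.items()}
for name, window in combined.items():
    print(name, window)
```

For the second breakpoint the two windows only partly overlap, so the combined window is narrower than either estimate on its own.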

FedeGueli commented 2 years ago

@JosetteSchoenma please do it! You did the analysis, I just found it. It is worth flagging. I will monitor it; then we can at least close the issue if nothing new pops up.

InfrPopGen commented 2 years ago

Thanks for submitting. We've added lineage BA.2.75 with 3 newly designated sequences. Defining mutation(s) A22001G (S:K147E), T22016C (S:W152R).

silcn commented 2 years ago

@InfrPopGen the 4 sequences you've added to lineages.csv contain India/MH-INSACOG-CSIR-NEERI1939/2022 twice, which I presume is an error?

InfrPopGen commented 2 years ago

Thank you @silcn I've deleted the duplicate line!

silcn commented 2 years ago

Among the 20 new BA.2.75 uploads from Maharashtra today is an apparent outlier sequence, EPI_ISL_13502528, which has S:147E, 210V, 257S, 339H, 493Q and ORF1a:1221L but is missing the other defining mutations and instead has some more of its own (none in Spike though). It is classified as BA.2 rather than BA.2.75 by Usher. If this spreads then it could potentially deserve a separate designation; I'll keep looking out for more.

Another of the new Maharashtra sequences (EPI_ISL_13502546) has S:681R and ORF8:27* - something else to keep an eye on...
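As a rough check of how a partial-profile sequence like the outlier fares against the "6-of" cov-spectrum query from the opening post, here is a small sketch. The outlier's mutation set is taken from the description above and is assumed to contain none of the other queried mutations; its private mutations are left out because they cannot affect the count.

```python
# Mutations in the "6-of" query from the opening post
QUERY = [
    "S:147E", "S:152R", "S:157L", "S:210V", "S:257S", "S:339H",
    "S:446S", "S:460K", "ORF1a:1221L", "ORF1a:1640S", "ORF1a:4060S",
]
THRESHOLD = 6

# Defining mutations reported above for EPI_ISL_13502528
OUTLIER = {"S:147E", "S:210V", "S:257S", "S:339H", "S:493Q", "ORF1a:1221L"}


def query_hits(observed, query):
    """Return the queried mutations that are present in a sequence."""
    return [m for m in query if m in observed]


hits = query_hits(OUTLIER, QUERY)
matches = len(hits) >= THRESHOLD
print(f"{len(hits)} of {len(QUERY)} queried mutations present:", hits)
print("caught by the 6-of query:", matches)
```

Under those assumptions the outlier falls one mutation short of the threshold, so the query as written would not be expected to return it.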

shay671 commented 2 years ago

As of this morning I found 38 samples.

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_40dd0_bf4e50.json?branchLabel=Spike%20mutations&c=userOrOld&label=nuc%20mutations:G23040A

silcn commented 2 years ago

@shay671 I count at least 46, 47 if you allow EPI_ISL_13502528. NSP3_P822S, NSP8_N118S in GISAID is good at picking them up for now.

EPI_ISLs EPI_ISL_13302209 EPI_ISL_13302252 EPI_ISL_13373059 EPI_ISL_13373170 EPI_ISL_13375776 EPI_ISL_13378378 EPI_ISL_13378924 EPI_ISL_13389935 EPI_ISL_13392500 EPI_ISL_13409385 EPI_ISL_13409444 EPI_ISL_13409465 EPI_ISL_13438623 EPI_ISL_13438754 EPI_ISL_13446524 EPI_ISL_13446529 EPI_ISL_13455147 EPI_ISL_13458019 EPI_ISL_13461861 EPI_ISL_13463939 EPI_ISL_13471039 EPI_ISL_13471048 EPI_ISL_13493348 EPI_ISL_13493438 EPI_ISL_13493615 EPI_ISL_13498391 EPI_ISL_13498432 EPI_ISL_13498452 EPI_ISL_13502529 EPI_ISL_13502534 EPI_ISL_13502536 EPI_ISL_13502537 EPI_ISL_13502538 EPI_ISL_13502544 EPI_ISL_13502545 EPI_ISL_13502546 EPI_ISL_13502548 EPI_ISL_13502550 EPI_ISL_13502552 EPI_ISL_13502554 EPI_ISL_13502555 EPI_ISL_13502559 EPI_ISL_13502567 EPI_ISL_13502568 EPI_ISL_13502569 EPI_ISL_13502571 EPI_ISL_13502576
shay671 commented 2 years ago

You are 100% correct. All cluster together: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_25dfc_c7fcf0.json?c=userOrOld&label=nuc%20mutations:C3796T,C3927T,A12444G,G15451A,A22190G,G22577C,G22898A,T22942G,A26275G

thomasppeacock commented 2 years ago

As this lineage is getting quite a bit of attention I've hidden a couple of comments earlier on about an unrelated recombinant to prevent confusion - please do feel free to open a new designation issue if the recombinant continues to grow

silcn commented 2 years ago

NSP3_P822S, NSP8_N118S in GISAID is good at picking them up for now.

Of course today we get some sequences missing NSP8_N118S... not sure there's a single GISAID query that captures everything anymore, but by my count 13 new BA.2.75 sequences were uploaded from India today.

shay671 commented 2 years ago

Here is the case count by country/region for the 53 cases I found yesterday.


| Country | Region | Cases |
| -- | -- | -- |
| India | Haryana | 3 |
| | Himachal Pradesh | 3 |
| | Jammu and Kashmir | 1 |
| | Karnataka | 10 |
| | Maharashtra | 23 |
| Germany | Baden-Wurttemberg | 1 |
| | Rhineland-Palatinate | 1 |
| UK | England | 5 |
| Canada | Alberta | 1 |
| | Ontario | 1 |
| Australia | Victoria | 2 |
| New Zealand | | 2 |
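The counts above can be tallied per country to confirm that they add up to the stated 53. A minimal sketch, with the rows typed in by hand from the table (the `None` region for New Zealand reflects that no region was given):

```python
from collections import defaultdict

# (country, region, cases) rows transcribed from the table above
ROWS = [
    ("India", "Haryana", 3),
    ("India", "Himachal Pradesh", 3),
    ("India", "Jammu and Kashmir", 1),
    ("India", "Karnataka", 10),
    ("India", "Maharashtra", 23),
    ("Germany", "Baden-Wurttemberg", 1),
    ("Germany", "Rhineland-Palatinate", 1),
    ("UK", "England", 5),
    ("Canada", "Alberta", 1),
    ("Canada", "Ontario", 1),
    ("Australia", "Victoria", 2),
    ("New Zealand", None, 2),
]

by_country = defaultdict(int)
for country, _region, cases in ROWS:
    by_country[country] += cases

total = sum(by_country.values())
for country, cases in sorted(by_country.items(), key=lambda kv: -kv[1]):
    print(f"{country:12s} {cases:3d}")
print(f"{'total':12s} {total:3d}")
```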

FedeGueli commented 2 years ago

With the new sequences, cov-spectrum shows a growth advantage vs BA.5, although this has to be taken with a huge grain of salt. @corneliusroemer @thomasppeacock @chrisruis @AngieHinrichs this probably has to be flagged to public health agencies.

karyakarte commented 2 years ago

That seems to be true. The numbers of BA.5 are not increasing. BA.4 is on the verge of being extinct. BA.2.38 is almost equal in number to the rest of BA.2. BA.2.74 and BA.2.75 were identified (amongst sequences previously called BA.2) when the FASTA files were re-run on UShER today. @FedeGueli @silcn @corneliusroemer @thomasppeacock @chrisruis @AngieHinrichs

FedeGueli commented 2 years ago

@karyakarte consider that there are multiple sublineages of potential interest (designated, proposed or unproposed) that emerged first in India and are now circulating there. I cannot weigh how much of the apparent growth advantage is due to these sublineages keeping BA.5 at lower levels than outside India, and how much is due to an intrinsic BA.2.75 advantage, which can't be ruled out or affirmed with certainty yet.

karyakarte commented 2 years ago

@FedeGueli I totally agree. Sequencing coordinated by our center of 1698 isolates from May and June 2022 shows BA.2 (48.62%), BA.2.12.1 (2.41%), BA.2.38 (39.33%), BA.4 (1.20%), BA.5 (3.11%) (with negligible numbers of other BA.2 sub-lineages making a total of 100%). This is the reason for my earlier comment 10 hours back, that maybe some other lineage currently classified as BA. sub-lineage is growing. In our last run at BJGMC, Pune we found BA.2.74 and BA.2.75, which were red-flagged. Hence, we re-ran the FASTA files available with us, with adequate coverage, on UShER. The results showed a number of BA.2 sequences turning out to be BA.2.74/BA.2.75. Hence, this issue has been highlighted at appropriate fora with the following comments: "Request to urgently re-run fasta on UShER again particularly samples from end of May and entire June. The predominance of BA.2 on a waning wave was unexplainable. May be it is BA.2.74/BA.2.75 - with more than 80 mutations. A new designation for these variants seems essential, otherwise epidemiological response is blunted." We need to look into the matter @silcn @corneliusroemer @thomasppeacock @chrisruis @AngieHinrichs.

FedeGueli commented 2 years ago

Thx Prof. @karyakarte. If needed, I can open an issue within minutes listing all the Indian BA.2 sublineages we are tracking, to help get them designated quickly. cc @chrisruis @AngieHinrichs @corneliusroemer @thomasppeacock @InfrPopGen I already have everything tracked with route from route; let me know if this could be a good idea.

FedeGueli commented 2 years ago

@karyakarte here is the link to all the BA.2 sublineages we are monitoring in India: https://docs.google.com/spreadsheets/d/1zKF3PKaosF3OCcW54gWkyV7Spz8uWrCsYef9IHbj7FA/edit?usp=sharing

silcn commented 2 years ago

Regarding BA.2 lineages in India: Japan just uploaded 42 sequences from travellers from India, sampled from 27 May to 16 June, which might give a more representative sample given the differences in sequencing quantity across India. Here is the lineage breakdown according to Usher:

17 BA.2.38 (of which 1 is #809)
15 BA.2 (of which 8 are #787, 2 have S:L452M but are not BA.2.56, and 1 has S:R346T+S:L452M but is not BA.2.74)
4 BA.2.56
2 BA.2.74
1 BA.2.75
1 miscellaneous BA.1/BA.2 recombinant
1 BA.4
1 BA.5.2

I agree with the comments in that thread that there is a very strong case for designating #787.
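For a quick sense of proportions, the traveller breakdown above can be turned into shares of the 42 sequences. This is only a sketch with the counts typed in from the list; with a sample this small the percentages are indicative at best.

```python
# Usher lineage counts for the 42 traveller sequences listed above
COUNTS = {
    "BA.2.38": 17,
    "BA.2": 15,
    "BA.2.56": 4,
    "BA.2.74": 2,
    "BA.2.75": 1,
    "BA.1/BA.2 recombinant": 1,
    "BA.4": 1,
    "BA.5.2": 1,
}

total = sum(COUNTS.values())
shares = {lineage: 100 * n / total for lineage, n in COUNTS.items()}

for lineage, n in COUNTS.items():
    print(f"{lineage:24s} {n:3d}  {shares[lineage]:5.1f}%")
print(f"{'total':24s} {total:3d}")

# Sequences assigned to plain BA.2 that belong to proposal #787
print("share of plain BA.2 that is #787:", f"{100 * 8 / COUNTS['BA.2']:.1f}%")
```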

karyakarte commented 2 years ago

Thanks for the lightning-fast response! Your proposal for a fast designation to efficiently track emerging sub-lineages will answer the question of the rising case numbers by the same variant (BA.2) after waning of the 3rd wave it caused.

FedeGueli commented 2 years ago

@karyakarte I have updated that list with a sheet showing prevalence region by region and lineage by lineage. You can find it here:
Schermata 2022-07-03 alle 08 48 44
To me this sheet shows that it is actually teamwork by fast, recently evolved BA.2 sublineages that is keeping BA.5 cornered in India. This probably boosts apparent growth advantages there to unrealistic levels. And while that is less fascinating for media headlines, it is probably more worrying, because it could mean BA.2 has found multiple ways to compete with BA.5. My two cents; maybe I am totally wrong. cc @silcn @corneliusroemer @thomasppeacock

karyakarte commented 2 years ago

@FedeGueli thanks to you and the team for the effort in trying to solve the puzzle of BA.2 equaling BA.2.38 in India. Numbers of BA.4 and BA.5 are also kept in check by both BA.2 and BA.2.38. But with many BA.2 turning out to be BA.2.74, BA.2.75, now BA.2.76, and a few more to come, the puzzle of the same variant causing a surge after a wave will be solved. Further, as you mentioned multiple ways to compete, the strategy of BA.2 survival, mutational change in the immunodominant epitopes, seems to be successful. We are looking at it at present.

silcn commented 2 years ago

NSP3_P822S, NSP8_N118S in GISAID is good at picking them up for now.

Heads up for those tracking with GISAID: NSP3_S403L, NSP8_N118S is now a better proxy for BA.2.75 than the one above.
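The GISAID labels use per-protein (NSP) numbering, whereas the defining mutations in the opening post are given in ORF1a numbering. A minimal sketch of the conversion, assuming nsp3 begins at ORF1a residue 819 and nsp8 at residue 3943 in the reference annotation; those offsets are not stated in this thread, but they map all three labels onto the ORF1a mutations listed in the proposal. The function name is illustrative.

```python
# Assumed first ORF1a residue of each mature peptide (reference annotation)
NSP_START_IN_ORF1A = {"NSP3": 819, "NSP8": 3943}


def nsp_to_orf1a(label):
    """Convert e.g. 'NSP3_P822S' to ORF1a numbering ('ORF1a:P1640S')."""
    protein, change = label.split("_")
    ref, pos, alt = change[0], int(change[1:-1]), change[-1]
    orf1a_pos = NSP_START_IN_ORF1A[protein] + pos - 1
    return f"ORF1a:{ref}{orf1a_pos}{alt}"


for label in ("NSP3_P822S", "NSP3_S403L", "NSP8_N118S"):
    print(label, "->", nsp_to_orf1a(label))
```

Seen this way, the updated proxy swaps ORF1a:P1640S for ORF1a:S1221L while keeping ORF1a:N4060S.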

shay671 commented 2 years ago

Two cases were spotted in Israel (I'll update after they are uploaded to GISAID); both were returning from France, without known contact with anyone from India. They cluster with a sample previously collected in the Netherlands. This is evidence that BA.2.75 is already transmitting in Europe, independently of India.

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_3cd0c_403380.json?c=pango_lineage_usher&label=nuc%20mutations:C9693T

RajLABN commented 2 years ago

@AngieHinrichs @corneliusroemer @thomasppeacock @silcn EPI_ISL_14402515-14402516, EPI_ISL_14402520 from Assam, India -- earliest BA.2.75?

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_dd44_876cf0.json?branchLabel=Spike%20mutations&c=gt-S_460,446,210&label=nuc%20mutations:T15867G

silcn commented 2 years ago

@RajLABN those are date errors.