cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

New BA.2.38 sublineage with S:444N, S:147E, S:692L (350 seqs as of 2022-08-10, mostly India) #746

Closed Sinickle closed 2 years ago

Sinickle commented 2 years ago

Credit to @bitbyte2015 for being the original one (to my knowledge) to find these sequences!

This potential sublineage features interesting mutations that make it distinct, and although it has been sequenced just 7 times, it has been found in Brazil, Australia, USA, and Japan (but the Japanese sequences were in travelers from India)

Description

Potential sublineage of: BA.2.38 (the Indian sublineage with spike 417T from a BA.2 root)
Earliest sequence: 2022/05/06 (Australia)
Most recent sequence: 2022/05/22 (USA-California)
Mutations on top of BA.2:

| Gene | Amino changes |
| -- | -- |
| ORF1b | D51N |
| Spike | K147E, K444N, I692L |

nextstrain tree [cov-Spectrum query]

The lineage I'm proposing is the one in the red box. Since the parental lineage, BA.2.38 without 6091T, is most predominant in India, and the Japanese cases in my proposed lineage are confirmed to be from India-related travel, it seems likely that this lineage is more prominent in India despite never being sampled there. Additionally, this lineage is 5 nucleotide mutations away from any possible parental sequence which likely implies lack of sampling of intermediaries rather than a true evolutionary jump. The Indian sequences not in the red box in the screenshot share just 1 synonymous nucleotide mutation, and several unique ones -- I believe that the right thing to do here would be to include the Indian non-red box sequences as regular BA.2.38 in the classifier training set, and the red box ones as the new proposed lineage, which would be defined by T7153C + the previously listed AA mutations.

Proposed sublineage: EPI_ISL_13175338, EPI_ISL_12808436, EPI_ISL_12808434, EPI_ISL_13027402, EPI_ISL_12808520, EPI_ISL_13027403, EPI_ISL_12767836

Regular BA.2.38 that share T7153C (think including these in the training set might improve specificity of the proposed lineage, especially since one of these also obtained S:444N independently.): EPI_ISL_12953126, EPI_ISL_12953226, EPI_ISL_12953163, EPI_ISL_12953161

EDIT: As others point out in the comments below, another notable branch has formed, starting at the S:444N mutation. If my proposed branch is designated, we should be careful that the other branch doesn't get misclassified.

Sinickle commented 2 years ago

The main reason this genome stands out to me is the S:417T + S:444N combo, but having two additional spike mutations and having spread to 4 continents while likely being from an under-sequenced area makes this more worthwhile to designate than other larger S:417T + S:444N combo lineages, in my opinion.

The Bloom lab considers S:444 one of the most important sites to watch for mutations as a route to immune escape. I claim that S:444N becomes more beneficial after S:417T is acquired.

The following tool does not show all sequences, but is hopefully sufficient for the argument.

Since Omicron with S:417T has never made up even 4% of cases for a time period, I believe these numbers argue that S:444N is more likely to succeed when S:417T is present.

This possibility makes it important to monitor a lineage that has this pair of mutations.

shay671 commented 2 years ago

Excellent job. Regarding whether this is a jump or genetic drift: this lineage is 5 mutations from the closest sequence, but 4 mutations from the ancestor shared with the Indian clade in the picture. Those four are all nonsynonymous, one in the NTD, one in the RBD, and one in proximity to the furin cleavage site. Three converging hot spots. So if this is indeed an inter-host accumulation of mutations, there is evidence here for strong selection (otherwise we would have expected some synonymous or ORF1ab mutations to accumulate along the way).

chrisruis commented 2 years ago

Very small so will close for now but can reopen if this grows with an epidemiological event in the future

FedeGueli commented 2 years ago

Hi @chrisruis, this is small but has been sampled on 4 continents, with a clear link to an undersampled area (India). I suggest applying the monitor label and leaving this open, even if probably no BA.2 sublineage can compete with BA.5.

FedeGueli commented 2 years ago

5 continents now, sampled in Denmark (June).

silcn commented 2 years ago

Found 3 sequences from Telangana, India: EPI_ISL_13307928-13307930. They're all pretty poor quality, but share many of this lineage's defining mutations, in particular S:147E, S:692L and nuc:T7153C. So even though one is missing ORF1b:D51N and all three have S:417N rather than 417T (backfilling? or a genuine reversion?) I feel it's safe to say they're from this lineage. They all have NNNs at S:444.

The long branches in the Usher tree are most likely a reflection of the poor sequence quality rather than genuine diversity.

FedeGueli commented 2 years ago

Hi @silcn, good catch. I have found another sequence (earlier in May) from Thailand, EPI_ISL_13065568, this time missing S:147E but with ORF1b:51N and 7153C; it should also have S:417T and S:444N, besides 25416T. It could be related to this lineage.

silcn commented 2 years ago

@FedeGueli well spotted. That one is also missing S:692L, so it's probably not part of the proposed lineage, though it's very closely related.

FedeGueli commented 2 years ago

@silcn thank you, yes, I agree: related but not part of it, and also very low quality (as pointed out to me by @ryhisner), which makes it harder to understand how closely related it is. Have you verified whether there was some uptick in cases or positivity rate in the region of India where your newly found sequences come from?

Edited: found this: https://telangananewspoint.com/telangana-reports-285-new-covid-19-cases-on-thursday/ Generally India is seeing a moderate uptick of cases, crossing the 12k cases threshold after 111 days. It is impossible to relate that to this or other sublineages.

silcn commented 2 years ago

The three sequences I found are from the city of Hyderabad, one sampled 2022-05-26 and two on 2022-05-28. For a sense of scale, 116 sequences have been uploaded from Hyderabad with a collection date since 2022-05-26, of which 31 are either BA.2.12.1, BA.4 or BA.5.

If there is somewhere in India where this is very prevalent, then it likely isn't Hyderabad, that's just where we happened to get the first sequences from.

silcn commented 2 years ago

And now a better quality Indian sequence, EPI_ISL_13342191 from Chennai, Tamil Nadu

silcn commented 2 years ago

West Bengal just uploaded 32 new sequences from this lineage. Total is now over 50.

FedeGueli commented 2 years ago

55 sequences on Usher as of today. 47 out of 55 are from India, confirming what was discovered earlier by @silcn. https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_30d7a_2fb660.json?branchLabel=aa%20mutations&c=pango_lineage_usher&label=nuc%20mutations:A22001G,A23636C

The tree now shows the recently uploaded Indian sequences mentioned by @silcn, and we can appreciate the diversity in the tree of this sublineage, which was first spotted through introductions on different continents.

As I requested in #814, this lineage is worth a designation, and it would help Indian authorities track what is circulating there, as stated for BA.2.75.

agamedilab commented 2 years ago

Just spotted one of those in Austria, but with an additional S:K558N (EPI_ISL_13632738):

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_1e62a_4296b0.json?branchLabel=aa%20mutations&c=pango_lineage&label=nuc%20mutations:T7153C,G22894C

bitbyte2015 commented 2 years ago

This lineage is now up to 122 sequences and is placed next to a new interesting sublineage with K444N and F157S that has 63 sequences. 157 has been seen to confer a growth advantage on its own. Seeing that the number of sequences continues to grow and this branch has now spawned a second interesting spike profile, I think that this should be designated.

FedeGueli commented 2 years ago

Thx @bitbyte2015! The sibling lineage with S:F157S has been proposed in #828.

Hi @chrisruis, following our conclusions in #814, these two sublineages probably deserve a designation, also considering that at least 5 highly transmissible lineages are circulating there (BA.2.75, BF.3, BA.2.38.1, BA.2.74, BA.2.76).

FedeGueli commented 2 years ago

@chrisruis @corneliusroemer this now seems very close to BA.5 in India: https://cov-spectrum.org/explore/India/AllSamples/Past3M/variants?aaMutations=M%3A3N&nucMutations=12160A&aaMutations1=S%3AK147E%2CS%3AK444N%2CS%3AI692L%2CS%3A417T&analysisMode=CompareToBaseline&
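For readers who want to see exactly which mutation sets that cov-Spectrum link compares, here is a minimal sketch that decodes the URL's query string with the Python standard library. The helper name `decode_query` is made up for illustration; the sketch only unpacks the parameters and does not assume anything about how cov-Spectrum interprets them.

```python
from urllib.parse import urlsplit, parse_qs

# The comparison link from the comment above, split across lines for readability.
URL = (
    "https://cov-spectrum.org/explore/India/AllSamples/Past3M/variants"
    "?aaMutations=M%3A3N&nucMutations=12160A"
    "&aaMutations1=S%3AK147E%2CS%3AK444N%2CS%3AI692L%2CS%3A417T"
    "&analysisMode=CompareToBaseline&"
)


def decode_query(url):
    """Return {parameter: [values]}, splitting comma-separated mutation lists."""
    raw = parse_qs(urlsplit(url).query)  # percent-decoding happens here
    return {key: values[0].split(",") for key, values in raw.items()}


params = decode_query(URL)
for key, values in params.items():
    print(key, "=", values)
```

One parameter group holds `M:3N` plus `12160A`, and the other holds the four spike mutations of the proposed lineage.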

Sinickle commented 2 years ago

@FedeGueli brought it to my attention that this proposed sublineage is much better captured by searching for just S:147E and S:692L. This is because a large number of the sequences have dropout at either S:417T or S:444N. The new query is here. Compared to BA.2.38, this has a [71-129%] growth advantage in India.
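To illustrate why the shorter query captures more of the lineage, here is a small sketch under stated assumptions: the sample records are invented (not real GISAID sequences), `"X"` stands in for a position with no coverage, and `matches` is a hypothetical helper, not part of any cov-Spectrum API.

```python
# Original four-mutation query versus the relaxed two-mutation query.
STRICT = {147: "E", 417: "T", 444: "N", 692: "L"}
RELAXED = {147: "E", 692: "L"}


def matches(spike_calls, definition):
    """True if every position in `definition` carries the expected residue."""
    return all(spike_calls.get(pos) == aa for pos, aa in definition.items())


# Invented spike calls: position -> residue, "X" meaning amplicon dropout.
samples = {
    "full_coverage": {147: "E", 417: "T", 444: "N", 692: "L"},
    "dropout_444": {147: "E", 417: "T", 444: "X", 692: "L"},
    "dropout_417_444": {147: "E", 417: "X", 444: "X", 692: "L"},
    "plain_BA.2.38": {147: "K", 417: "T", 444: "K", 692: "I"},
}

strict_hits = [name for name, calls in samples.items() if matches(calls, STRICT)]
relaxed_hits = [name for name, calls in samples.items() if matches(calls, RELAXED)]

print("strict: ", strict_hits)
print("relaxed:", relaxed_hits)
```

In this toy data the strict query only finds the fully covered sample, while the relaxed query also recovers the two samples with dropout and still excludes plain BA.2.38.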

FedeGueli commented 2 years ago

Thx @Sinickle for highlighting the new query here. Notably, the updated query also shows a slight growth advantage versus BA.5 in India: https://cov-spectrum.org/explore/India/AllSamples/Past3M/variants?aaMutations=M%3A3N&nucMutations=12160A&aaMutations1=S%3AK147E%2CS%3AI692L&analysisMode=CompareToBaseline&

Following this update, I think this should be reopened and monitored along with the other open issues @chrisruis @AngieHinrichs @InfrPopGen @corneliusroemer.

It should also be noted that, together with the sibling lineage with S:444N, 7153C and S:157S, it represents between 1/2 and 1/3 of the BA.2.38 samples with S:444N.

FedeGueli commented 2 years ago

285 sequences as of today. After a while without new samples, a lot came in all together.

@corneliusroemer @chrisruis @InfrPopGen

I have been monitoring it since it was mentioned in #814.

While its growth is very irregular, probably due to prevalence in an undersampled area, I think it deserves to be reopened and designated along with the other BA.2.38 +478R.

Now it has been sampled in 11 different countries: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_39bdb_a5caa0.json?branchLabel=aa%20mutations&c=country&label=nuc%20mutations:G13618A

FedeGueli commented 2 years ago

Big upload of sequences in the last three days: 52 seqs added, mainly by West Bengal but also Sikkim (3), Karnataka (3) and Chhattisgarh (1).

Note that ~10% of them have been collected since 18 July through today, so they are very recent, indicating ongoing circulation.

In the last month this sublineage represented between 2 and 3% of total BA.2.38 sequences, making it the fourth sublineage after .1/.2/.3. I do think it is worth reopening and designating.

@corneliusroemer @thomaspeacock @InfrPopGen @chrisruis

cc @silcn have you any note on this lineage?

Sinickle commented 2 years ago

I guess with Xie's BA.2.75 paper identifying S:147E as a substantial immune escape mutation, and Bloom Lab identifying S:444 mutations also as substantial immune escape, it provides some mechanism for why this proposed lineage is maintaining a growth advantage over BA.2.38.

corneliusroemer commented 2 years ago

I'll reopen since numbers have increased by a factor of 50 since the issue was closed

FedeGueli commented 2 years ago

No sequence in the last two weeks.

Sinickle commented 2 years ago

Well, looks like this one might be basically dead at this point. The 3 spike mutations it gained that surpassed 1% prevalence but less than 5% were... S:346T, S:452M, S:460K. (not all on the same samples)

Edit: or I just got tricked by upload schedules

c19850727 commented 2 years ago

36 sequences popped out just now. 4 from Assam, 1 from Karnataka, and 31 from West Bengal. Dates of collection are between 2022-06-15 and 2022-08.

corneliusroemer commented 2 years ago

This was very successful in India for a short period - until BA.2.75 killed it.

Still worth a designation I think - but low priority as it's dead.

shay671 commented 2 years ago

Agree with the need for designation. We understand now how important it is to keep track of convergence.

InfrPopGen commented 2 years ago

Thanks for submitting. We've added lineage BA.2.38.4 with 293 newly designated sequences, and 5 updated designations from BA.2.38. Defining mutation A23636C (S:I692L) (following A22001G (S:K147E)).
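The pairing of nucleotide and amino acid names in the designation note can be checked with a little codon arithmetic. This is a minimal sketch, assuming the spike gene starts at position 21563 (1-based) on the Wuhan-Hu-1 reference (NC_045512.2); `codon_position` is a hypothetical helper written for this illustration.

```python
# 1-based start of the spike (S) coding sequence on NC_045512.2 (assumed reference).
GENE_START = {"S": 21563}


def codon_position(gene, nuc_pos):
    """Map a genome coordinate to (codon number, position within codon), both 1-based."""
    offset = nuc_pos - GENE_START[gene]
    if offset < 0:
        raise ValueError(f"{nuc_pos} lies before the start of {gene}")
    return offset // 3 + 1, offset % 3 + 1


# Nucleotide mutations quoted in this thread and the spike codons they fall in.
for label, pos in [("A23636C", 23636), ("A22001G", 22001), ("G22894C", 22894)]:
    codon, within = codon_position("S", pos)
    print(f"{label}: spike codon {codon}, codon position {within}")
```

Under that assumption, A23636C lands in spike codon 692 and A22001G in codon 147, matching S:I692L and S:K147E, and G22894C (from the Usher subtree label earlier in the thread) lands in codon 444.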

FedeGueli commented 1 year ago

> The main reason this genome stands out to me is the S:417T + S:444N combo, but having two additional spike mutations and having spread to 4 continents while likely being from an under-sequenced area makes this more worthwhile to designate than other larger S:417T + S:444N combo lineages, in my opinion.
>
> The Bloom lab considers S:444 one of the most important sites to watch for mutations as a route to immune escape. I claim that S:444N becomes more beneficial after S:417T is acquired.
>
> The following tool does not show all sequences, but is hopefully sufficient for the argument.
>
> Since Omicron with S:417T has never made up even 4% of cases for a time period, I believe these numbers argue that S:444N is more likely to succeed when S:417T is present.
>
> This possibility makes it important to monitor a lineage that has this pair of mutations.

Rereading this now, it was really the first discussion of convergent evolution in the Omicron RBD. Great thought @Sinickle