cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

2 small South African clusters of Omicron/Delta recombinants with interesting Spike mutations (8 sequences) #844

Closed JosieLikesCats closed 2 years ago

JosieLikesCats commented 2 years ago

Hi everyone, I'm just opening this issue to highlight that there are several sequences with an unusual mutation pattern in our most recent upload from South Africa, which will potentially represent two new lineages if more sequences are detected. The teams in our genomic surveillance network (NGS-SA) as well as our public health institute (NICD) are closely monitoring the sequences and cases in the country. These new constellations have been detected only in a small proportion of recent data, and our cases remain low.

I know these do not yet meet requirements for designation, as there are only N=4 and N=3 (2 available on GISAID, last 1 will be released tomorrow) sequences for each constellation, but we thought they would probably be of interest and picked up here/on Twitter eventually. For now, please see below for some details and the major mutation profiles for the two groups of sequences.

N=4, constellation 1
Earliest sequence: 28 June 2022
Most recent sequence: 29 June 2022
Circulating: Gauteng, South Africa
Nextclade assigns 21M but flags lots of private mutations (mainly 21J); pango assigns Unassigned/B.1.1.529
Genomes: EPI_ISL_13830378, EPI_ISL_13830377, EPI_ISL_13830376, EPI_ISL_13830375

N=3, constellation 2
Earliest sequence: 13 June 2022
Most recent sequence: 24 June 2022
Circulating: Limpopo, South Africa
Nextclade assigns 21J but flags lots of private mutations (mainly 21K/21L); pango assigns XD
Genomes: EPI_ISL_13830379, EPI_ISL_13830380

Evidence: constellation1_defining_aa_changes.xlsx, constellation2_defining_aa_changes.xlsx

Spike mutations in constellation 1 only, relative to Omicron: R21G, F486P, P621S, A706V
Spike mutations in constellation 2 only, relative to Omicron: S477D
Shared mutations relative to Omicron BA.4/5: L18F, T19R, W152L, E156del, F157del, R158G, F186L, G446D, T1117I

Notably both clusters have a second silent nt change in L452R not present in BA.4/5. There are some significant differences outside spike (see attached mutation profiles). The sites 213, 371, 373, 375, 376, 408, and 764 are not reliably covered by the data, so they cannot be confirmed yet.

UShER tree (including 7th sequence to be uploaded): https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/singleSubtreeAuspice_genome_1ef08_5f410.json (in a previous UShER tree they clustered near XD).

corneliusroemer commented 2 years ago

Thanks @JosieLikesCats for flagging these at such an early stage, they are indeed very interesting sequences.

I'm sorry, I can't really provide much insight yet, but I would very much appreciate it if you could try to answer some questions that come to mind. Do you happen to have raw reads for these? They would be very helpful to validate that the sequences are real and not, for example, due to coinfection. Are there other possible explanations for these? Do you know how closely related the sequenced individuals are?

Where in Limpopo and Gauteng are these from? Google maps says ~400km but could the Limpopo ones also be from a place closer to Gauteng province?

These sequences should be run through sc2rf to check for recombination - first impression is that it's very recombinanty. But also that both sequences seem to have some things in common. Very speculatively, these could represent different results of intra host recombination - but one needs to look at this in more detail.

It may be worth splitting this issue up and separating the proposals, but for now, to keep things simple, I think it's fine to keep them both together.

Here are screenshots from Nextclade to save everyone some time:

image image
silcn commented 2 years ago

Just spotted these too, went to check and found this issue. These look like some weird combination of three different lineages: AY.45, a divergent BA.4/5, and BA.2 (seen near the 3' end where BA.4/5 mutations are absent from the Omicron-derived sections). Both have lots of apparent breakpoints but the 3' end of constellation 2 is particularly messy and seems to switch back and forth almost every mutation - the N protein alone goes Omicron/Delta/Omicron/Delta.

If these are real I wouldn't be surprised if they are new emergences from the Omicron source, especially given the location.

JosetteSchoenma commented 2 years ago

Hi, I have run them through sc2rf. It looks mostly like a Delta/BA.5 recombinant to me, but with several breakpoints and 11 shared private mutations between the 2 clusters. The graph at the bottom is best, with green being BA.2 and red being Delta mutations. It starts with some example BA.4 and BA.5 samples. I included the first graph, where I selected the BA.1 and BA.2 clades, because it shows 6 South African samples, whereas the bottom one somehow loses 2 and shows just 4. By the way, C21614T is a mutation these samples share with Gamma. image

corneliusroemer commented 2 years ago

The small cluster (no. 2) very much looks like a 21J and BA.4/5 recombinant to me, but with an unusually high number of breakpoints (5). The ranges are as follows:

0-2790: BA.4/5 [length ~ 3-4k]
2791-4180: BA.4/5 -> 21J (BP1)
4181-21846: 21J [length 18-19k]
21847-21986: 21J -> BA.4/5 (BP2)
21987-24912: BA.4/5 [length 3k]
24913-24999: BA.4/5 -> 21J (BP3)
25000-28310: 21J [length 3-4k]
28271-28880: 21J -> BA.4/5 (BP4)
28881-28916: BA.4/5 [length 40-1.2k]
28917-29401: BA.4/5 -> 21J (BP5)
29402-end     : 21J [length 500-1k]
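
The bracketed lengths can be reproduced with a minimal sketch. It assumes the coordinates are relative to the Wuhan-Hu-1 reference (29903 nt) and that each parental segment can extend at most to the far edge of its flanking breakpoint intervals. The function name and the "bp" label are illustrative only, and the ranges are copied as listed above, including the overlap between 28310 and 28271.

```python
GENOME_END = 29903  # Wuhan-Hu-1 reference length (assumed coordinate system)

# (start, end, kind) as listed in this comment; "bp" marks a breakpoint interval.
INTERVALS = [
    (0, 2790, "BA.4/5"),
    (2791, 4180, "bp"),
    (4181, 21846, "21J"),
    (21847, 21986, "bp"),
    (21987, 24912, "BA.4/5"),
    (24913, 24999, "bp"),
    (25000, 28310, "21J"),
    (28271, 28880, "bp"),
    (28881, 28916, "BA.4/5"),
    (28917, 29401, "bp"),
    (29402, GENOME_END, "21J"),
]


def segment_length_bounds(intervals):
    """Return (parent, min_len, max_len) for every non-breakpoint segment."""
    bounds = []
    for i, (start, end, kind) in enumerate(intervals):
        if kind == "bp":
            continue
        # A segment may reach into the breakpoint intervals on either side.
        lo = intervals[i - 1][0] if i > 0 else start
        hi = intervals[i + 1][1] if i + 1 < len(intervals) else end
        bounds.append((kind, end - start + 1, hi - lo + 1))
    return bounds


for parent, shortest, longest in segment_length_bounds(INTERVALS):
    print(f"{parent}: {shortest}-{longest} nt")
```

The shortest possible Omicron-like stretch in N comes out at 36 nt, which is why the exact breakpoint count is hard to pin down from consensus sequences alone.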
JosetteSchoenma commented 2 years ago

Here is the Usher tree: Screenshot_20220715-001558_Twitter.jpg

silcn commented 2 years ago

Here is a spreadsheet comparing the mutations to AY.45, BA.2, BA.4 and BA.5: pango844.xlsx Mutations are included if they appear in at least one of the sequences.

Constellation 1 has at least 6 breakpoints; constellation 2 has at least 5 as @corneliusroemer says. Could potentially be even more, as there are a couple of places where a single mutation or reversion could have been gained convergently or through recombination.

I'm also unconvinced that BA.4/5 are involved - I think the Omicron parent may just be a divergent BA.2. Looking at all of the locations where BA.2 and BA.5 differ, referring to the constellations as C1 and C2 for short:
12160: C1 looks like BA.2, C2 is Delta
21765-21770 (S:69/70): both Delta
22917 (S:452): both look like BA.5
23018 (S:486): C1 is Omicron-derived but has different nucleotide from BA.2 and BA.5; C2 looks like BA.2
23040 (S:493): both look like BA.5
26529 (M:3): both Delta
26858, 27259, 27382-4: C1 could be Delta or BA.5, C2 is Delta
27889: C1 looks like BA.2, C2 is Delta

If BA.5 is involved and not BA.2, we have to believe that the silent mutations at 12160 and 27889 both reverted, as well as S:F486V in C2. In my humble opinion, this seems much less likely than S:L452R and S:R493Q arising independently on top of BA.2 - after all, we've seen that in BA.2.77 too.

The evidence against BA.4 is even stronger, e.g. neither constellation has the deletion in nsp1.

AngieHinrichs commented 2 years ago

Thanks @JosetteSchoenma -- here is a link to the UShER view with a permanently saved .json file that won't be deleted in a couple days, and with branches labeled by reversions/back-mutations: https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/pango-designation-844.json?branchLabel=back-mutations&c=pango_lineage_usher&label=nuc%20mutations:T19955C

Since this is apparently a recombinant, UShER is not as useful as it might be otherwise. The phylogenetic tree assumes a steady accumulation of mutations, but recombinants violate that assumption. UShER places a recombinant sequence on the branch of the tree where it has the fewest differences from existing sequences, which usually corresponds to one of the parent lineages -- but there are reversions/back-mutations for the portions of the genome contributed by the other parent. (In fact, a long branch with multiple reversions is the signal that our RIPPLES tool uses to look for potential recombinants in the big tree... we should run that again one of these days!)

These sequences are placed on a branch of BA.5 that is already riddled with various reversions that are probably mostly sequencing artifacts -- but the long branch makes it pretty clear that these sequences are different from the others, and that placement is just the best that UShER could do given the circumstances, not necessarily an indication that the sequences on that subtree are closely related.
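
To illustrate why a recombinant ends up on one parent's branch with a trail of reversions, here is a toy sketch. It is not UShER's algorithm: it simply picks the candidate node with the fewest differences from the sample, and every mutation name and node in it is made up for the example.

```python
# Hypothetical mutation sets relative to the reference (not real lineage definitions).
NODES = {
    "parent_A": {"A100G", "C200T", "G300A", "T400C", "C500T"},
    "parent_B": {"G150A", "C250T", "T450C", "A550G"},
}

# Toy recombinant: parent_A up to roughly position 350, parent_B afterwards.
SAMPLE = {"A100G", "C200T", "G300A", "T450C", "A550G"}


def place(sample, nodes):
    """Pick the node whose mutation set differs least from the sample."""
    return min(nodes, key=lambda name: len(nodes[name] ^ sample))


def reversions(sample, node_mutations):
    """Mutations of the chosen node that the sample lacks (apparent back-mutations)."""
    return node_mutations - sample


def extra_mutations(sample, node_mutations):
    """Mutations in the sample that the chosen node does not explain."""
    return sample - node_mutations


best = place(SAMPLE, NODES)
print(best)
print(sorted(reversions(SAMPLE, NODES[best])))
print(sorted(extra_mutations(SAMPLE, NODES[best])))
```

In this toy case the sample lands on parent_A, and the part of the genome inherited from parent_B shows up as two reversions plus two extra mutations, which is the long-branch pattern described above.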

Sinickle commented 2 years ago

I agree with @silcn's analysis and reasoning.

One small thought to add - Regarding the silent mutation C22916A at S:452. If the Omicron contributor is BA.2, then that mutation would create S:L452M, which we have seen on other successful BA.2 lineages. Possibly this was an intermediate step, before gaining S:M452R.
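
The codon arithmetic behind this can be checked with a minimal sketch. It assumes the spike ORF starts at position 21563 in the Wuhan-Hu-1 reference, so that S:452 covers 22916-22918, and that the reference codon there is CTG (Leu). The helper names are illustrative, and T22917G is taken from the UShER mutation labels elsewhere in this thread.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

SPIKE_START = 21563    # assumed first nucleotide of the spike ORF (Wuhan-Hu-1)
REF_CODON_452 = "CTG"  # assumed reference codon for S:452 (Leu)


def codon_start(residue):
    """Genome position of the first base of a spike codon."""
    return SPIKE_START + 3 * (residue - 1)


def mutate(codon, start, substitutions):
    """Apply substitutions such as 'C22916A' to a codon starting at `start`."""
    bases = list(codon)
    for sub in substitutions:
        pos, alt = int(sub[1:-1]), sub[-1]
        offset = pos - start
        if not 0 <= offset < 3:
            raise ValueError(f"{sub} is outside the codon")
        bases[offset] = alt
    return "".join(bases)


def residue_452(substitutions):
    """Amino acid at S:452 after applying the given nucleotide substitutions."""
    return CODON_TABLE[mutate(REF_CODON_452, codon_start(452), substitutions)]


for subs in ([], ["T22917G"], ["C22916A"], ["C22916A", "T22917G"]):
    print(subs, residue_452(subs))
```

Under these assumptions C22916A alone gives Met, while C22916A on top of T22917G still gives Arg, so the change is only silent in a background that already carries L452R.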

UnusualTimes commented 2 years ago

Could this be an XT (or similar) + BA.5 recombination?

FedeGueli commented 2 years ago

@silcn there is also a small branch of BA.5 with the wild-type 27889C, proposed in #797. Maybe it was acquired via recombination, or maybe it is ancestral; who knows.

FedeGueli commented 2 years ago

Possible cov-spectrum query to catch both constellation 1 and 2, based on the great list by @silcn:

https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?aaMutations=M%3A146H&nucMutations=8595T%2C15026T&

Edit: it catches 7/7 sequences on cov-spectrum.

JosieLikesCats commented 2 years ago

Thanks for all the analysis so far, it's very interesting to read all the comments! The final sequence has been uploaded to GISAID, so it should be available soon - I'll edit this comment once it's released. EDIT: sequence now released, EPI_ISL_13843609

@corneliusroemer to answer some of your questions: I'm currently analysing the raw data more closely, and will update the sequences if any of the sites turn out to be incorrect/poorly supported. I've also got some other people helping me look at them, and we can potentially share the read counts per site etc. once we're done. We do currently have the samples in the pipeline to be resequenced to confirm some of the sites as well.

The sequences are from Johannesburg (Gauteng) and Polokwane (Limpopo), which are two of the main cities in each province. We have also had school and university holidays recently, and so there has likely been increased travel between provinces.

I see recombination is being looked at quite closely by everyone, so I'll just add that we have some NGS-SA team members also taking a look with a variety of tools; we'll update accordingly if we find anything interesting.

Thanks for adding the screenshots! I had considered two separate issues but thought for now since there are so few it made sense to keep it together. Happy to split these in future if needed.

FedeGueli commented 2 years ago

NGS-SA report on these sequences: https://t.co/smHLpdRsF7

FedeGueli commented 2 years ago

I can confirm this query catches 7 out of 7 sequences of this new variant: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?aaMutations=M%3A146H&nucMutations=8595T%2C15026T&

ryhisner commented 2 years ago

There's a new sequence from Gauteng uploaded today that is clearly related to the others, though 29.7% of it is Ns: EPI_ISL_13913050

It has S:F486P and S:P621S, so it's part of constellation 1, but spike residues 1-340 and 670-1044 are blank according to Nextclade. Unlike the other four sequences from constellation 1, this one has S:T572I. Collection date 2022-07-04.

Here's the Usher tree: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_2267d_6f9c40.json?c=pango_lineage&label=nuc%20mutations:C7124T,C8595T,C8986T,G9053T,C15026T,G18163A,C21614T,C21618G,T21762C,G22017T,T22118C,G22899A,C22916A,T22917G,G23040A,C24912T,C25413T,G26959A,C27259A

AngieHinrichs commented 2 years ago

@ryhisner thanks for adding the EPI_ISL ID there, was just about to ask, and sorry the UShER web interface is using an old fasta-reading library that truncates names at the space character... "hCoV-19/South" is a pretty useless label.

AngieHinrichs commented 2 years ago

EPI_ISL_13913050 (SouthAfrica/CERI-KRISP-K045132/2022) is the first of these from outside NICD, and it is more similar to C1 than it looks in that UShER tree view -- it has Ns at 4456, 5869, 10198, 12163, 21623, and 23679, and it has C10954T and C28531T like NICD-N47701 and NICD-N47705, so I would expect UShER to place it at the end of that branch, instead of splitting it in the middle. Looking into why it didn't.

Meanwhile here's a sc2rf view so you can see how CERI-KRISP-K045132 looks pretty much like the NICD C1 sequences, but with more Ns, and without reversions at 22686, 22688 and 22786 (common casualties of amplicon dropout I think):

image
thomasppeacock commented 2 years ago

Although these clusters don't meet the minimum number of sequences, I think the highly unusual apparent recombination pattern means that, if sequences keep appearing, they should be designated, since there will be good reason to want an unambiguous name for them. Going to put a monitoring tag on this for now (hope that's okay @chrisruis @InfrPopGen!).

corneliusroemer commented 2 years ago

Agree that it's worth seeing if any new sequences in this cluster appear and, if they do, designating it.

The minimum number of sequences is not a hard limit, we can make exceptions if there are good reasons (there are here).

JosieLikesCats commented 2 years ago

Hi everyone, just a heads-up that one more sequence from constellation 2 will be released in the next couple of hours (N46078, EPI_ISL_14112354). Also from Limpopo, with collection date of 30 May. We haven't yet detected any more recent samples but are monitoring closely.

silcn commented 2 years ago

The new constellation 2 sequence is missing M:R146H and so will not be picked up by @FedeGueli's cov-spectrum query. Here is a query that will pick up everything once cov-spectrum is updated with the new sequence: https://cov-spectrum.org/explore/South%20Africa/AllSamples/AllTimes/variants?variantQuery=%5B5-of%3A+8595T%2C+15026T%2C+21614T%2C+22118C%2C+22899A%2C+22916A%2C+24912T%2C+26959A%5D&
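
For anyone rebuilding this kind of query, here is a minimal sketch of how the URL parameter above is put together and of what the "5-of" rule means. Reading "n-of" as "at least n of the listed mutations" is my understanding of the cov-spectrum syntax, the helper names are illustrative, and the sample in the usage line is invented.

```python
from urllib.parse import urlencode

# Marker nucleotide mutations taken from the query above.
MARKERS = ["8595T", "15026T", "21614T", "22118C", "22899A", "22916A", "24912T", "26959A"]


def n_of_query(n, mutations):
    """Build the human-readable query text, e.g. '[5-of: 8595T, 15026T]'."""
    return f"[{n}-of: {', '.join(mutations)}]"


def query_string(n, mutations):
    """URL-encode the query text as the variantQuery parameter."""
    return urlencode({"variantQuery": n_of_query(n, mutations)})


def matches_n_of(sample_mutations, n, mutations):
    """True if the sample carries at least n of the listed mutations."""
    return len(set(mutations) & set(sample_mutations)) >= n


print(query_string(5, MARKERS))

# Invented sample carrying five of the eight markers plus one unrelated mutation.
sample = {"8595T", "15026T", "21614T", "22118C", "22899A", "241T"}
print(matches_n_of(sample, 5, MARKERS))
```

Requiring only 5 of 8 markers is what lets the query tolerate sequences with amplicon dropout or a missing marker such as M:R146H.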

FedeGueli commented 2 years ago

Thanks @silcn, I checked your new query: 9 out of 9! Well done. https://cov-spectrum.org/explore/South%20Africa/AllSamples/AllTimes/variants?variantQuery=%5B5-of%3A+8595T%2C+15026T%2C+21614T%2C+22118C%2C+22899A%2C+22916A%2C+24912T%2C+26959A%5D&

silcn commented 2 years ago

2 more sequences from constellation 1: EPI_ISL_14585888, EPI_ISL_14585891. Both are from Western Cape, South Africa, sampled 2022-08-08.

corneliusroemer commented 2 years ago

Here's the Usher tree with the two new sequences - I think it'd be worth designating the bigger cluster of the two as it seems to continue circulating and has a very intriguing spike profile.

image

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_17d5d_648160.json?c=Nextstrain_clade_usher&label=nuc%20mutations:C12747T,C12880T,T23018C,T23019C,C23423T,C26681T,C27575G,A29510C

aq-sun commented 2 years ago

Thanks for submitting, we've designated this recombinant lineage as XAY with 11 new designations.

corneliusroemer commented 2 years ago

@aq-sun can you confirm whether you designated one or both lineages? The proposal was for two separate clusters - did you merge them into one or designate only one of the two? If only one was designated, we should probably keep the issue open for the other cluster.

silcn commented 2 years ago

Looks like XAY encompasses both clusters. Possibly not strictly following the Pango rules, but there clearly is shared ancestry between the two clusters even if it's impossible to say what the "common ancestor" is. In my opinion the larger cluster should immediately be designated XAY.1, and if the smaller cluster reaches 5 sequences then it should be XAY.2.

aq-sun commented 2 years ago

This seems like the appropriate thing to do - I'll reopen the issue and monitor for further development in constellation 2, and designate the larger constellation as XAY.1.

corneliusroemer commented 2 years ago

Great! I wouldn't make these sublineages of each other as there may be recombination involved. I'd just call this XAV, and the next one XAW or whatever :)


aq-sun commented 2 years ago

I'll leave the first cluster as XAY then!

FedeGueli commented 2 years ago

I think one more sequence has popped up:

EPI_ISL_14728611

Gauteng

I think it is XAY (constellation 1): S:R21G, F486P, P621S, A706V

aq-sun commented 2 years ago

Added this to XAY.

FedeGueli commented 2 years ago

GISAID query for XAY: M_R146H, Spike_F186L

AngieHinrichs commented 2 years ago

GISAID query for XAY: M_R146H, Spike_F186L

@FedeGueli I think that query covers both C1 (designated XAY) and C2 (monitored).

Extending your query for XAY: M_R146H, Spike_F186L, Spike_F486P

for C2: M_R146H, Spike_F186L, N_D63G

FedeGueli commented 2 years ago

Thanks @AngieHinrichs! Yes, I usually query S:F486P first and then the other. It's good to have them separately, thank you very much.

InfrPopGen commented 2 years ago

Lineage XBA has been designated for constellation 2, with four example sequences. The lineage alias is given as an interim AY.45/BA.2 recombinant, with one breakpoint, because that at least gives pipelines what they expect when reading the json.

FedeGueli commented 1 year ago

A new XAY sequence has just been uploaded, from an elderly man in Cape Town, collected on 31/08/22.

EPI_ISL_14975893

corneliusroemer commented 1 year ago

The first international XAY has just appeared in Denmark:

hCoV-19/Denmark/DCGC-585187/2022|EPI_ISL_15155569|2022-09-23

Travel information is not available, but we know this is a reinfection with last infection in January, possibly BA.2 given this was Denmark.

Additional host information: n_infections=2,last_infection_date=2022-01-24
image

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_16aa0_445360.json?c=pango_lineage_usher&label=nuc%20mutations:G15451A,C16466T,C19220T,A22688G,A22786C,T25584C,T26270C,G26577C,A26709G,T26767C,G29742T

JosetteSchoenma commented 1 year ago

There is another XAY from Denmark. It gets the same place on the UShER tree as the one @corneliusroemer mentioned. 2nd of October 2022, EPI_ISL_15284246.

FedeGueli commented 1 year ago

Two more XAY sequences from South Africa have been uploaded, collected on 29/8 and 14/9, both from Gauteng and from baseline surveillance: EPI_ISL_15259248, EPI_ISL_15259343

FedeGueli commented 1 year ago

I think I have found a better query for XAY: Spike_P621S, Spike_F186L. It actually finds 27 sequences from 3 countries, while GISAID's Pangolin calls 23 viruses XAY and our old manual query finds a lot fewer.

@AngieHinrichs

AngieHinrichs commented 1 year ago

Thanks @FedeGueli, that helped me find a couple new CA sequences that were being excluded from the tree but should be added tomorrow!

https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/XAY.2022-10-13.json?branchLabel=Spike%20mutations&c=pango_lineage_usher&label=nuc%20mutations:G4184A,C4321T,G10447A,C12747T,C12880T,T23018C,T23019C,C23423T,C26681T,C27575G,A29510C,T29742G&s=hCoV-19/USA/CA-FG-296207/2022%7CEPI_ISL_15324975%7C2022-09-30,hCoV-19/USA/CA-FG-296179/2022%7CEPI_ISL_15324972%7C2022-09-30

Also missing but hopefully added tomorrow: SouthAfrica/SU-NHLS_5859/2022|EPI_ISL_14975893|2022-08-31

FedeGueli commented 1 year ago

Found a sequence that Nextclade sees as XBA but UShER puts outside every branch, starting directly from the B.1.1.529 root, although it is a mix of Delta and Omicron.

It is from Belgium and sampled recently: EPI_ISL_15537619

@corneliusroemer @thomasppeacock @AngieHinrichs @JosieLikesCats @JosetteSchoenma @c19850727 @silcn @shay671

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_274cf_c03220.json?c=pango_lineage_usher&s=hCoV-19/Belgium/UZB_548202210266/2022%7CEPI_ISL_15537619.%7C2022-10-24

Schermata 2022-11-09 alle 20 49 17 Schermata 2022-11-09 alle 20 51 14

AngieHinrichs commented 1 year ago

Command-line nextclade places it with XBA as the closest match... but with 21 reversions relative to the XBA placement, as well as 7 mutations associated with other clades, and 28 additional mutations. It's excluded from the UShER tree because it's Omicron-ish but so divergent from its nextclade placement. My guess is contamination, but that's just my guess based on looking at nextclade numbers; someone looking at the raw data might see something else.

oobb45729 commented 1 year ago

It looks like another recombinant strain that is related to XAY/XBA to me. The reversions can be easily explained by different breakpoints. This one is an XBA-like with a Delta-like S2. The S2 part (P681R+V736I+T859N+D950N) looks pretty real. T859N is one of the most notable convergent mutations in the late Delta era.

oobb45729 commented 1 year ago

This one may also give some hints about how XAY/XBA evolved: it has L452M, which supports the L452M->R theory.

oobb45729 commented 1 year ago

No, this one is unrelated to XAY/XBA. Orf1b:M115I and C25413T from AY.45 are missing. It is another Omicron/Delta recombinant that is strikingly similar to XBA.

oobb45729 commented 1 year ago

It has a breakpoint between S:EFR156G and S:V213G like XAY/XBA, which is also close to the XAW and XBC breakpoints. Also like XAY/XBA/XBC, it has another breakpoint somewhere in nsp1-nsp3 to get BA.2's orf1a:S135R. And like XBA, it has a breakpoint between N:D63G and N:R203K. It has S:L18F+T19R+R21G like XAY and, more intriguingly, Orf7a:T61S like XAY, but via A27574T rather than XAY's C27575G (neither mutation is common so far).