cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

Potential BA.2/BA.1* Recombinant with Likely Breakpoint at ORF3a/M Protein (21 Seqs in Spain-Balearic Islands as of 2022-04-01) #482

Closed c19850727 closed 2 years ago

c19850727 commented 2 years ago

Description

Recombinant between: BA.2 & BA.1
Earliest sequence: 2022/2/25 (Spain-Balearic Islands)
Most recent sequence: 2022/3/11 (Spain-Balearic Islands)
Countries circulating: Spain
Likely breakpoint: between 26062 and 26528 (from ORF3a to M).
Conserved Nuc mutations (those in red frames are likely from the donor on the BA.1 side): image
(G28883C and C29750T are truncated in the above picture.)

Cov-spectrum query: C4321T, A9424G, 29510A, G14653T, C25844T
GISAID query: NSP12_V405F, NSP1_S135R, NS3_T151I, -N_S413R
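As a rough illustration of how the query mutations sit relative to the proposed breakpoint, here is a minimal Python sketch. It parses the nucleotide mutations from the Cov-spectrum query above and reports which side of the 26062-26528 interval each one falls on. The function names are hypothetical, and treating the two interval endpoints as flanking sites (rather than as part of the undetermined region) is an assumption, not something stated in this issue.

```python
import re

# Breakpoint interval quoted in the description (1-based genome coordinates).
BREAKPOINT = (26062, 26528)

# Nucleotide mutations copied from the Cov-spectrum query in the description.
QUERY = ["C4321T", "A9424G", "29510A", "G14653T", "C25844T"]

# Optional reference base, position, alternate base (e.g. "C4321T" or "29510A").
MUTATION_RE = re.compile(r"^([ACGT]?)(\d+)([ACGT])$")


def parse_mutation(text):
    """Split a mutation string into (ref, pos, alt); ref is None if omitted."""
    match = MUTATION_RE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised mutation: {text!r}")
    ref, pos, alt = match.groups()
    return (ref or None, int(pos), alt)


def side_of_breakpoint(pos, breakpoint=BREAKPOINT):
    """Say whether a position is 5' of, 3' of, or inside the interval.

    Assumption: the interval endpoints are the flanking sites, so a position
    equal to an endpoint counts as lying outside the undetermined region.
    """
    start, end = breakpoint
    if pos <= start:
        return "5' of breakpoint"
    if pos >= end:
        return "3' of breakpoint"
    return "inside breakpoint interval"


def classify(mutations):
    """Map each mutation string to its side of the breakpoint."""
    return {m: side_of_breakpoint(parse_mutation(m)[1]) for m in mutations}


if __name__ == "__main__":
    for mutation, side in classify(QUERY).items():
        print(f"{mutation}: {side}")
```

With the query above, only 29510A lies 3' of the interval; the other four mutations are 5' of it.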

Evidence

Usher tree: image https://nextstrain.org/fetch/genome-euro.ucsc.edu/trash/ct/subtreeAuspice1_genome_euro_6f6c_a81150.json?c=gt-nuc_25844&label=nuc%20mutations:C29510A

Genomes:

EPI_ISL_10737475, EPI_ISL_10737517, EPI_ISL_10737527, EPI_ISL_10990917, EPI_ISL_10990918, EPI_ISL_10990940, EPI_ISL_10990949, EPI_ISL_10990960, EPI_ISL_10990968, EPI_ISL_10990978, EPI_ISL_11218531, EPI_ISL_11218599, EPI_ISL_11218610

AngieHinrichs commented 2 years ago

@c19850727 what do you think of all the Scotland sequences on the parent node in the UCSC/UShER tree (e.g. Scotland/QEUH-39E4A5D/2022, Scotland/QEUH-3867D5A/2022), just before the private mutations on G14653T and C25844T? I think those would have the same breakpoint region, just not the private mutations.

c19850727 commented 2 years ago

@AngieHinrichs Yes, exactly. I've actually discussed this with @thomasppeacock and we both believe those sequences from Scotland are likely legit recombinants with a similar breakpoint region to #482. But as you said, there are no private mutations to distinguish them from others in the tree.

I am keeping an eye on them to see if there's any growth.

c19850727 commented 2 years ago

@AngieHinrichs It seems that Usher is putting these sequences in a completely different tree than a few days ago?

image https://nextstrain.org/fetch/genome-euro.ucsc.edu/trash/ct/subtreeAuspice1_genome_euro_65af_2c0050.json?c=userOrOld&label=nuc%20mutations:C27382G,T27383A,C27384T

AngieHinrichs commented 2 years ago

@c19850727 Yes, the tree update process includes an optimization step (after adding new sequences and doing some branch-specific masking) and that can cause branches to hop around the tree a bit arbitrarily. In addition to that, last night I pruned a bunch of sequences from the 2022-03-26 tree that were causing some awkward splits in BA.1 and BA.2 [BA.1.17 is especially prone to being split into BA.1 > G5924A and BA.1 > C23664T > G5924A during optimization; BA.2 has a problem with C10198T > C2790T vs. C2790T > C22792T > C10198T]. The 2022-03-28 tree should be ready later today and initially won't have had the same pruning, but I plan to apply it again. Then the pruned sequences will be added back in the update that will begin later today (2022-03-29) and that will probably take a couple days to complete.

The sequences that I pruned last night unfortunately included the sequences from this issue and #478 as well as their immediate neighbors. So if you upload those sequences now, their previous perch has been temporarily pruned from the tree and you get a different placement.

Please pardon the dust -- constantly under construction. :)

P.S. The placement of recombinant lineages in a tree should be interpreted with extra caution because recombination violates the assumptions on which a phylogenetic tree is built (i.e. an unbroken sequence that accumulates different mutations over time on different lineages). We can expect recombinant lineages that have similar ancestral lineages and breakpoints to be placed near each other in the tree, but despite being placed close to each other, they probably don't share very recent ancestors like we would normally assume for sequences placed closely on the tree.

c19850727 commented 2 years ago

Thanks @AngieHinrichs !

And also, regarding those neighboring sequences from Scotland, it seems they're growing, and starting to have their own private mutations. image https://nextstrain.org/fetch/genome-euro.ucsc.edu/trash/ct/subtreeAuspice1_genome_euro_66e4_41c1f0.json?c=gt-nuc_1250,14653,28374&label=nuc%20mutations:T26858C

c19850727 commented 2 years ago

21 sequences now, and the latest one was sampled in Madrid.

chrisruis commented 2 years ago

Thanks @c19850727 It looks like there are now 27 sequences: 26 from the Balearic Islands and 1 from Madrid. The circulation in the Balearic Islands is interesting, and while I don't think we can look at this region specifically on covSpectrum, I've downloaded all Balearic Islands sequences from GISAID and it looks like this recombinant is ~4.5% of sequences from 20th February to 10th April. It's quite borderline for designation, so I'll add a monitor label for now. Let's check back soon and see if there are more genomes.
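For anyone wanting to repeat that kind of regional estimate, a minimal Python sketch of the calculation follows. It is not the script used above. The record layout, the function name, and the example records are all hypothetical, the year 2022 for the date window is inferred from the thread, and matching on G14653T plus C25844T (the private mutations mentioned earlier in this issue) is an assumption about how the recombinant would be identified.

```python
from datetime import date

# Assumption: a sequence counts as this recombinant if it carries both
# private mutations mentioned earlier in the thread.
DEFINING = {"G14653T", "C25844T"}


def recombinant_share(records, start, end, defining=DEFINING):
    """Return (hits, total, share) for records collected in [start, end].

    Each record is a dict with a 'date' (datetime.date) and a 'mutations'
    set of nucleotide mutation strings. share is None if the window is empty.
    """
    in_window = [r for r in records if start <= r["date"] <= end]
    if not in_window:
        return 0, 0, None
    hits = sum(1 for r in in_window if defining <= r["mutations"])
    return hits, len(in_window), hits / len(in_window)


if __name__ == "__main__":
    # Hypothetical records for illustration only; not real GISAID data.
    records = [
        {"date": date(2022, 2, 25), "mutations": {"G14653T", "C25844T", "C4321T"}},
        {"date": date(2022, 3, 1), "mutations": {"C4321T"}},
        {"date": date(2022, 3, 11), "mutations": {"A9424G"}},
        {"date": date(2022, 1, 5), "mutations": {"G14653T", "C25844T"}},
    ]
    hits, total, share = recombinant_share(
        records, date(2022, 2, 20), date(2022, 4, 10)
    )
    print(f"{hits} of {total} sequences in window ({share:.1%})")
```

The same function can be pointed at a real metadata download once the records are parsed into that shape.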

chrisruis commented 2 years ago

Thanks again @c19850727 It doesn't look like there have been any new sequences collected since 8th April 2022, so I will close this for now.