cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

CP.8 sublineage with Spike:P812R and NSP3:P141S (Japan, Australia, Indonesia, Cambodia - 30 seq as of 2023-07-02) #1627

Closed thomasppeacock closed 1 year ago

thomasppeacock commented 1 year ago

Description

Sub-lineage of: BA.5.2.6
Earliest sequence: 2022-09-22
Most recent sequence: 2023-01-13
Countries circulating: Japan (11 seq), Australia (15 seq), Indonesia (1 seq), Cambodia (1 seq)

Proposing this due to the Spike mutation P812R - this is quite an unusual mutation that occasionally crops up in cell culture and (theoretically) introduces or optimises a polybasic cleavage site at the S2' site of SARS2 adjacent to the fusion peptide (in a manner akin to the furin cleavage site in the cell culture/egg-adapted avian coronavirus IBV strain Beudette - see figure below adapted from Bestle et al, LSA, 2020). As far as I'm aware this is the first time we've seen this mutation transmit widely in a SARS2 lineage.

image

Genomes: EPI_ISL_15893462 EPI_ISL_15349264 EPI_ISL_15894737 EPI_ISL_16581213 EPI_ISL_16736719 EPI_ISL_16194015 EPI_ISL_16692401 EPI_ISL_16717918 EPI_ISL_16720780 EPI_ISL_16807277 EPI_ISL_16809088 EPI_ISL_16842465 EPI_ISL_16253142 EPI_ISL_15693657 EPI_ISL_15894129 EPI_ISL_15923561 EPI_ISL_15923658 EPI_ISL_15967866 EPI_ISL_15472569 EPI_ISL_15578224 EPI_ISL_15673207 EPI_ISL_15673322 EPI_ISL_15938542 EPI_ISL_16004125 EPI_ISL_16077781 EPI_ISL_16077846 EPI_ISL_16191157 EPI_ISL_16447078

Phylogenetic tree: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_43444_280bd0.json?f_userOrOld=uploaded%20sample

image

Full credit to @FedeGueli and @ryhisner for spotting and monitoring this.

FedeGueli commented 1 year ago

Thank you @thomasppeacock great explanation!

corneliusroemer commented 1 year ago

Seems to have been seen in AY.4 before, in Denmark/UK/Ireland

image

Died a natural death pre-Omicron in Denmark:

image

https://cov-spectrum.org/explore/Denmark/AllSamples/AllTimes/variants?nextcladePangoLineage=AY.4*&aaMutations1=S%3AP812R&nextcladePangoLineage1=AY.4&analysisMode=CompareToBaseline&

oobb45729 commented 1 year ago

S:P812R requires a C-to-G mutation, which is the rarest mutation type, so it is not expected to happen very often. However, compared to other C-to-G mutations, it does happen somewhat often. https://jbloomlab.github.io/SARS2-mut-fitness/S.html 812R

ryhisner commented 1 year ago

@corneliusroemer There was also a very interesting Delta sequence from Russia back in June 2022 with S:P812R. That one transmitted at least once, as there was a very closely related sequence from a different patient. Only one of the two, however, had P812R, though it's possible both had it and it only showed up in one due to coverage issues. I've whited out the metadata showing these are different patients, but the private mutations of each are listed below. EPI_ISL_15137908, EPI_ISL_15137948

image

P812R was also in a B.1 sequence that popped up in South Carolina, USA, in February 2022, and which had a fascinating S2. EPI_ISL_12830215

image

Oddly, there was a synonymous mutation in the 3rd nucleotide of S:812 in the B.1. No idea how that could have had any function as it doesn't come anywhere near forming a TRS pattern and had to be synonymous whether it came before or after C23997G. Any ideas, @thomasppeacock?

image
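For anyone who wants to check the "had to be synonymous" point: a minimal sketch, assuming the standard genetic code (NCBI translation table 1) and that S:812 is a CCN codon before C23997G and a CGN codon after it. The helper name `third_position_outcomes` is made up for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), written in the usual
# compact form: codons enumerated in T, C, A, G order, third base fastest.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def third_position_outcomes(first_two):
    """Return the set of amino acids reachable by varying only the third base."""
    return {CODON_TABLE[first_two + base] for base in BASES}


# Proline (CCN) before C23997G, arginine (CGN) after it.
pro_outcomes = third_position_outcomes("CC")
arg_outcomes = third_position_outcomes("CG")
print(pro_outcomes, arg_outcomes)
```

Both codon families are fourfold degenerate, so a change at the third nucleotide leaves the amino acid unchanged regardless of the order in which the two mutations arose.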

At least two other extremely interesting, almost certainly chronic-infection-derived Delta sequences also had P812R, but I won't include the details here as this post is already too long.

In other news, S:T859X and S:D936X continue to crop up all over the place, particularly in highly mutated sequences, but I've yet to hear any ideas on what such mutations might be doing. The S:852-859 region seems to be a mutational hotspot, but I have no idea why.

ryhisner commented 1 year ago

One final note on S:P812R that is probably not relevant but which I want to throw out there just in case: C23997G creates a TRS-like motif (AAACGAAG) that is only one nucleotide off from the ideal AAACGAAC. Furthermore, two of the three preceding/upstream nucleotides (TCA) match the ideal TRS extended homology of TCT. Again, not sure if this is relevant, but there are other convincing examples of TRS motifs occurring in nearby regions of spike; the 2-nuc mutation A852K is one example I've written about before on here.
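The motif comparison is easy to reproduce. A rough sketch, using only the four sequences quoted in this comment; the function and variable names are hypothetical, and this is a plain position-by-position comparison, not any kind of TRS prediction tool.

```python
def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Sequences as quoted in the comment above.
observed_core = "AAACGAAG"   # motif created by C23997G
ideal_core = "AAACGAAC"      # ideal TRS motif
observed_upstream = "TCA"    # three nucleotides upstream of the motif
ideal_upstream = "TCT"       # ideal extended homology

core_mismatches = mismatches(observed_core, ideal_core)
upstream_matches = len(ideal_upstream) - mismatches(observed_upstream, ideal_upstream)

print(f"core mismatches: {core_mismatches}")
print(f"upstream matches: {upstream_matches} of {len(ideal_upstream)}")
```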

image

corneliusroemer commented 1 year ago

Thanks Ryan! I think you confounded 859 and 821. There definitely is something going on. I have 853 and 883 in my head as well as common substitutions around there. Oh and 848 for some reason.

oobb45729 commented 1 year ago

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9642027/

This paper claims that T859N would interact with Q613 and G593, making up for the loss of the D614-K854 salt bridge after D614G. If that's the case, the interaction needs a residue with a side chain that is long enough and hydrophilic, so T859I or T859A probably won't have this type of effect.

ryhisner commented 1 year ago

@oobb45729, is there any chance that S:A852K is close enough to form a salt bridge with D614G? What about N856K, N856S, Q853K, or Q853R? Those last three pop up pretty regularly. There have also been five sequences with both the 2-nuc A852K and K854N, which is not something you'd expect to see given the relative rarity of both. I think that combo is very likely TRS related (K854N creating downstream extended homology), but maybe it has more than one purpose.

oobb45729 commented 1 year ago

I don't know about Q853, but N856 is actually near A570 and T572. Maybe N856X is more related to 570-574, another hotspot.

oobb45729 commented 1 year ago

Some further thoughts about those S2 mutations: this article is a good source on the conformational changes of the spike protein. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9221923/

However, there are still some missing links there, especially how the transition to the fusion intermediate state happens after the S2' cleavage, and a more detailed structure of the FPPR in the fusion intermediate state. So I tried AlphaFold to predict the structure. The result surprised me. AlphaFold predicted the following structure with high confidence.

FP

The HR1 triple helix extends until the WTF part at 886-888. Then the FPPR folds back, wrapping around the HR1 triple helix. The FP is more flexible. It's just a prediction from AlphaFold, so take it with a grain of salt, but could this structure be one of the intermediate states? If so, the structural change of the FPPR here would be important, and K854 and T859 might play major roles.

As for D936X, it is not far from the S2' cleavage site. My wild guess is that it may assist the S2' cleavage or assist the conformational change of the S2 unit after the S2' cleavage. Maybe after S2' cleavage F817 may form π–π interactions with D936H/Y, L938F, S939F or S940F? @ryhisner

ryhisner commented 1 year ago

@oobb45729, thanks so much for that article! I'm always forgetting to check my notifications, so I only just now saw this, but this is exactly the sort of paper I've been looking for to learn more about the structure and conformational changes spike undergoes.

FedeGueli commented 1 year ago

33 as of today. It keeps circulating at low levels in Japan.

corneliusroemer commented 1 year ago

I added the parent lineage as CP.8 - the parent has quite a long branch (4 mutations) and quite a lot of Indonesian sequences so that seemed worthwhile on its own.

InfrPopGen commented 1 year ago

Thanks for submitting. We've added lineage CP.8.1 with 0 newly designated sequences, 17 updated from CP.8, and 15 updated from BA.5.2.6. Defining mutations: C3140T (ORF1a:P959S), C3619T, C23997G (S:P812R) (following G6662A (ORF1a:V2133I), C18568A (ORF1b:L1701I)). Designated as a curiosity.
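As a cross-check on the nucleotide and amino acid labels used in this thread, here is a small sketch that maps a genome position to a codon number. The CDS start coordinates are assumptions based on the Wuhan-Hu-1 reference (NC_045512.2) and the common convention of numbering ORF1b from 13468; the nsp3 start at ORF1a residue 819 is likewise an assumption, and the function names are made up.

```python
# Assumed 1-based CDS start coordinates on Wuhan-Hu-1 (NC_045512.2).
# ORF1b is numbered from 13468 by convention (an assumption here).
CDS_START = {"ORF1a": 266, "ORF1b": 13468, "S": 21563}

# Assumed first ORF1a residue of the nsp3 cleavage product.
NSP3_START_IN_ORF1A = 819


def codon_position(gene, nt_pos):
    """Return (codon number, position within codon), both 1-based."""
    offset = nt_pos - CDS_START[gene]
    if offset < 0:
        raise ValueError(f"{nt_pos} lies upstream of {gene}")
    return offset // 3 + 1, offset % 3 + 1


def orf1a_to_nsp3(aa_pos):
    """Convert an ORF1a residue number to nsp3 numbering."""
    return aa_pos - NSP3_START_IN_ORF1A + 1


for gene, pos in [("S", 23997), ("ORF1a", 3140), ("ORF1a", 6662), ("ORF1b", 18568)]:
    codon, within = codon_position(gene, pos)
    print(f"{pos} -> {gene}:{codon} (codon position {within})")

print("ORF1a:959 in nsp3 numbering:", orf1a_to_nsp3(959))
```

Under those assumptions, C23997G falls on the second base of S:812, consistent with a CCN to CGN (P to R) change, and ORF1a:P959S corresponds to the NSP3:P141S in the issue title.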