cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

GW.5.1.1 with Orf6a:61 restored/ Orf7a/ Orf7b/ Orf8 deleted by Ryan Hisner #2346

Closed FedeGueli closed 9 months ago

FedeGueli commented 11 months ago

I created a separate issue from #2277 for @ryhisner's superlative comment and analysis, which is worth keeping as an issue until its findings are confirmed. @corneliusroemer please don't close this one.
The observation that something odd was happening was first raised by @ryhisner himself here and by @nkrmnzr, who noticed a recurrent, odd reversion of the Orf6 stop codon. @oobb45729, who first spotted GW.5.1.1, then did a first analysis.

Read: @ryhisner speaking: "I found a strange deletion in a couple of sequences this morning, and when I looked for similar sequences, I was led to GW.5.1.1. I think it's possible that ORF7a, ORF7b, and ORF8 are all deleted in the S:F79S branch, something like ∆27395-28246. Furthermore, if this really is a deletion, as I suspect, it also creates an extra TRS for ORF9b/N. I consider it to be the 3rd TRS for ORF9b/N due to what I view as two overlapping TRS's already present. The sequence formed from the end of ORF6 to the start of N would be as pictured below. The BLAST and Nextclade alignments agree (though there are a number of sloppy sequences that seem to butcher everything).

image

Having a 3rd TRS for ORF9b/N would not be entirely unprecedented, as Gamma had an AACA insertion in the ORF9b/N TRS that created three overlapping TRS motifs.

image


I searched for all sequences with G25839T, G25906T, and C12473T and found 129. Almost all were total blanks in ORF7a, ORF7b, and ORF8 in Nextclade, and when I uploaded the Nextclade alignment fasta to AliView, the NNN's line up very closely in most sequences, including sequences from numerous different countries, as seen below. (Almost all the NNN's are left out of the picture because there are way too many to show.)

image
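
The check described above (do the NNN runs line up across sequences?) can be sketched in a few lines of Python. This is only an illustration, not the workflow actually used: it assumes each sequence is already in reference coordinates (same length as the reference, as in an alignment fasta with insertions stripped), and the function names `n_runs` and `n_fraction` are made up for this sketch. The default window is the approximate span 27395-28246 mentioned in the comment.

```python
import re

# Approximate span discussed in the comment (1-based, inclusive reference coordinates).
REGION_START, REGION_END = 27395, 28246


def n_runs(aligned_seq, start=REGION_START, end=REGION_END):
    """Return (first, last) 1-based reference positions of every run of N inside [start, end].

    Assumes aligned_seq is in reference coordinates (index 0 == reference position 1).
    """
    window = aligned_seq[start - 1:end].upper()
    return [(m.start() + start, m.end() + start - 1) for m in re.finditer(r"N+", window)]


def n_fraction(aligned_seq, start=REGION_START, end=REGION_END):
    """Fraction of positions inside [start, end] that are N."""
    window = aligned_seq[start - 1:end].upper()
    return window.count("N") / len(window) if window else 0.0


# Toy example on a short made-up sequence (not real data): N's at positions 15-24.
toy = "A" * 14 + "N" * 10 + "C" * 76
toy_runs = n_runs(toy, start=11, end=30)
toy_frac = n_fraction(toy, start=11, end=30)
print(toy_runs, toy_frac)
```

If the run boundaries returned for many sequences from different labs cluster around the same coordinates, that is consistent with a shared deletion being masked, rather than random dropout.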

The BLAST alignments are basically identical to the Nextclade ones. Below is an example. Query is a GW.5.1.1 sequence and Sbjct is the Wuhan reference genome.

image

Maybe there's something entirely different going on here, but a deletion spanning approximately 27395-28246, where nearly every GW.5.1.1 has NNN's, is the simplest explanation I can think of. A consistent theme of SARS-CoV-2 evolution has been mutations that increase transcription of N/ORF9b, so the additional TRS for N/ORF9b fits that pattern. ORF8 has of course been almost a non-factor since the rise of BA.5, which had almost no ORF8 expression. The vast majority of XBB of course have ORF8:G8*, while XBC.1 almost certainly has virtually no ORF8 expression due to a TRS-ablating mutation and ORF8:K2T, which can interfere with transcription.
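
To make the proposed span concrete, here is a rough Python sketch that computes how much of each ORF a deletion of 27395-28246 would remove. The ORF coordinates are an assumption on my part, taken from the Wuhan-Hu-1 reference annotation (NC_045512.2); they are not given in the comment, and the helper name `overlap` is hypothetical.

```python
# Assumed ORF coordinates (1-based, inclusive) from the Wuhan-Hu-1 annotation (NC_045512.2).
ORFS = {
    "ORF6": (27202, 27387),
    "ORF7a": (27394, 27759),
    "ORF7b": (27756, 27887),
    "ORF8": (27894, 28259),
    "N": (28274, 29533),
}


def overlap(deletion, orfs=ORFS):
    """Map each ORF name to (deleted_nt, orf_length_nt) for a (start, end) deletion."""
    d_start, d_end = deletion
    result = {}
    for name, (o_start, o_end) in orfs.items():
        deleted = max(0, min(d_end, o_end) - max(d_start, o_start) + 1)
        result[name] = (deleted, o_end - o_start + 1)
    return result


proposed = (27395, 28246)  # approximate span from the comment
deletion_length = proposed[1] - proposed[0] + 1
hit = overlap(proposed)
for name, (deleted, total) in hit.items():
    print(f"{name}: {deleted}/{total} nt deleted")
print("deletion length:", deletion_length)
```

Under those assumed coordinates, the span leaves ORF6 and N untouched, removes ORF7b entirely, and removes all but a handful of bases of ORF7a and ORF8, which matches the description of all three being "deleted".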

Furthermore, large deletions, stop codons, and frameshifts leading to stop codons in both ORF7a and ORF7b have been relatively common in the Omicron era. My hypothesis is that ORF6 and ORF7a/ORF7b have redundant functions, so that as long as one is fully functioning, the other is disposable. Krogan Lab has done great work showing that ORF6:D61L severely reduces the ability of ORF6 to combat the interferon response, which it does by blocking the imports to and exports from the cell nucleus. https://www.biorxiv.org/content/10.1101/2022.10.18.512708v2

image


As long as ORF7a/b is fully functioning, ORF6:D61L seems tolerable. There are three mutations that more or less destroy the ORF7a TRS: C27389T, G27390T, and C27393T. These mutations have recurred throughout the pandemic, but they have been far less common in periods during which ORF6:D61L was predominant. Graphs below, which I've overlapped by making one semi-transparent, are from CovSpectrum. The scales of the y-axes are very different but the trends are apparent.

image
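
As a sanity check on the three ORF7a TRS mutations named above, the sketch below applies each one to the TRS core motif. It assumes the core is the canonical ACGAAC hexamer and that it sits at reference positions 27388-27393, immediately upstream of the ORF7a start codon; both are assumptions here, not something stated in the comment. The function name `apply_substitution` is made up for the example, and it raises an error if the reference base in the mutation name does not match the motif, so the assumption is at least self-checking.

```python
TRS_CORE = "ACGAAC"   # canonical SARS-CoV-2 TRS core motif (assumed for ORF7a)
CORE_START = 27388    # assumed 1-based reference position of the first base of the core


def apply_substitution(motif, motif_start, mutation):
    """Apply a substitution such as 'C27389T' to motif and return the mutated motif."""
    ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
    i = pos - motif_start
    if not 0 <= i < len(motif):
        raise ValueError(f"{mutation} falls outside the motif")
    if motif[i] != ref:
        raise ValueError(f"{mutation}: expected {ref} but the motif has {motif[i]}")
    return motif[:i] + alt + motif[i + 1:]


mutated = {m: apply_substitution(TRS_CORE, CORE_START, m)
           for m in ("C27389T", "G27390T", "C27393T")}
for name, motif in mutated.items():
    print(name, TRS_CORE, "->", motif)
```

All three substitutions land inside the hexamer and change one of its bases, which is consistent with the description of them as more or less destroying the ORF7a TRS.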

It's more difficult to search for large deletions, but below was my attempt to compare the prevalence of large ORF7a deletions with ORF6:D61L. It doesn't catch frameshifting deletions smaller than 20 nt.

image

For reasons that are totally unclear to me, ORF7a deletions and ORF7a-TRS-destroying mutations have been FAR more common in South Africa than elsewhere, something like 20 times more common. In the graph below, I combined the ORF7a-TRS destroyers with the large ORF7a deletions but for South Africa instead of globally. Note that unlike in the previous, global graph, the y-axes are aligned.

image

Looking at the BLAST and Nextclade alignments, there seems to be no sign of ORF6:D61L. This would restore ORF6 to its previous level of potent innate immune evasion, which would, in turn, make both ORF7a and ORF7b disposable. So all in all, if this huge string of NNN's does turn out to be an enormous deletion, as I suspect, it makes sense to me. The renewal of ORF6 through eliminating ORF6:D61L makes ORF7a redundant. ORF8 was already non-functional and just taking up space. And the additional TRS for ORF9b and N, both known potent innate immune antagonists, would both satisfy the virus's unquenchable thirst for more N/ORF9b transcription and possibly make ORF7a/ORF7b even more disposable to boot.

Originally posted by @ryhisner in https://github.com/cov-lineages/pango-designation/issues/2277#issuecomment-1776336071 "

FedeGueli commented 11 months ago

To add some info:

GW.5.1.1 is GW.5 (XBB.1.19.1, S:E554K, FLip, S:T478I) plus S:F79S, S:A475V, ORF3a:G172C

Tree: https://nextstrain.org/fetch/genome-test.gi.ucsc.edu/trash/ct/subtreeAuspice1_genome_test_48418_641d10.json?label=id:node_3205216

image

GISAID query: C450T, G25906T, G25839T

Samples ID: EPI_ISL_18112129, EPI_ISL_18122626, EPI_ISL_18168454, EPI_ISL_18222608, EPI_ISL_18227386, EPI_ISL_18242883, EPI_ISL_18242891-18242892, EPI_ISL_18246697, EPI_ISL_18248604, EPI_ISL_18263046, EPI_ISL_18264046, EPI_ISL_18273875, EPI_ISL_18281346, EPI_ISL_18281717, EPI_ISL_18294629, EPI_ISL_18294819, EPI_ISL_18294960, EPI_ISL_18300658, EPI_ISL_18301592, EPI_ISL_18301620, EPI_ISL_18301788, EPI_ISL_18302198, EPI_ISL_18302232, EPI_ISL_18302319, EPI_ISL_18302877, EPI_ISL_18313568, EPI_ISL_18324805, EPI_ISL_18324888, EPI_ISL_18326039, EPI_ISL_18326128, EPI_ISL_18332061, EPI_ISL_18332066, EPI_ISL_18332260, EPI_ISL_18332275, EPI_ISL_18344091, EPI_ISL_18350060, EPI_ISL_18352438, EPI_ISL_18355907, EPI_ISL_18356037, EPI_ISL_18358138, EPI_ISL_18361384, EPI_ISL_18362432, EPI_ISL_18363147, EPI_ISL_18366455, EPI_ISL_18366526, EPI_ISL_18366533, EPI_ISL_18366690, EPI_ISL_18366697, EPI_ISL_18366704, EPI_ISL_18367129, EPI_ISL_18367595, EPI_ISL_18373277, EPI_ISL_18373653, EPI_ISL_18373666, EPI_ISL_18375926, EPI_ISL_18378262, EPI_ISL_18378332, EPI_ISL_18378336, EPI_ISL_18392308-18392309, EPI_ISL_18392346, EPI_ISL_18392350, EPI_ISL_18392463, EPI_ISL_18392603, EPI_ISL_18392678, EPI_ISL_18392697, EPI_ISL_18392719, EPI_ISL_18392926, EPI_ISL_18393039, EPI_ISL_18393046, EPI_ISL_18398193, EPI_ISL_18398277, EPI_ISL_18398320, EPI_ISL_18398377, EPI_ISL_18398485, EPI_ISL_18398543, EPI_ISL_18398587, EPI_ISL_18398675, EPI_ISL_18398677, EPI_ISL_18398682-18398683, EPI_ISL_18398693, EPI_ISL_18403014, EPI_ISL_18405066, EPI_ISL_18406839, EPI_ISL_18407895, EPI_ISL_18411017, EPI_ISL_18411496, EPI_ISL_18411630, EPI_ISL_18411635, EPI_ISL_18411776, EPI_ISL_18411817, EPI_ISL_18412037, EPI_ISL_18414586, EPI_ISL_18415964, EPI_ISL_18420231, EPI_ISL_18420426, EPI_ISL_18420473, EPI_ISL_18420488, EPI_ISL_18420622, EPI_ISL_18420635, EPI_ISL_18420669, EPI_ISL_18420710, EPI_ISL_18420724, EPI_ISL_18420757, EPI_ISL_18420921, EPI_ISL_18420932, EPI_ISL_18420958, EPI_ISL_18420980-18420981, EPI_ISL_18421083, 
EPI_ISL_18421300, EPI_ISL_18421414, EPI_ISL_18421417, EPI_ISL_18421453, EPI_ISL_18421479, EPI_ISL_18421582, EPI_ISL_18421584, EPI_ISL_18421606, EPI_ISL_18421609, EPI_ISL_18421612, EPI_ISL_18421627
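
For anyone who wants to work with the list above programmatically, here is a small Python sketch that expands compact entries such as `EPI_ISL_18242891-18242892` into individual accession IDs. It assumes a hyphenated entry means an inclusive range of consecutive accession numbers; `expand_epi_ids` is a hypothetical helper, not part of any GISAID tooling.

```python
import re


def expand_epi_ids(text):
    """Expand a comma/whitespace separated list of EPI_ISL IDs, including 'a-b' ranges."""
    ids = []
    for token in re.split(r"[,\s]+", text.strip()):
        m = re.fullmatch(r"EPI_ISL_(\d+)(?:-(\d+))?", token)
        if not m:
            continue  # skip empty tokens and anything that is not an accession
        first = int(m.group(1))
        last = int(m.group(2)) if m.group(2) else first
        ids.extend(f"EPI_ISL_{n}" for n in range(first, last + 1))
    return ids


sample = "EPI_ISL_18112129, EPI_ISL_18242891-18242892, EPI_ISL_18246697, "
expanded = expand_epi_ids(sample)
print(len(expanded), expanded)
```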

FedeGueli commented 11 months ago

A deletion, frameshift, or complete change in the accessory proteins Orf7a, Orf7b, and Orf8 was already present, to a lesser extent, in an XBC.1.3 sublineage that has been circulating at 5-10% prevalence in Oceania (without spike mutations) over the last few months: https://github.com/sars-cov-2-variants/lineage-proposals/issues/542

ryhisner commented 11 months ago

I will try to add some more details this weekend, but I think there are several other branches that have large deletions in the ORF7a-ORF7b-ORF8 range. The exact details of where the deletions begin and end are, as Tom Peacock has told me, probably hidden. The NNN's in this branch of FW.1.1, however, look identical to those in GW.5.1.1, so it is likely a very similar large deletion.

GISAID query: C1909T, C7119T, C9442T

https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons/main/FW.1.1_C1909T_C7119T_C9442T__Big_Deletion.json?c=gt-nuc_9442&gmax=10442&gmin=8442&label=id:node_3554694

image

EPI_ISL_18325888, EPI_ISL_18332351, EPI_ISL_18355736, EPI_ISL_18355769, EPI_ISL_18356274, EPI_ISL_18386618, EPI_ISL_18398258, EPI_ISL_18398540, EPI_ISL_18421307, EPI_ISL_18432824

ryhisner commented 11 months ago

There is a branch of EG.5.1 with about 190 sequences that I believe has a large deletion spanning ORF7b-ORF8. It has C6543T and C28909T. The best sequences—the only ones that actually register the deletion (instead of having NNN's)—appear to be from Japan, but Nextclade is unable to read them. BLAST, however, has no problem, and it registers the following deletion: ∆27832-27885, ∆27896-28257. Like the posited GW.5.1.1 deletion, this deletion, if accurate, would create a third TRS for ORF9b/N.
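
The two-part deletion reported by BLAST above can be unpacked with a short Python sketch. It parses the `∆start-end` notation, computes each segment's length, and flags whether that length is a multiple of 3. The function names are made up for this example, and divisibility by 3 is only a crude proxy: whether a reading frame is actually preserved depends on where the deletion falls relative to each gene.

```python
import re


def parse_deletions(text):
    """Extract (start, end) pairs from notation like '∆27832-27885, ∆27896-28257'."""
    return [(int(a), int(b)) for a, b in re.findall(r"[∆Δ](\d+)-(\d+)", text)]


def summarize(deletions):
    """Return one dict per deletion with its length and whether the length is divisible by 3."""
    rows = []
    for start, end in deletions:
        length = end - start + 1  # coordinates are inclusive
        rows.append({"start": start, "end": end, "length": length,
                     "multiple_of_3": length % 3 == 0})
    return rows


rows = summarize(parse_deletions("∆27832-27885, ∆27896-28257"))
total_deleted = sum(r["length"] for r in rows)
for r in rows:
    print(r)
print("total deleted:", total_deleted)
```

The first segment is 54 nt and the second 362 nt, 416 nt in total, so both are well above the 20 nt threshold mentioned earlier for what counts as a large deletion.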

GISAID Search Query: C6543T, A16878T, C2334T, C28909T

https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons/main/EG.5.1__ORF7b-ORF8_Deletion_C2334T_C6543T_A16878T_C28909T__192_seq___23_wrong_Usher_Tree.json?c=gt-nuc_6543&gmax=7543&gmin=5543&label=id:node_3461009

image
Genomes EPI_ISL_17969765, EPI_ISL_17970318, EPI_ISL_17970508, EPI_ISL_18047258, EPI_ISL_18047263, EPI_ISL_18047409, EPI_ISL_18047620, EPI_ISL_18058422, EPI_ISL_18067826, EPI_ISL_18068029, EPI_ISL_18068101, EPI_ISL_18071971, EPI_ISL_18109039, EPI_ISL_18116822, EPI_ISL_18117104, EPI_ISL_18117474, EPI_ISL_18117613, EPI_ISL_18117732, EPI_ISL_18117877, EPI_ISL_18117946, EPI_ISL_18124420, EPI_ISL_18137096, EPI_ISL_18138279, EPI_ISL_18139529, EPI_ISL_18139670, EPI_ISL_18139800, EPI_ISL_18139899, EPI_ISL_18140152, EPI_ISL_18140493, EPI_ISL_18145142, EPI_ISL_18161374, EPI_ISL_18164977, EPI_ISL_18165914, EPI_ISL_18165947, EPI_ISL_18166056, EPI_ISL_18168559, EPI_ISL_18208214, EPI_ISL_18208223, EPI_ISL_18208281, EPI_ISL_18209281, EPI_ISL_18209711, EPI_ISL_18213591, EPI_ISL_18213656, EPI_ISL_18213852, EPI_ISL_18213912, EPI_ISL_18214012, EPI_ISL_18214384, EPI_ISL_18218139, EPI_ISL_18218318, EPI_ISL_18227051, EPI_ISL_18235752, EPI_ISL_18235855, EPI_ISL_18235892, EPI_ISL_18235967, EPI_ISL_18236941, EPI_ISL_18237037, EPI_ISL_18239483, EPI_ISL_18243129, EPI_ISL_18243391, EPI_ISL_18243400, EPI_ISL_18243445, EPI_ISL_18243462, EPI_ISL_18246641, EPI_ISL_18247763, EPI_ISL_18254845, EPI_ISL_18254962, EPI_ISL_18259587, EPI_ISL_18260446, EPI_ISL_18261393, EPI_ISL_18271990, EPI_ISL_18272273, EPI_ISL_18279079, EPI_ISL_18280059, EPI_ISL_18281390, EPI_ISL_18284849, EPI_ISL_18285198, EPI_ISL_18285348, EPI_ISL_18285410, EPI_ISL_18292998, EPI_ISL_18293011, EPI_ISL_18293036, EPI_ISL_18293096, EPI_ISL_18293111, EPI_ISL_18293124, EPI_ISL_18293202, EPI_ISL_18293362, EPI_ISL_18293494, EPI_ISL_18293535, EPI_ISL_18293572, EPI_ISL_18293575, EPI_ISL_18293704, EPI_ISL_18293740, EPI_ISL_18294156, EPI_ISL_18300341, EPI_ISL_18300765, EPI_ISL_18305354, EPI_ISL_18306066, EPI_ISL_18306487, EPI_ISL_18306524, EPI_ISL_18307175, EPI_ISL_18307192, EPI_ISL_18307291, EPI_ISL_18307305, EPI_ISL_18307307, EPI_ISL_18312392, EPI_ISL_18312430, EPI_ISL_18312516, EPI_ISL_18312578, EPI_ISL_18312779, EPI_ISL_18312840, 
EPI_ISL_18312866, EPI_ISL_18313325, EPI_ISL_18313558, EPI_ISL_18315526, EPI_ISL_18324578, EPI_ISL_18327728, EPI_ISL_18327764, EPI_ISL_18330993, EPI_ISL_18330999, EPI_ISL_18331041, EPI_ISL_18332458, EPI_ISL_18342031, EPI_ISL_18346697, EPI_ISL_18346840, EPI_ISL_18346911, EPI_ISL_18347779, EPI_ISL_18350065, EPI_ISL_18353958, EPI_ISL_18354013, EPI_ISL_18354317, EPI_ISL_18354602, EPI_ISL_18354652, EPI_ISL_18354746, EPI_ISL_18355140, EPI_ISL_18355147, EPI_ISL_18355153, EPI_ISL_18355156-18355157, EPI_ISL_18359797, EPI_ISL_18359858, EPI_ISL_18361083, EPI_ISL_18361891, EPI_ISL_18362992, EPI_ISL_18370765, EPI_ISL_18370818, EPI_ISL_18370964, EPI_ISL_18371329, EPI_ISL_18372666, EPI_ISL_18376281, EPI_ISL_18376406, EPI_ISL_18376973-18376974, EPI_ISL_18377545, EPI_ISL_18381936, EPI_ISL_18382641, EPI_ISL_18382676, EPI_ISL_18384708, EPI_ISL_18385688, EPI_ISL_18386752, EPI_ISL_18387158, EPI_ISL_18390797, EPI_ISL_18392741, EPI_ISL_18402227, EPI_ISL_18402307, EPI_ISL_18402549, EPI_ISL_18402590, EPI_ISL_18403302, EPI_ISL_18408832, EPI_ISL_18409076, EPI_ISL_18410327, EPI_ISL_18411769, EPI_ISL_18413614, EPI_ISL_18415356, EPI_ISL_18415363, EPI_ISL_18417943, EPI_ISL_18418070, EPI_ISL_18418092, EPI_ISL_18419599, EPI_ISL_18426189, EPI_ISL_18428829, EPI_ISL_18428832-18428833, EPI_ISL_18430423, EPI_ISL_18432559, EPI_ISL_18432648, EPI_ISL_18432691, EPI_ISL_18432702, EPI_ISL_18432983, EPI_ISL_18438906, EPI_ISL_18438930, EPI_ISL_18439456, EPI_ISL_18439510
FedeGueli commented 9 months ago

Designated KE.3 via https://github.com/cov-lineages/pango-designation/commit/8ee41ec6ebfae567b3ab17453d5c6605cff0f95c