Closed: oobb45729 closed this 1 year ago
Query: C450T, G25906T, G25839T
It is a quintuple mutant in the RBD, with 455, 456, 475, 478, and 554 mutated. cc @corneliusroemer
Yes, I've seen this before, but it was far too small then. Let's wait for a few more genomes.
There is a sequence from Pakistan that belongs in the smaller sublineage. The coverage is horrendous, but it has C450T, T21798C, G25839T, and G25906T. Pakistan or a nearby area seems a very likely place of origin for this. EPI_ISL_18122626
Also, while I'd normally dismiss changes to ORF6:D61L as artifacts, there may be something going on here. The Pakistan sequence is a total blank, but two sequences from England, one from Italy, and one from California (USA) all show something strange happening there. They could all four be artifacts, but you don't usually see artifacts from completely different labs like that unless there's something unusual in that part of the genome.
The ORF3a:E239G/ORF1a:E1815V branch is also really interesting and worth watching. It has N:A90S/ORF9b:E86D, which is homoplasic and I think likely advantageous, and its two major branches have either S:G184S or S:D215H. (The sequence from California has terrible coverage but has an "unknown AA" at N:90/ORF9b:86, so it also has that mutation.)
And with sequences from England, Denmark, Spain, California (USA), and Michigan (USA), its geographic spread is striking. I think all these branches originated in Pakistan or a nearby region and are only now beginning to spread across the globe.
Yes, definitely worth designating once we have enough sequences for the placement to be stable
The S:A475V branch: Pakistan +3. Now UShER shows the total is 8 (Pakistan 3, UK 3, Italy 1, US 1).
GW.5 with ORF3a:W149C is also partially mentioned in https://github.com/sars-cov-2-variants/lineage-proposals/issues/640
That recurring G27382C, T27383A_reversion, C27384T_reversion, ins_27384C (ORF6:D61H) was also noticed in sars-cov-2-variants/lineage-proposals#560. No idea what happened there. Is it real, or some kind of artifact that was only just discovered?
BTW, it is causing havoc in the subtree (a downstream ORF6 frameshift, disruption of the ORF7a start codon, etc.).
I think I figured out what happened there. It's an insertion of 'A' between 27382 and 27383.
GAT->CTC->CATC
CoV-Spectrum interprets it as G27382C and ins_27384:C.
It also causes the loss of ORF6's stop codon.
T27395A(ORF7a:M1K) actually causes an AA change in the extended ORF6.
@ryhisner @NkRMnZr
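For anyone who wants to check the two readings by hand, here is a minimal Python sketch. The 15-nt reference window `GATTAAACGAACATG` for 27382-27396 is an assumption reconstructed from the mutations named in this thread (ORF6:D61 = GAT, the ORF6 stop codon, the ORF7a TRS, and the ORF7a ATG), and the helper names are hypothetical. It suggests that the single 'A' insertion on the D61L background and the CoV-Spectrum reading (G27382C plus ins_27384:C on the reference) give the same nucleotide string, and that the ORF6 frame then reads through its old stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

START = 27382            # genome coordinate of the first base in the window
REF = "GATTAAACGAACATG"  # ASSUMED reference bases 27382-27396 (see lead-in)


def translate(seq):
    """Translate complete codons from the start of seq; '*' marks a stop."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def substitute(seq, pos, base):
    """Replace the base at genome coordinate pos (window without insertions)."""
    i = pos - START
    return seq[:i] + base + seq[i + 1:]


def insert_after(seq, pos, bases):
    """Insert bases immediately after genome coordinate pos."""
    i = pos - START + 1
    return seq[:i] + bases + seq[i:]


# ORF6:D61L background: GAT -> CTC at 27382-27384.
d61l = REF
for pos, base in [(27382, "C"), (27383, "T"), (27384, "C")]:
    d61l = substitute(d61l, pos, base)

# Reading 1: one 'A' inserted between 27382 and 27383 on the D61L background.
inserted = insert_after(d61l, 27382, "A")

# Reading 2 (CoV-Spectrum style): G27382C plus an inserted C after 27384.
covspectrum = insert_after(substitute(REF, 27382, "C"), 27384, "C")

# T27395A (ORF7a:M1K) applied before the insertion, so coordinates still match.
with_m1k = insert_after(substitute(d61l, 27395, "A"), 27382, "A")

print(translate(d61l))      # L*TNM: D61L followed by the ORF6 stop codon
print(translate(inserted))  # HLNEH: D61H, no stop codon in this window
print(translate(with_m1k))  # HLNEQ: T27395A changes the fifth codon
```

In this toy window the fifth codon of the extended ORF6 frame covers 27393-27395, which is why T27395A shows up as an amino acid change there.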
Yes, I think @NkRMnZr also figured out the insertion. But would it then create a new ORF6/7a fusion protein? Just checking where the first stop codon comes. @ryhisner suggested it could be an artifact.
The next stop codon, TGA, should span ORF7a codons 15-16.
The next start codon, ATG, is that of ORF7b, but it is not in frame with the new protein. I don't have the expertise to say whether this could be functional or not. @ryhisner, could you check please?
I found a strange deletion in a couple of sequences this morning, and when I looked for similar sequences, I was led to GW.5.1.1. I think it's possible that ORF7a, ORF7b, and ORF8 are all deleted in the S:F79S branch, something like ∆27395-28246. Furthermore, if this really is a deletion, as I suspect, it also creates an extra TRS for ORF9b/N. I consider it to be the 3rd TRS for ORF9b/N due to what I view as two overlapping TRSs already present. The sequence formed from the end of ORF6 to the start of N would be as pictured below. The BLAST and Nextclade alignments agree (though there are a number of sloppy sequences that seem to butcher everything).
Having a 3rd TRS for ORF9b/N would not be entirely unprecedented, as Gamma had an AACA insertion in the ORF9b/N TRS that created three overlapping TRS motifs.
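As a rough sanity check on what ∆27395-28246 would remove, here is a small Python sketch that computes how much of each nearby gene falls inside the proposed deletion. The gene coordinates are an assumption taken from the Wuhan-Hu-1 reference annotation (NC_045512.2), not from this thread, and the deletion boundaries are only the approximate ones suggested above.

```python
# Gene coordinates (1-based, inclusive). ASSUMED from the Wuhan-Hu-1
# reference annotation (NC_045512.2); they are not stated in the thread.
GENES = {
    "ORF6": (27202, 27387),
    "ORF7a": (27394, 27759),
    "ORF7b": (27756, 27887),
    "ORF8": (27894, 28259),
    "N": (28274, 29533),
}

# Approximate deletion proposed in the thread.
DELETION = (27395, 28246)


def deleted_fraction(gene, deletion):
    """Fraction of the gene's nucleotides that lie inside the deletion."""
    g_start, g_end = gene
    d_start, d_end = deletion
    overlap = max(0, min(g_end, d_end) - max(g_start, d_start) + 1)
    return overlap / (g_end - g_start + 1)


for name, coords in GENES.items():
    print(f"{name}: {deleted_fraction(coords, DELETION):.3f}")
```

Under those assumptions ORF6 and N are untouched, ORF7b is removed completely, and ORF7a and ORF8 lose all but a few nucleotides at one end.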
I searched for all sequences with G25839T, G25906T, and C12473T and found 129. Almost all were total blanks in ORF7a, ORF7b, and ORF8 in Nextclade, and when I uploaded the Nextclade alignment FASTA to AliView, the NNNs line up very closely in most sequences, including sequences from numerous different countries, as seen below. (Almost all of the NNNs are left out of the picture because there are far too many to show.)
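The same comparison can be done without eyeballing AliView. Below is a minimal Python sketch, assuming the aligned sequences are already in reference coordinates (one character per reference position, insertions stripped). The function name, the `min_len` threshold, and the toy sequences are all hypothetical and only illustrate the idea of listing where long runs of N start and end in each sequence.

```python
import re


def n_runs(aligned_seq, min_len=100, offset=1):
    """Return (start, end) coordinates, 1-based inclusive, of runs of N.

    offset is the reference coordinate of the first character in aligned_seq.
    Runs shorter than min_len are ignored.
    """
    return [
        (m.start() + offset, m.end() + offset - 1)
        for m in re.finditer(r"N+", aligned_seq.upper())
        if m.end() - m.start() >= min_len
    ]


# Toy aligned sequences (made up for illustration, not real genomes).
toy = {
    "seq_a": "ACGTNNNNNNACGT",
    "seq_b": "ACGNNNNNNNACGT",
    "seq_c": "ACGTACGTACGTNN",
}

for name, seq in toy.items():
    print(name, n_runs(seq, min_len=5))
```

If the N runs from sequences produced by different labs share nearly the same start and end coordinates, that is the pattern described above; a real run would read the aligned FASTA instead of the toy dictionary.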
The BLAST alignments are basically identical to the Nextclade ones. Below is an example. Query is a GW.5.1.1 sequence and Sbjct is the Wuhan reference genome.
Maybe there's something entirely different going on here, but a deletion spanning approximately 27395-28246, where nearly every GW.5.1.1.1 has NNNs, is the simplest explanation I can think of. A consistent theme of SARS-CoV-2 evolution has been mutations that increase transcription of N/ORF9b, so the additional TRS for N/ORF9b fits that pattern. ORF8 has of course been almost a non-factor since the rise of BA.5, which had almost no ORF8 expression. The vast majority of XBB of course have ORF8:G8*, while XBC.1 almost certainly has virtually no ORF8 expression due to a TRS-ablating mutation and ORF8:K2T, which can interfere with transcription.
Furthermore, large deletions, stop codons, and frameshifts leading to stop codons in both ORF7a and ORF7b have been relatively common in the Omicron era. My hypothesis is that ORF6 and ORF7a/ORF7b have redundant functions, so that as long as one is fully functioning, the other is disposable. Krogan Lab has done great work showing that ORF6:D61L severely reduces the ability of ORF6 to combat the interferon response, which it does by blocking import to and export from the cell nucleus. https://www.biorxiv.org/content/10.1101/2022.10.18.512708v2
As long as ORF7a/b is fully functioning, ORF6:D61L seems tolerable. There are three mutations that more or less destroy the ORF7a TRS: C27389T, G27390T, and C27393T. These mutations have recurred throughout the pandemic, but they have been far less common in periods during which ORF6:D61L was predominant. The graphs below, which I've overlaid by making one semi-transparent, are from CoV-Spectrum. The scales of the y-axes are very different, but the trends are apparent.
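To make it concrete which TRS positions these three mutations hit, here is a tiny Python sketch. It assumes the ORF7a TRS core is the canonical ACGAAC motif sitting at 27388-27393, directly upstream of the ORF7a start codon; that placement is an assumption consistent with the reference bases named in the three mutations, and the helper name is hypothetical.

```python
TRS_START = 27388    # ASSUMED coordinate of the first base of the TRS core
TRS_CORE = "ACGAAC"  # canonical TRS core motif, ASSUMED to span 27388-27393


def mutate_trs(mutation):
    """Apply a substitution such as 'C27389T' to the TRS core motif."""
    ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
    i = pos - TRS_START
    if not 0 <= i < len(TRS_CORE) or TRS_CORE[i] != ref:
        raise ValueError(f"{mutation} does not match the assumed TRS core")
    return TRS_CORE[:i] + alt + TRS_CORE[i + 1:]


for mut in ["C27389T", "G27390T", "C27393T"]:
    print(mut, TRS_CORE, "->", mutate_trs(mut))
```

Each substitution changes one base of the six-base core (positions 2, 3, and 6 respectively), so none of the three leaves the motif intact.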
It's more difficult to search for large deletions, but below is my attempt to compare the prevalence of large ORF7a deletions with that of ORF6:D61L. It doesn't catch frameshifting deletions smaller than 20 nt.
For reasons that are totally unclear to me, ORF7a deletions and ORF7a-TRS-destroying mutations have been FAR more common in South Africa than elsewhere, something like 20 times more common. In the graph below, I combined the ORF7a-TRS destroyers with the large ORF7a deletions, but for South Africa instead of globally. Note that unlike in the previous, global graph, the y-axes are aligned.
Looking at the BLAST and Nextclade alignments, there seems to be no sign of ORF6:D61L. This would restore ORF6 to its previous level of potent innate immune evasion, which would, in turn, make both ORF7a and ORF7b disposable. So all in all, if this huge string of NNNs does turn out to be an enormous deletion, as I suspect, it makes sense to me. The renewal of ORF6 through eliminating ORF6:D61L makes ORF7a redundant. ORF8 was already non-functional and just taking up space. And the additional TRS for ORF9b and N, both known potent innate immune antagonists, would both satisfy the virus's unquenchable thirst for more N/ORF9b transcription and possibly make ORF7a/ORF7b even more disposable to boot.
Wow, Ryan! Please put together a paper on this! You did wonderful work! And don't forget that the virus already tried this with success in XBC.1.3, where ORF7a, ORF7b, and ORF8 are completely messed up. For reference, see: https://github.com/sars-cov-2-variants/lineage-proposals/issues/542
Defining mutations: GW.5, then ORF3a:W149C (G25839T) + nuc:A28768G
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_1085f_cf73f0.json?c=gt-ORF3a_149&gmax=26220&gmin=25393&label=id:node_6813488