cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

FL.4+28615G with Orf1a:D1228G (531); Orf8:S67F seems dominant in this variant (472) as of 2023-06-17 #2018

Closed: krosa1910 closed this issue 1 year ago.

krosa1910 commented 1 year ago

Sublineage of FL.4.

Mutations added to FL.4: A28615G (early) -> C23635T (early) -> A3948G (late) -> C28093T (latest). The two early silent mutations are found in about 20% of FL.4 sequences, and this variant likely evolved from that group.

GISAID query: G16741T, A28615G, C23635T, A3948G (optionally adding C28093T)

This finds 347 good (complete, excluding low coverage) sequences: EPI_ISL_17016324, EPI_ISL_17016364, EPI_ISL_17016366, EPI_ISL_17030050, EPI_ISL_17081296, EPI_ISL_17088969, EPI_ISL_17088990, EPI_ISL_17094218, EPI_ISL_17146457, EPI_ISL_17173564, EPI_ISL_17186722, EPI_ISL_17186742, EPI_ISL_17207297, EPI_ISL_17207303-17207304, EPI_ISL_17207308, EPI_ISL_17207313, EPI_ISL_17207318, EPI_ISL_17209907, EPI_ISL_17242240, EPI_ISL_17247563, EPI_ISL_17255992, EPI_ISL_17262186, EPI_ISL_17262273, EPI_ISL_17262303, EPI_ISL_17262313, EPI_ISL_17262342, EPI_ISL_17262344, EPI_ISL_17262449, EPI_ISL_17289377, EPI_ISL_17289380, EPI_ISL_17292391, EPI_ISL_17306711, EPI_ISL_17307091, EPI_ISL_17307136, EPI_ISL_17307140-17307141, EPI_ISL_17307307-17307308, EPI_ISL_17308546, EPI_ISL_17343467, EPI_ISL_17344701, EPI_ISL_17346203, EPI_ISL_17346871, EPI_ISL_17346874, EPI_ISL_17346877, EPI_ISL_17346879, EPI_ISL_17346902, EPI_ISL_17346911, EPI_ISL_17346914-17346915, EPI_ISL_17346919, EPI_ISL_17346924, EPI_ISL_17347192, EPI_ISL_17347194, EPI_ISL_17347200, EPI_ISL_17350597, EPI_ISL_17352684, EPI_ISL_17374257, EPI_ISL_17385597, EPI_ISL_17385599, EPI_ISL_17385606-17385607, EPI_ISL_17385609, EPI_ISL_17385656-17385657, EPI_ISL_17385717, EPI_ISL_17385724, EPI_ISL_17385810, EPI_ISL_17385812-17385813, EPI_ISL_17386184, EPI_ISL_17388425, EPI_ISL_17388437, EPI_ISL_17388471, EPI_ISL_17388476, EPI_ISL_17388479, EPI_ISL_17388490, EPI_ISL_17388529, EPI_ISL_17388803, EPI_ISL_17388867, EPI_ISL_17407538, EPI_ISL_17408922, EPI_ISL_17415848, EPI_ISL_17417026, EPI_ISL_17417449, EPI_ISL_17431937, EPI_ISL_17431983, EPI_ISL_17432089, EPI_ISL_17432100, EPI_ISL_17432184, EPI_ISL_17432231, EPI_ISL_17432306, EPI_ISL_17442728, EPI_ISL_17442814, EPI_ISL_17472546, 
EPI_ISL_17473718, EPI_ISL_17474425, EPI_ISL_17474476, EPI_ISL_17481624, EPI_ISL_17484980, EPI_ISL_17490888-17490889, EPI_ISL_17490895, EPI_ISL_17492820, EPI_ISL_17495577, EPI_ISL_17496032, EPI_ISL_17496035, EPI_ISL_17496048, EPI_ISL_17496052, EPI_ISL_17496061-17496062, EPI_ISL_17496070, EPI_ISL_17496075, EPI_ISL_17496078, EPI_ISL_17496084-17496085, EPI_ISL_17496089-17496090, EPI_ISL_17496323-17496324, EPI_ISL_17496394, EPI_ISL_17497596, EPI_ISL_17511592, EPI_ISL_17511925, EPI_ISL_17518688, EPI_ISL_17519358, EPI_ISL_17522957, EPI_ISL_17524639, EPI_ISL_17527363, EPI_ISL_17527431, EPI_ISL_17527920, EPI_ISL_17537044-17537045, EPI_ISL_17538424, EPI_ISL_17538443, EPI_ISL_17539037, EPI_ISL_17539228, EPI_ISL_17539322-17539324, EPI_ISL_17539344, EPI_ISL_17539513, EPI_ISL_17539575, EPI_ISL_17539600, EPI_ISL_17539603, EPI_ISL_17539638, EPI_ISL_17539695, EPI_ISL_17539799, EPI_ISL_17539851, EPI_ISL_17539854, EPI_ISL_17539889, EPI_ISL_17539919, EPI_ISL_17539950, EPI_ISL_17539956, EPI_ISL_17540277, EPI_ISL_17547294, EPI_ISL_17547330, EPI_ISL_17548179, EPI_ISL_17553764, EPI_ISL_17554682, EPI_ISL_17554746, EPI_ISL_17558058, EPI_ISL_17564178, EPI_ISL_17564774, EPI_ISL_17565000, EPI_ISL_17584874, EPI_ISL_17584879, EPI_ISL_17595995, EPI_ISL_17596091, EPI_ISL_17596116, EPI_ISL_17596122, EPI_ISL_17596153, EPI_ISL_17596167, EPI_ISL_17596169, EPI_ISL_17596209, EPI_ISL_17596325, EPI_ISL_17596360, EPI_ISL_17596408, EPI_ISL_17596415, EPI_ISL_17596425, EPI_ISL_17596720, EPI_ISL_17597809, EPI_ISL_17601274, EPI_ISL_17602916, EPI_ISL_17602929, EPI_ISL_17603109, EPI_ISL_17604159, EPI_ISL_17604253, EPI_ISL_17604371, EPI_ISL_17604514, EPI_ISL_17604522, EPI_ISL_17604878, EPI_ISL_17605013, EPI_ISL_17605040, EPI_ISL_17605179, EPI_ISL_17605231, EPI_ISL_17605247-17605250, EPI_ISL_17606015, EPI_ISL_17606038, EPI_ISL_17606094, EPI_ISL_17606721, EPI_ISL_17606732, EPI_ISL_17613079, EPI_ISL_17615370, EPI_ISL_17618336, EPI_ISL_17622054, EPI_ISL_17622223, EPI_ISL_17622235, EPI_ISL_17622258, 
EPI_ISL_17624057-17624058, EPI_ISL_17624213, EPI_ISL_17624215, EPI_ISL_17624249, EPI_ISL_17624268, EPI_ISL_17624271, EPI_ISL_17624276, EPI_ISL_17624286, EPI_ISL_17625121, EPI_ISL_17625405, EPI_ISL_17625579, EPI_ISL_17625608, EPI_ISL_17628804, EPI_ISL_17631329, EPI_ISL_17631337, EPI_ISL_17631370, EPI_ISL_17631376, EPI_ISL_17631462, EPI_ISL_17631504, EPI_ISL_17631517, EPI_ISL_17631526, EPI_ISL_17631554, EPI_ISL_17631569, EPI_ISL_17631617, EPI_ISL_17633304, EPI_ISL_17637180-17637181, EPI_ISL_17637183, EPI_ISL_17639497, EPI_ISL_17645948, EPI_ISL_17646870, EPI_ISL_17646932, EPI_ISL_17647118, EPI_ISL_17647148, EPI_ISL_17647195, EPI_ISL_17647217, EPI_ISL_17647295, EPI_ISL_17647496, EPI_ISL_17647545, EPI_ISL_17647585, EPI_ISL_17647736, EPI_ISL_17647748, EPI_ISL_17647780, EPI_ISL_17648465, EPI_ISL_17648515, EPI_ISL_17648530, EPI_ISL_17648596, EPI_ISL_17650123, EPI_ISL_17657567, EPI_ISL_17657599, EPI_ISL_17661635, EPI_ISL_17661777, EPI_ISL_17662924, EPI_ISL_17663068, EPI_ISL_17667455, EPI_ISL_17667984, EPI_ISL_17667994, EPI_ISL_17668378, EPI_ISL_17668725, EPI_ISL_17669274, EPI_ISL_17669299, EPI_ISL_17670255, EPI_ISL_17670431, EPI_ISL_17672439, EPI_ISL_17672471, EPI_ISL_17672475, EPI_ISL_17673750, EPI_ISL_17677489, EPI_ISL_17677543, EPI_ISL_17677546, EPI_ISL_17677622, EPI_ISL_17677631, EPI_ISL_17677785, EPI_ISL_17677836, EPI_ISL_17677958, EPI_ISL_17678831-17678832, EPI_ISL_17681532, EPI_ISL_17682100, EPI_ISL_17682768, EPI_ISL_17682816, EPI_ISL_17682819, EPI_ISL_17690528, EPI_ISL_17690542-17690543, EPI_ISL_17690601, EPI_ISL_17690608, EPI_ISL_17690620, EPI_ISL_17690627-17690629, EPI_ISL_17690691, EPI_ISL_17690707, EPI_ISL_17690749, EPI_ISL_17690773, EPI_ISL_17690833, EPI_ISL_17690993, EPI_ISL_17691017, EPI_ISL_17691032, EPI_ISL_17691055, EPI_ISL_17691100, EPI_ISL_17691144-17691145, EPI_ISL_17691159, EPI_ISL_17691313, EPI_ISL_17691504-17691505, EPI_ISL_17691987, EPI_ISL_17692081, EPI_ISL_17692226, EPI_ISL_17692237, EPI_ISL_17694772, EPI_ISL_17694861, EPI_ISL_17694869, 
EPI_ISL_17694897, EPI_ISL_17695029, EPI_ISL_17696457, EPI_ISL_17696781, EPI_ISL_17696810, EPI_ISL_17697789, EPI_ISL_17697800, EPI_ISL_17697952, EPI_ISL_17703798, EPI_ISL_17703818, EPI_ISL_17704288, EPI_ISL_17704617, EPI_ISL_17704628, EPI_ISL_17706596, EPI_ISL_17706727, EPI_ISL_17709567, EPI_ISL_17709771, EPI_ISL_17711597, EPI_ISL_17712559

Earliest sequence: EPI_ISL_17081296 (2023-02-07)
Latest sequence: EPI_ISL_17695029 (2023-05-16)

UShER: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_1ee2a_d8d70.json?f_userOrOld=uploaded%20sample&gt=nuc.28093T&label=id:node_4781826

UShER says only 18 sequences lack ORF8:S67F, which differs from GISAID (36). I am not sure which one to trust, but neither 36 nor 18 comes anywhere near a majority of the 347 total sequences.
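To put the two counts in proportion, here is a small worked calculation. It assumes both the GISAID figure (36) and the UShER figure (18) refer to sequences lacking ORF8:S67F out of the same 347 sequences, which the thread does not confirm for UShER; the helper name `share` is just for illustration.

```python
def share(part: int, total: int) -> float:
    """Return part/total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


TOTAL = 347  # good sequences from the GISAID query (assumed shared denominator)
lacking_s67f = {"GISAID": 36, "UShER": 18}  # sequences reported WITHOUT ORF8:S67F

for source, n in lacking_s67f.items():
    pct_without = share(n, TOTAL)
    print(f"{source}: {n}/{TOTAL} = {pct_without:.1f}% without S67F, "
          f"{100 - pct_without:.1f}% with it")
```

Either way, sequences without S67F make up roughly 5 to 10% of the set, so the two sources disagree on the size of a small minority rather than on which state dominates.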

The extra mutation is very interesting. It was a major mutation in Mu (B.1.621, in case everyone has forgotten) and in some contemporary, larger VOCs (Delta and Alpha), and during the BA.2.75 wave it appeared mostly within BR.2.1, with some appearances in CH.1.1 sub-clades as well. Now it appears in large numbers within FL.4, a fairly large sub-lineage of the globally successful XBB.1.9.1 family. Yet this seems paradoxical, because XBB.1 lineages have discarded most of their ORF8 protein through the ORF8:G8* stop mutation, and that stop mutation is still present in all 311 sequences I listed here. Because it is a notable mutation and the variant has sheer numbers (few issues have more than three hundred sequences when submitted), I want to monitor this in the main repository. If it turns out to be more than a silent mutation, I would consider it a challenge to the theory that the ORF8 protein no longer matters in XBB. @ryhisner I would like to ask you for more explanation about ORF8 protein evolution, as you are obviously experienced with that.

ryhisner commented 1 year ago

ORF8:S67F is definitely homoplasic, and it has appeared in several fast-growing lineages over the past six months. However, I'm skeptical that it confers any benefit. It only requires a C->T mutation, the most common kind, and, more importantly, it sits in a very favorable nucleotide context for APOBEC deamination, which is likely the primary cause of C->T mutations. In general, A and T are favored in the -2 to +2 positions.

There is some disagreement between studies on whether or not G is favorable at the +1 position, but there's general agreement that A and T are favorable overall. Below is a passage from "Potential APOBEC-mediated RNA editing of the genomes of SARS-CoV-2 and other coronaviruses and its impact on their longer term evolution" by Jeremy Ratcliff and Peter Simmonds in the journal Virology. https://www.sciencedirect.com/science/article/pii/S0042682220302658

image

This conclusion concurs with my observations of the nucleotide contexts of the most common C->T mutations. For example, if we look at highly homoplasic mutations caused by C->T mutations that appear not to confer any growth advantage (and are likely slightly deleterious in my view), we should get a decent idea of what the most favorable nucleotide contexts are for APOBEC deamination. The top two such candidates, in my mind, are S:L5F and S:S255F. They reappear again and again and again and again, yet have never caught on in any major lineage. I've included those and four other similarly homoplasic but not advantageous mutation sites in the picture below, and it's clear that T and A predominate in the -2 to +2 positions.

image

This is also what we see surrounding nucleotide C28093, which undergoes a C->T mutation to form ORF8:S67F—all T's and A's, upstream and downstream. (The images directly above and below are from @theosanderson's incredibly useful Gensplore page, specifically the SARS-CoV-2 reference genome viewer. https://gensplore.theo.io/?gb=%2Fsequence.gb )

image
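The -2 to +2 rule of thumb can be turned into a crude check: take the five-base window centred on a C and count how many of the four flanking bases are A or T. This is only a sketch; the function names and the example sequence below are hypothetical (not the real sequence around C28093), and the plain count does not weight positions or settle the disputed +1 G question.

```python
def flank_window(seq: str, pos: int, width: int = 2) -> str:
    """Return the -width..+width window around 1-based position `pos`."""
    i = pos - 1
    if not (width <= i < len(seq) - width):
        raise ValueError("position too close to a sequence end")
    return seq[i - width : i + width + 1].upper()


def apobec_flank_score(seq: str, pos: int, width: int = 2) -> int:
    """Count A/T (or U) bases in the flanks of a C site, skipping the C itself."""
    window = flank_window(seq, pos, width)
    centre = len(window) // 2
    if window[centre] != "C":
        raise ValueError(f"position {pos} is {window[centre]!r}, not C")
    return sum(1 for k, base in enumerate(window) if k != centre and base in "ATU")


# Hypothetical sequences: an A/T-rich context versus a G-rich one.
print(apobec_flank_score("GGTTCATGG", 5))  # window TTCAT: all four flanks are A/T
print(apobec_flank_score("GGGGCGGGG", 5))  # window GGCGG: no A/T flanks
```

A score of 4 corresponds to the "all T's and A's, upstream and downstream" situation described for C28093; real comparisons would use the reference genome around each site.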

The other major factor in determining the frequency of APOBEC-caused C->T mutations is the position of the nucleotide in the secondary RNA structure of the genome. Studies have found that unpaired nucleotides at the end of stem-loops are much more likely to undergo APOBEC-driven deamination. Below is a passage from the Ratcliff/Simmonds APOBEC paper on this topic:

image

It's hard to find good maps of the secondary RNA structure for SARS-CoV-2. The best I have been able to locate is from "The architecture of the SARS-CoV-2 RNA genome inside virion," by Changchang Cao, Yuanchao Xue, and colleagues, which the diagram below is taken from. As best I understand, secondary RNA structure inside the cell is somewhat fluid and not necessarily exactly the same as inside the virion, but I think they are likely similar in most respects. https://www.nature.com/articles/s41467-021-22785-x

As you can see, C28093 is an unpaired nucleotide at the end of a very long stem-loop, the exact structural context favored by APOBEC.

image
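If a secondary-structure model is available as a dot-bracket string, one way to flag "unpaired nucleotides at the end of a stem-loop" is to find hairpin loops: runs of unpaired bases closed by a single base pair with no other pairs inside. The sketch below assumes plain `(`, `)` and `.` characters (no pseudoknot brackets) and treats "end of a stem-loop" as the terminal hairpin loop; the structures used are made-up examples, not the Cao et al. model.

```python
def hairpin_loop_positions(structure: str) -> set[int]:
    """Return 1-based positions of unpaired bases inside hairpin (terminal) loops."""
    stack: list[int] = []
    loops: set[int] = set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j + 1}")
            i = stack.pop()
            inner = structure[i + 1 : j]
            # A hairpin loop: the pair (i, j) encloses only unpaired bases.
            if inner and set(inner) == {"."}:
                loops.update(range(i + 2, j + 1))  # 0-based i+1..j-1 as 1-based
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j + 1}")
    if stack:
        raise ValueError("unbalanced '('")
    return loops


print(sorted(hairpin_loop_positions("((((....))))")))  # loop of a single stem
print(sorted(hairpin_loop_positions(".((..)).")))
```

Mapping a genome coordinate such as 28093 onto the result would additionally require knowing the start coordinate of the structure string within the genome.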

One final note: I don't know whether deletions or non-APOBEC mutations at the end of stem-loops are more common than in other structural positions, but I suspect this might be the case. Mutations seem fairly common not just at ORF8:S67 but also at nearby nucleotides. A particularly common mutation in the same area is an out-of-frame deletion from 28090-28095. If anyone knows whether there's any literature on whether deletions & other non-APOBEC mutations are more common at the end of stem-loops, please share as I'd love to learn more about it.

image

FedeGueli commented 1 year ago

Thx @ryhisner great explanation.

krosa1910 commented 1 year ago

My mistake here. I thought A3948G was silent, which turned out not to be true: it is Orf1a:D1228G (NSP3:D410G), and that could confer a mild advantage. Defining this proposal simply by A3948G captures more sequences; maybe Orf8:S67F was picked up early and, by chance, its descendants outcompeted those without Orf8:S67F. I will change the proposal accordingly under this new definition.
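Converting a genome coordinate to a codon is simple arithmetic once the ORF start is known: subtract the start, then integer-divide by 3 for the codon and take the remainder for the position within the codon. The sketch below assumes ORF start coordinates from the Wuhan-Hu-1 reference (NC_045512.2) and that nsp3 begins at ORF1a residue 819; the helper `locate` is hypothetical and does not check ORF end coordinates.

```python
# Assumed ORF start coordinates (1-based, first base of the start codon)
# on the Wuhan-Hu-1 reference genome NC_045512.2.
ORF_STARTS = {"ORF1a": 266, "S": 21563, "ORF8": 27894, "N": 28274}
NSP3_START_IN_ORF1A = 819  # assumed first ORF1a residue of nsp3


def locate(nt_pos: int, orf: str) -> tuple[int, int]:
    """Return (codon number, position within codon), both 1-based."""
    offset = nt_pos - ORF_STARTS[orf]
    if offset < 0:
        raise ValueError(f"{nt_pos} lies upstream of {orf}")
    return offset // 3 + 1, offset % 3 + 1


for nt, orf in [(3948, "ORF1a"), (28093, "ORF8"), (23635, "S"), (28615, "N")]:
    codon, pos = locate(nt, orf)
    print(f"{nt} -> {orf} codon {codon}, codon position {pos}")

codon, _ = locate(3948, "ORF1a")
print("nsp3 residue:", codon - NSP3_START_IN_ORF1A + 1)
```

A3948G lands on position 2 of ORF1a codon 1228 (nsp3 residue 410), and C28093T on position 2 of ORF8 codon 67, so both change the amino acid. The two early mutations, A28615G and C23635T, fall on third codon positions, which is consistent with (though not by itself proof of) their being silent.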

image

image

krosa1910 commented 1 year ago

About the distribution of this lineage: it is mostly in Asia and Australia, which is generally true of most variants that evolved in Southeast Asia. In terms of numbers, it is prominent in Singapore and South Korea. It is present at a low but nonzero level in Europe and North America, with numbers that seem comparable to the main branch of FL.4.

krosa1910 commented 1 year ago

This variant is numerous and growing fast. It now has 407 good sequences, 363 of which carry Orf8:S67F.

krosa1910 commented 1 year ago

image

It is not holding up well against other variants, but it is still a major part of the FL.4* population.

Over-There-Is commented 1 year ago

@corneliusroemer designated as FL.4.5