cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

JN.1.39 + T111C [5' UTR] (469 seq, Apr 6) #2554

Closed ryhisner closed 1 month ago

ryhisner commented 5 months ago

Description
Sub-lineage of: JN.1.39 (JN.1 + G2782T)
Earliest sequence: 2023-11-27, USA, Ohio (EPI_ISL_18609351); England (EPI_ISL_18598897)
Most recent sequence: 2024-03-25, China, Fujian (EPI_ISL_19025051)
Continents circulating: North America (242), Asia (108), Europe (85), Oceania (16), Africa (15), South America (3)
Top Countries circulating:
North America (2 countries): USA (226), Canada (16)
Asia (12 countries): Indonesia (30), China (15), Oman (14), South Korea (13), Singapore (12), Japan (11)
Europe (14 countries): UK (33), Sweden (12), France (10)
Africa (2 countries): Nigeria (13), South Africa (2)
South America (1 country): Brazil (3)
Oceania (2 countries): Australia (14), New Zealand (2)
Number of Sequences: 469
GISAID Nucleotide Query: T111C, G2782T, -A12T, -C5512T, -C21762T
CovSpectrum Query: Nextcladepangolineage:JN.1* & [2-of: T111C, G2782T] & [exactly-0-of: C5512T, C21762T]
Substitutions on top of X:
5' UTR: T111C
Nucleotide: T111C
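For readers unfamiliar with the query syntax, here is a minimal Python sketch of how the mutation part of the CovSpectrum query above selects sequences. It assumes that `[2-of: ...]` means "at least two of the listed mutations" and `[exactly-0-of: ...]` means "none of them"; the lineage filter (`Nextcladepangolineage:JN.1*`) is left out, and the example mutation sets are hypothetical, not real GISAID records. This is not CovSpectrum's own implementation.

```python
def n_of(mutations, candidates, n):
    """True if at least n of the candidate mutations are present."""
    return sum(m in mutations for m in candidates) >= n


def exactly_n_of(mutations, candidates, n):
    """True if exactly n of the candidate mutations are present."""
    return sum(m in mutations for m in candidates) == n


def matches_proposal(mutations):
    """Mutation part of the query: both defining mutations, neither JN.1.33 marker."""
    return (n_of(mutations, ["T111C", "G2782T"], 2)
            and exactly_n_of(mutations, ["C5512T", "C21762T"], 0))


# Hypothetical mutation sets, for illustration only.
examples = {
    "proposed_lineage": {"T111C", "G2782T"},
    "jn_1_33_like": {"T111C", "G2782T", "C5512T", "C21762T"},
    "plain_jn_1_39": {"G2782T"},
}

results = {name: matches_proposal(muts) for name, muts in examples.items()}
print(results)
```

Only the first example passes: the JN.1.33-like set is excluded by the `exactly-0-of` clause, and plain JN.1.39 fails the `2-of` clause because it lacks T111C.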

USHER Tree https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons2/main/JN.1.39_T111C.json?c=gt-nuc_111&gmax=1111&label=id:node_6957831

image

Evidence I find it intriguing that there are two large JN.1 lineages with the synonymous G2782T and the 5' UTR mutation T111C. The other lineage with both T111C and G2782T is JN.1.33, which also has the synonymous C5512T and S:A67V (C21762T). Both lineages appear to have a modest but consistent growth advantage (~10-15% weekly) over baseline JN.1 despite not having S:R346T, S:F456S, or S:T572I, the three major mutations that clearly confer growth advantages at the moment.
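To put the 10-15% weekly figure in perspective, here is a rough Python sketch. It assumes the advantage is multiplicative, i.e. the lineage's share relative to baseline JN.1 grows by a factor of (1 + s) each week; that model and both function names are assumptions for illustration, not how the estimate above was produced.

```python
import math


def relative_share_multiplier(weekly_advantage, weeks):
    """Factor by which the lineage/baseline ratio grows after the given number of weeks."""
    return (1.0 + weekly_advantage) ** weeks


def doubling_time_weeks(weekly_advantage):
    """Weeks needed for the lineage/baseline ratio to double."""
    return math.log(2.0) / math.log(1.0 + weekly_advantage)


for s in (0.10, 0.15):
    print(f"{s:.0%} weekly: x{relative_share_multiplier(s, 8):.2f} after 8 weeks, "
          f"ratio doubles in {doubling_time_weeks(s):.1f} weeks")
```

Under this assumption the ratio takes roughly five to seven weeks to double, which is slow enough that unrepresentative sampling or founder effects could plausibly produce a similar signal.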

I suppose there are three possibilities here:

1) The Usher tree is confused and these two lineages are actually related.
2) The co-occurrence of these two mutations is coincidental and the modest growth advantages (10-15%) are not real but can be put down to unrepresentative sampling and founder effects.
3) There's some sort of connection between these two nucleotide mutations, maybe in secondary RNA structure, that somehow confers a slight benefit for JN.1.

There has been no connection between these two mutations before. In fact, if you search GISAID for T111C and G2782T, the only sequences returned are from JN.1 and three Bat-CoV sequences collected in Yunnan, China, in 2020 (RmYN05, RmYN08, RsYN04).

T111C is on SL4 in the 5' UTR. It is paired with G101 in a weak, non-Watson-Crick (G-U wobble) base pair. T111C would create a much stronger C-G base pair, which could conceivably affect the stability of SL4 and perhaps have some unknown effect on viral fitness.
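The pairing change can be made concrete with a small Python sketch. It only classifies a pair of bases as Watson-Crick, wobble, or mismatch (treating T in the cDNA sequence as U in the RNA); it says nothing about the actual folding energy of SL4, and the function name is made up for illustration.

```python
WATSON_CRICK = {frozenset("AU"), frozenset("GC")}
WOBBLE = {frozenset("GU")}


def classify_pair(base_a, base_b):
    """Classify an RNA base pair; T is read as U."""
    a, b = (x.upper().replace("T", "U") for x in (base_a, base_b))
    pair = frozenset((a, b))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    return "mismatch"


before = classify_pair("T", "G")  # reference base at 111 paired with G101
after = classify_pair("C", "G")   # T111C paired with G101
print(before, "->", after)
```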

image

5' UTR Image above is from: https://pubmed.ncbi.nlm.nih.gov/33636127/ Sun L, Li P, Ju X, et al. In vivo structural characterization of the SARS-CoV-2 RNA genome identifies host proteins vulnerable to repurposed drugs. Cell. 2021;184(7):1865-1883.e20. doi:10.1016/j.cell.2021.02.008

I aligned the nucleotide sequences for the 19 Bat-CoV sequences I could find on GISAID, and six of them had T111C (RsYN03, RsYN04, RsYN05, RsYN07, RsYN08, RsYN09). SARS-CoV-1 and murine hepatitis virus (MHV) also have T111C.

Genomes EPI_ISL_18598897, EPI_ISL_18609351, EPI_ISL_18640194, EPI_ISL_18641344, EPI_ISL_18668153, EPI_ISL_18673408, EPI_ISL_18673415, EPI_ISL_18673443, EPI_ISL_18673459, EPI_ISL_18673461, EPI_ISL_18673464, EPI_ISL_18673469, EPI_ISL_18673506, EPI_ISL_18673545, EPI_ISL_18673559, EPI_ISL_18685150, EPI_ISL_18690159, EPI_ISL_18690695, EPI_ISL_18691057, EPI_ISL_18697903, EPI_ISL_18699715, EPI_ISL_18700764, EPI_ISL_18703530, EPI_ISL_18705205, EPI_ISL_18705264, EPI_ISL_18711419, EPI_ISL_18717639, EPI_ISL_18721865, EPI_ISL_18721872, EPI_ISL_18725663, EPI_ISL_18730171, EPI_ISL_18732843-18732844, EPI_ISL_18733930, EPI_ISL_18754905, EPI_ISL_18763459, EPI_ISL_18763857, EPI_ISL_18763868, EPI_ISL_18765704, EPI_ISL_18770476, EPI_ISL_18770484, EPI_ISL_18770839, EPI_ISL_18775021, EPI_ISL_18775724, EPI_ISL_18778094, EPI_ISL_18779414, EPI_ISL_18779431, EPI_ISL_18779505, EPI_ISL_18779671, EPI_ISL_18779692, EPI_ISL_18781154, EPI_ISL_18781196, EPI_ISL_18782741, EPI_ISL_18782813, EPI_ISL_18782820, EPI_ISL_18785068, EPI_ISL_18785672, EPI_ISL_18785695, EPI_ISL_18792753, EPI_ISL_18794289, EPI_ISL_18796593, EPI_ISL_18796695, EPI_ISL_18801605, EPI_ISL_18806595, EPI_ISL_18806612, EPI_ISL_18806732, EPI_ISL_18808433, EPI_ISL_18809074, EPI_ISL_18809519, EPI_ISL_18810849, EPI_ISL_18810964, EPI_ISL_18814982, EPI_ISL_18815006, EPI_ISL_18815384, EPI_ISL_18815420, EPI_ISL_18815423-18815424, EPI_ISL_18816773, EPI_ISL_18818182, EPI_ISL_18818750, EPI_ISL_18821806, EPI_ISL_18823781, EPI_ISL_18824080, EPI_ISL_18825509, EPI_ISL_18825536, EPI_ISL_18826697, EPI_ISL_18826828, EPI_ISL_18827100, EPI_ISL_18827694, EPI_ISL_18827710, EPI_ISL_18827759, EPI_ISL_18827918, EPI_ISL_18828248, EPI_ISL_18831501, EPI_ISL_18831505, EPI_ISL_18831507-18831508, EPI_ISL_18831698, EPI_ISL_18831766, EPI_ISL_18832582, EPI_ISL_18835463, EPI_ISL_18835660, EPI_ISL_18835674, EPI_ISL_18836159, EPI_ISL_18838674, EPI_ISL_18838715, EPI_ISL_18839765, EPI_ISL_18842099, EPI_ISL_18846849, EPI_ISL_18852111, EPI_ISL_18852272, EPI_ISL_18853330, 
EPI_ISL_18853599, EPI_ISL_18856226, EPI_ISL_18858394, EPI_ISL_18858483, EPI_ISL_18859187, EPI_ISL_18859474, EPI_ISL_18859609, EPI_ISL_18860088, EPI_ISL_18860107, EPI_ISL_18860200, EPI_ISL_18861910, EPI_ISL_18863248, EPI_ISL_18863546, EPI_ISL_18864176, EPI_ISL_18864206, EPI_ISL_18868123, EPI_ISL_18869153, EPI_ISL_18870229, EPI_ISL_18870396, EPI_ISL_18871160, EPI_ISL_18872220, EPI_ISL_18872572, EPI_ISL_18872858, EPI_ISL_18877514, EPI_ISL_18878306, EPI_ISL_18878508, EPI_ISL_18879832, EPI_ISL_18879849, EPI_ISL_18879851-18879852, EPI_ISL_18879861, EPI_ISL_18879871, EPI_ISL_18879888, EPI_ISL_18879912, EPI_ISL_18880359, EPI_ISL_18880488, EPI_ISL_18880626, EPI_ISL_18881887, EPI_ISL_18882893-18882894, EPI_ISL_18884244, EPI_ISL_18885315, EPI_ISL_18885478, EPI_ISL_18886334, EPI_ISL_18886354, EPI_ISL_18886389, EPI_ISL_18886408, EPI_ISL_18886536, EPI_ISL_18887542, EPI_ISL_18892276, EPI_ISL_18892401, EPI_ISL_18893021, EPI_ISL_18895583, EPI_ISL_18895774, EPI_ISL_18900748, EPI_ISL_18901832, EPI_ISL_18902164, EPI_ISL_18902186, EPI_ISL_18903232, EPI_ISL_18903636, EPI_ISL_18903683, EPI_ISL_18907506, EPI_ISL_18907509, EPI_ISL_18907540, EPI_ISL_18907576, EPI_ISL_18907579, EPI_ISL_18907598, EPI_ISL_18907601, EPI_ISL_18907603, EPI_ISL_18907648, EPI_ISL_18907691, EPI_ISL_18907697, EPI_ISL_18907703, EPI_ISL_18907865, EPI_ISL_18907873, EPI_ISL_18907905, EPI_ISL_18907907, EPI_ISL_18907938, EPI_ISL_18907962, EPI_ISL_18907976, EPI_ISL_18908569, EPI_ISL_18909386, EPI_ISL_18909536, EPI_ISL_18910549, EPI_ISL_18912663, EPI_ISL_18912735, EPI_ISL_18913053, EPI_ISL_18913545, EPI_ISL_18914646, EPI_ISL_18915614, EPI_ISL_18915880, EPI_ISL_18916642-18916644, EPI_ISL_18916647, EPI_ISL_18916667, EPI_ISL_18916680-18916681, EPI_ISL_18917371, EPI_ISL_18918222, EPI_ISL_18918366, EPI_ISL_18918393, EPI_ISL_18919387, EPI_ISL_18919506, EPI_ISL_18920092-18920093, EPI_ISL_18920293, EPI_ISL_18921100, EPI_ISL_18921168, EPI_ISL_18921559, EPI_ISL_18921838, EPI_ISL_18921913, EPI_ISL_18921948, EPI_ISL_18922394, 
EPI_ISL_18923350, EPI_ISL_18923635, EPI_ISL_18924093, EPI_ISL_18927287, EPI_ISL_18927477, EPI_ISL_18927559, EPI_ISL_18927730, EPI_ISL_18928567, EPI_ISL_18928774, EPI_ISL_18928949, EPI_ISL_18930545, EPI_ISL_18930557-18930560, EPI_ISL_18930583, EPI_ISL_18930672, EPI_ISL_18931373, EPI_ISL_18931446, EPI_ISL_18931727, EPI_ISL_18931729, EPI_ISL_18931735, EPI_ISL_18931746, EPI_ISL_18931758, EPI_ISL_18932343, EPI_ISL_18932528, EPI_ISL_18932657, EPI_ISL_18935614, EPI_ISL_18936054, EPI_ISL_18937006, EPI_ISL_18937168, EPI_ISL_18937400, EPI_ISL_18937586, EPI_ISL_18939942, EPI_ISL_18940294, EPI_ISL_18940629, EPI_ISL_18942615, EPI_ISL_18942668, EPI_ISL_18942671, EPI_ISL_18942740, EPI_ISL_18944081, EPI_ISL_18944109, EPI_ISL_18944149, EPI_ISL_18946273-18946274, EPI_ISL_18946314, EPI_ISL_18946318, EPI_ISL_18946369, EPI_ISL_18946481, EPI_ISL_18946514, EPI_ISL_18948387, EPI_ISL_18948504, EPI_ISL_18949261, EPI_ISL_18949274, EPI_ISL_18950316, EPI_ISL_18950946, EPI_ISL_18952122, EPI_ISL_18954443, EPI_ISL_18954544, EPI_ISL_18954611, EPI_ISL_18954706, EPI_ISL_18955768, EPI_ISL_18956004, EPI_ISL_18956352, EPI_ISL_18956462, EPI_ISL_18956489, EPI_ISL_18956700, EPI_ISL_18956772, EPI_ISL_18957080, EPI_ISL_18957148, EPI_ISL_18957455, EPI_ISL_18958990-18958992, EPI_ISL_18959055, EPI_ISL_18959126, EPI_ISL_18959174, EPI_ISL_18959998, EPI_ISL_18960035, EPI_ISL_18960810, EPI_ISL_18960942, EPI_ISL_18961010, EPI_ISL_18961057, EPI_ISL_18961185, EPI_ISL_18961263, EPI_ISL_18963777, EPI_ISL_18963787, EPI_ISL_18964019, EPI_ISL_18964231, EPI_ISL_18964487, EPI_ISL_18964750, EPI_ISL_18964760, EPI_ISL_18964769, EPI_ISL_18964846, EPI_ISL_18965284, EPI_ISL_18965945, EPI_ISL_18966519, EPI_ISL_18966932, EPI_ISL_18966938, EPI_ISL_18967258, EPI_ISL_18967290, EPI_ISL_18967321, EPI_ISL_18967398, EPI_ISL_18967679, EPI_ISL_18967681, EPI_ISL_18967728, EPI_ISL_18967731, EPI_ISL_18967787, EPI_ISL_18967793, EPI_ISL_18967852, EPI_ISL_18968327, EPI_ISL_18968571, EPI_ISL_18968604, EPI_ISL_18968876, EPI_ISL_18968958, 
EPI_ISL_18970561, EPI_ISL_18970785, EPI_ISL_18971333, EPI_ISL_18972168, EPI_ISL_18972438, EPI_ISL_18972693, EPI_ISL_18972698, EPI_ISL_18972701-18972703, EPI_ISL_18972716, EPI_ISL_18973651, EPI_ISL_18973777, EPI_ISL_18974049, EPI_ISL_18974222, EPI_ISL_18974271, EPI_ISL_18974653, EPI_ISL_18974655, EPI_ISL_18974659, EPI_ISL_18975322, EPI_ISL_18976602, EPI_ISL_18976604, EPI_ISL_18977970, EPI_ISL_18978012, EPI_ISL_18979360, EPI_ISL_18979459, EPI_ISL_18979639, EPI_ISL_18981821, EPI_ISL_18981983, EPI_ISL_18981985, EPI_ISL_18982345, EPI_ISL_18982454, EPI_ISL_18982516, EPI_ISL_18983131, EPI_ISL_18983555, EPI_ISL_18985129, EPI_ISL_18985147, EPI_ISL_18985166, EPI_ISL_18985279, EPI_ISL_18985286, EPI_ISL_18985293, EPI_ISL_18985307, EPI_ISL_18985313, EPI_ISL_18985335, EPI_ISL_18985394, EPI_ISL_18985442-18985443, EPI_ISL_18986082, EPI_ISL_18986092, EPI_ISL_18986584, EPI_ISL_18987172, EPI_ISL_18988431, EPI_ISL_18988433, EPI_ISL_18989584, EPI_ISL_18990059, EPI_ISL_18992465, EPI_ISL_18993922, EPI_ISL_18994082, EPI_ISL_18994511, EPI_ISL_18995378, EPI_ISL_18998035, EPI_ISL_18998070, EPI_ISL_18998873, EPI_ISL_18998896, EPI_ISL_18999139, EPI_ISL_18999993, EPI_ISL_19000198, EPI_ISL_19000432, EPI_ISL_19001281, EPI_ISL_19001583, EPI_ISL_19002546, EPI_ISL_19002641, EPI_ISL_19003690, EPI_ISL_19003704, EPI_ISL_19004887, EPI_ISL_19004889, EPI_ISL_19004912, EPI_ISL_19004926, EPI_ISL_19006232, EPI_ISL_19006721, EPI_ISL_19006744, EPI_ISL_19006812, EPI_ISL_19008045, EPI_ISL_19008247, EPI_ISL_19009679, EPI_ISL_19009965, EPI_ISL_19012307, EPI_ISL_19012429, EPI_ISL_19015361, EPI_ISL_19015396, EPI_ISL_19016001, EPI_ISL_19016651, EPI_ISL_19016667, EPI_ISL_19016791, EPI_ISL_19017357, EPI_ISL_19018044, EPI_ISL_19018183, EPI_ISL_19018267, EPI_ISL_19019132, EPI_ISL_19019180, EPI_ISL_19019350, EPI_ISL_19019549, EPI_ISL_19021182, EPI_ISL_19021185, EPI_ISL_19021187-19021188, EPI_ISL_19021207, EPI_ISL_19021212, EPI_ISL_19021969, EPI_ISL_19021984, EPI_ISL_19022508, EPI_ISL_19022534, EPI_ISL_19022873, 
EPI_ISL_19023113, EPI_ISL_19024326, EPI_ISL_19024331, EPI_ISL_19025051, EPI_ISL_19025164, EPI_ISL_19025953, EPI_ISL_19027995, EPI_ISL_19028278, EPI_ISL_19028540-19028541, EPI_ISL_19030177, EPI_ISL_19030430, EPI_ISL_19030566, EPI_ISL_19032769, EPI_ISL_19032771
aviczhl2 commented 5 months ago

The Usher tree is likely confused. Some JN.1.33 seqs are missing coverage at position 111, so Usher categorizes them as not having T111C and therefore places T111C after S:A67V.

However, no JN.1.33 seq has 111T, suggesting that the seqs missing coverage at 111 do have T111C and that JN.1.33 is actually a sub-branch of JN.1.39, or a recombinant with JN.1.39 as its 5' parent.
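The distinction being drawn here, between "no coverage at 111" and "reference T at 111", can be sketched in Python as follows. The toy sequences, the position, and the function names are hypothetical, and this does not model Usher's placement algorithm; it only shows how counting an uncovered base (N) as reference changes the tally.

```python
from collections import Counter

REFERENCE_BASE = "T"  # reference base at the position of interest


def call_at(aligned_seq, position):
    """Return the base at a 1-based position, or None when coverage is missing (N)."""
    base = aligned_seq[position - 1].upper()
    return None if base == "N" else base


def tally(seqs, position, missing_as_reference=False):
    """Count base calls at a position, optionally counting missing calls as reference."""
    counts = Counter()
    for seq in seqs:
        base = call_at(seq, position)
        if base is None:
            base = REFERENCE_BASE if missing_as_reference else "missing"
        counts[base] += 1
    return dict(counts)


# Toy aligned fragments (hypothetical); position 3 stands in for position 111.
toy = ["ACCGT", "ACNGT", "ACCGT", "ACNGT"]

cautious = tally(toy, 3)
naive = tally(toy, 3, missing_as_reference=True)
print("cautious:", cautious)
print("naive:   ", naive)
```

In the cautious tally no sequence carries a confirmed T, which is the pattern described above for JN.1.33; the naive tally wrongly reports two reference calls.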

ryhisner commented 5 months ago

But T111C doesn't have anything to do with why Usher puts these in separate trees. It separates them because all JN.1.33 have C5512T and no JN.1.39 + T111C have C5512T, and it appears that C5512T comes before G2782T because there are many sequences that have C5512T but not G2782T.

image

A CovSpectrum search for

Nextcladepangolineage:JN.1* & [3-of: T2781T, G2782G, A2783A] & [3-of: A5511A, C5512T, G5513G]

returns 350 sequences from 33 different countries, for example.

aviczhl2 commented 5 months ago

But T111C doesn't have anything to do with why Usher puts these in separate trees. It separates them because all JN.1.33 have C5512T and no JN.1.39 + T111C have C5512T, and it appears that C5512T comes before G2782T because there are many sequences that have C5512T but not G2782T.

There is one sequence with G2782T, T111C, C5512T but S:67A, EPI_ISL_18982930 from China.

This sequence suggests the correct order is JN.1.39 -> T111C -> C5512T -> S:A67V: C5512T was acquired either through convergent evolution or via recombination with the C5512T branch of JN.1, and S:A67V was acquired afterwards.

FedeGueli commented 5 months ago

cc @corneliusroemer @AngieHinrichs could you look at this? For some weeks @aviczhl2 has been raising the issue of T111C, G2782T being split between JN.1.33 and JN.1.39. To me it is hard to reach a consensus on this.

ryhisner commented 5 months ago

There is one sequence with G2782T, T111C, C5512T but S:67A, EPI_ISL_18982930 from China.

This sequence suggests the correct order is JN.1.39 -> T111C -> C5512T -> S:A67V: C5512T was acquired either through convergent evolution or via recombination with the C5512T branch of JN.1, and S:A67V was acquired afterwards.

Mutations next to deletions are frequently misread, so I'd be surprised if there aren't numerous sequences with G2782T, T111C, C5512T, and S:67A. But the fact that there are 350 sequences with C5512T but without G2782T, T111C, or S:A67V makes it clear these two lineages are very unlikely to be directly related unless it's through recombination.

aviczhl2 commented 5 months ago

Mutations next to deletions are frequently misread, so I'd be surprised if there aren't numerous sequences with G2782T, T111C, C5512T, and S:67A. But the fact that there are 350 sequences with C5512T but without G2782T, T111C, or S:A67V makes it clear these two lineages are very unlikely to be directly related unless it's through recombination.

Yeah, I also think it is likely a recombinant. China is not submitting many sequences, so a lineage with S:67A is likely to have very few seqs because it cannot compete with S:67V. I just want to point out that the correct order of the mutations should be JN.1 -> G2782T -> T111C -> C5512T -> S:A67V, as there are no seqs with 5512T + [exactly-1-of: 2782T, 111C, S:67V].

ryhisner commented 5 months ago

Yeah, but there are hundreds of sequences with C5512T and not G2782T. C5512T had to come first on the JN.1.33 branch. There are zero sequences on the JN.1.39 branch that have C5512T.

aviczhl2 commented 5 months ago

There are zero sequences on the JN.1.39 branch that have C5512T.

That's because they are placed in JN.1.33.

ryhisner commented 5 months ago

They don't have S:A67V either though. And even if that's the case, it still doesn't explain how there are hundreds of sequences with C5512T but without T111C, G2782T, or S:A67V.

aviczhl2 commented 5 months ago

They don't have S:A67V either though. And even if that's the case, it still doesn't explain how there are hundreds of sequences with C5512T but neither T111C nor S:A67V.

1: There is a JN.1+C5512T branch (which is where the "hundreds of seqs" come from).

2: There is a JN.1+G2782T+T111C branch.

3: There is a JN.1+T111C,G2782T,C5512T branch, and Usher places it as JN.1+C5512T+G2782T,T111C.

3 is likely a recombinant of 1 and 2, and if we don't consider recombinants, 3 is more likely to be placed under 2.

4: S:A67V is a sub-branch of 3.
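The disagreement over branch 3 can be framed as a pairwise compatibility check (the four-gamete test). In this Python sketch, two mutations conflict when sequences exist with only the first, only the second, and both; assuming plain JN.1 (neither mutation) also exists, such a pair cannot sit on one tree unless a mutation arose twice or recombination occurred. The haplotype sets below are simplified from the branches listed above and are assumptions for illustration, not a query result.

```python
from itertools import combinations


def incompatible_pairs(haplotypes):
    """Return mutation pairs seen as a-only, b-only, and both (four-gamete conflicts)."""
    sites = sorted(set().union(*haplotypes))
    conflicts = []
    for a, b in combinations(sites, 2):
        a_only = any(a in h and b not in h for h in haplotypes)
        b_only = any(b in h and a not in h for h in haplotypes)
        both = any(a in h and b in h for h in haplotypes)
        if a_only and b_only and both:
            conflicts.append((a, b))
    return conflicts


# Simplified haplotypes relative to JN.1 (assumed for illustration).
haplotypes = [
    {"C5512T"},                                # branch 1
    {"G2782T"},                                # JN.1.39
    {"G2782T", "T111C"},                       # branch 2
    {"G2782T", "T111C", "C5512T"},             # branch 3
    {"G2782T", "T111C", "C5512T", "C21762T"},  # branch 4 (S:A67V)
]

conflicts = incompatible_pairs(haplotypes)
print(conflicts)
```

C5512T conflicts with both G2782T and T111C, so either ordering needs a convergent mutation or a recombination event; the check alone cannot say which side acquired what.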

aviczhl2 commented 5 months ago

image

There is a large S:R346T branch under this now, also with an S:F456L sub-branch.

FedeGueli commented 5 months ago

image

There is a large S:R346T branch under this now, also with an S:F456L sub-branch.

it is branch 51 of https://github.com/sars-cov-2-variants/lineage-proposals/issues/1089

FedeGueli commented 5 months ago

Could the 456L branch with 1104L be a recombinant?

Over-There-Is commented 4 months ago

JN.1.33 has been redesignated as JN.1.39.3 now.