cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

JN.1.18.4 (JN.1.18+FLiRT+S:L492L+Orf6:I60T) with S:F1103S (20 on Gisaid) #2567

Open FedeGueli opened 2 months ago

FedeGueli commented 2 months ago

Original proposal Formerly tracked as Branch 87 of https://github.com/sars-cov-2-variants/lineage-proposals/issues/1089

JN.1.18 (S:R346T) > S:F456L (T22930G), T23036C (silent, S:L492L), Orf6:I60T (T27380C). Thanks @aviczhl2 for noting the right placement.

Query: T27380C, T22930G
Samples: 5 (6 on UShER), with one sample from Ethiopia and one from Ghana
EPI_ISL_19029565, EPI_ISL_19062760, EPI_ISL_19070234, EPI_ISL_19075644, EPI_ISL_19075660

I want to highlight that the silent mutation S:L492L (T23036C) could open a path to 492P/Q/R if position 23037 mutates further.
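The codon arithmetic behind the S:L492L remark can be sketched in Python. This is only an illustration under stated assumptions: the spike CDS is taken to start at position 21563 of the Wuhan-Hu-1 reference, the helper names are hypothetical, and the third base of codon 492 is not given in this thread, so both options consistent with a silent T-to-C change at the first position (CTA and CTG after the mutation) are tried.

```python
SPIKE_START = 21563  # first nucleotide of the spike CDS in Wuhan-Hu-1 (assumption stated above)

def spike_codon(pos):
    """Return (codon number, 0-based offset inside the codon) for a genome position."""
    offset = pos - SPIKE_START
    return offset // 3 + 1, offset % 3

# Minimal codon table: only the codons this example needs.
CODON_TO_AA = {
    "CTA": "L", "CTG": "L",
    "CCA": "P", "CCG": "P",
    "CAA": "Q", "CAG": "Q",
    "CGA": "R", "CGG": "R",
}

def second_base_changes(codon):
    """Amino acids reachable by one substitution at the middle base of the codon."""
    reachable = {}
    for base in "ACGT":
        if base != codon[1]:
            mutant = codon[0] + base + codon[2]
            reachable[mutant] = CODON_TO_AA[mutant]
    return reachable

print(spike_codon(23036))  # T23036C hits the first base of codon 492
print(spike_codon(23037))  # 23037 is the middle base of the same codon
for codon in ("CTA", "CTG"):  # codon 492 after T23036C; third base is an assumption
    print(codon, second_base_changes(codon))
```

With either assumed third base, a single further change at 23037 yields P, Q or R, which matches the 492P/Q/R possibilities mentioned above.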

Tree:

[Screenshot of the tree, 2024-05-04 19:24:59]

There are four branches:

One with S:K182I [defining #2567 > Orf1a:G604S (G2075A), S:K182I (A22107T)]
Query: G2075A, A22107T, T27380C
Samples: 3

One with S:I197V (A22151G), S:P251H

2567 > ORF1a:E1766A (A5562C), C9565T, T16950C, S:I197V (A22151G), S:P251H (C22314A), C27247T

Query: A5562C, A22151G, C9565T
Samples: 7

One with S:F1103S

2567 > S:F1103S (T24870C)

Query: T24870C, T27380C
Samples: 13 (NY), (Ca); 27 on UShER

One with S:A222V

2567 > S:A222V (C22227T)

Query: C22227T, T27380C, T22930G
Samples: 5 (NZL)

FedeGueli commented 2 months ago

8 now. The last two GBW samples (likely the same patient) have two additional spike mutations: S:I197V and S:P251H.

FedeGueli commented 1 month ago

10 growing

FedeGueli commented 1 month ago

@AngieHinrichs this has now been misplaced under Branch 20 of https://github.com/sars-cov-2-variants/lineage-proposals/issues/1253, but I find it very hard to believe it really belongs there, because this has T22930G while Branch 20 is the collector of everything with T22930A (in the same way JN.1.16 did for T22928C).

cc @corneliusroemer @aviczhl2

aviczhl2 commented 1 month ago

The 27143 branch is erroneous and UShER has fixed it. This one should be a sub-branch of JN.1.18.

FedeGueli commented 1 month ago

Thx!

AngieHinrichs commented 1 month ago

Yes, UShER and/or matOptimize is placing it wrong there. Thanks for the reminder, I need to make a simple test case for a bug report.

FedeGueli commented 1 month ago

Thank you!

AngieHinrichs commented 1 month ago

When I look at this more closely, changing the tree would not reduce the parsimony penalty. The current structure has 3 mutations on one branch and 1 on another:

T22930A > G22599C > A22930G
G22599C

To our eyes, this structure looks more likely:

T22930A > G22599C
G22599C > T22930G

But it is still 4 mutations in total, so from a parsimony point of view it is equivalent. And using parsimony is what makes usher and matOptimize fast enough to keep up with the large size of the tree. So strictly speaking it is hard to cast this as a bug even though it is annoying, and it would be nicer if usher and matOptimize could also look at the effect on path lengths or slightly penalize paths that have successive mutations at the same position.

The good news is that if I force-move the node to where we want it to go, matOptimize should leave it there because it can't improve parsimony by moving it where we don't want it to go. So I will give that a try.
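The parsimony tie described in this comment can be checked with a few lines of Python. This is a toy sketch, not how usher or matOptimize store trees: each structure is written as a plain list of branches, each branch a list of mutation strings copied from the comment, and `parsimony_score` is a hypothetical helper that simply counts mutations.

```python
def parsimony_score(branches):
    """Total number of mutations over all branches (the parsimony cost)."""
    return sum(len(branch) for branch in branches)

# Current placement: 3 mutations on one branch, 1 on the other.
current = [
    ["T22930A", "G22599C", "A22930G"],
    ["G22599C"],
]

# Placement that looks more plausible: 2 mutations on each branch.
preferred = [
    ["T22930A", "G22599C"],
    ["G22599C", "T22930G"],
]

print(parsimony_score(current))    # 4
print(parsimony_score(preferred))  # 4
```

Both structures cost 4 mutations, so a purely parsimony-based optimizer has no reason to prefer one over the other.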

aviczhl2 commented 1 month ago

I think some penalty should be added for mutating the same position multiple times within a short stretch of the tree. We (and the system) should assume that something like 22930T->22930A->22930G in a short time is unlikely to occur.
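One way such a penalty could look is sketched below in Python. This is purely hypothetical and is not something usher or matOptimize implement: the `weight` and `window` parameters, the function names, and the list-of-branches representation are all assumptions made for illustration.

```python
import re

MUTATION = re.compile(r"[ACGT](\d+)[ACGT]")

def close_repeat_hits(path, window):
    """Count mutations that hit a position already mutated at most `window` steps earlier."""
    last_seen = {}
    hits = 0
    for i, mutation in enumerate(path):
        pos = int(MUTATION.fullmatch(mutation).group(1))
        if pos in last_seen and i - last_seen[pos] <= window:
            hits += 1
        last_seen[pos] = i
    return hits

def penalized_score(branches, weight=0.5, window=3):
    """Plain parsimony cost plus a small extra cost per close repeat at the same site."""
    base = sum(len(branch) for branch in branches)
    extra = sum(close_repeat_hits(branch, window) for branch in branches)
    return base + weight * extra

current = [["T22930A", "G22599C", "A22930G"], ["G22599C"]]
preferred = [["T22930A", "G22599C"], ["G22599C", "T22930G"]]

print(penalized_score(current))    # 4.5
print(penalized_score(preferred))  # 4.0
```

Under these assumed settings the T->A->G path at 22930 costs slightly more, so the tie between the two structures is broken in favour of the placement under JN.1.18.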

FedeGueli commented 1 month ago

24

AngieHinrichs commented 1 month ago

I did a force move of this branch (NewZealand/24ZA2051/2024) to JN.1.18 a couple days ago and so far matOptimize has left it in place.

FedeGueli commented 1 month ago

Thank you Angie!

FedeGueli commented 1 month ago

Designated JN.1.18.4 via https://github.com/cov-lineages/pango-designation/commit/cdf43cc17b50ef6767318b9d12db9f6fe3256188
LH.1 (K182I) via https://github.com/cov-lineages/pango-designation/commit/95b07a704d7715d1d1b01fb443e2120c63ef0003
LH.2 (197V, 251H) via https://github.com/cov-lineages/pango-designation/commit/9bc4a499222ba868b0a28bc2937001995b78c253

FedeGueli commented 3 weeks ago

ping @corneliusroemer the JN.1.18.4 + S:F1103S is now 13 due to some Canadian clusters

On UShER there are 27 of these (13 on GISAID): https://nextstrain.org/fetch/genome-test.gi.ucsc.edu/trash/ct/subtreeAuspice1_genome_test_1270e_bc7720.json?c=gt-nuc_27380&gmax=28380&gmin=26380&label=id:node_7144001

[Screenshot, 2024-06-14 07:35:53]

Query: T24870C,T23036C,T3565C