cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

BA.5.2.24 with C23707T ( formerly placed as BA.5.2.20 sublineage with S:K444N ) now counting 57 Sequences with a small cluster with S:N460K (T22942A) -3 seqs #1122

Closed FedeGueli closed 1 year ago

FedeGueli commented 1 year ago

SEE LAST COMMENT: THE USHER TREE HAS CHANGED (9/10/22)

Here I want to propose a sublineage of the recently designated BA.5.2.20 lineage (which has Orf1b:1050N).

It is defined by the S:K444N mutation and branches off directly after the BA.5.2.20-defining nucleotide mutation C23707T.

As of today it counts 29 sequences from 11 countries on 5 continents: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice7_genome_14417_1ac6e0.json?branchLabel=aa%20mutations&c=pango_lineage_usher&label=nuc%20mutations:G22894T

Covspectrum query: BA.5.2 (Nextclade) + G12310A, C23707T, C14649C, C11704C + S:K444N https://cov-spectrum.org/explore/World/AllSamples/Past6M/variants?aaMutations=S%3AK444N&nucMutations=G12310A%2CC23707T%2C14649C%2C11704C&nextcladePangoLineage=BA.5.2&aaMutations1=S%3A153I%2CS%3A1258Q%2CN%3A151L&

Sequence list: contributors.csv

A small branch of 3 sequences has acquired S:N460K (T22942A), N:S327L and Orf1a:K669N.

GISAID query for this small cluster: Spike_N460K, N_S327L, Spike_K444N

corneliusroemer commented 1 year ago

Looks like it's worth designating the outer one with 29 sequences as BV.X

corneliusroemer commented 1 year ago

Let's designate the outer one with S:444N as BV.3

FedeGueli commented 1 year ago

One more sequence has been added, so the total should now be 30, but I rechecked the small cluster with S:460K: it has two sequences, not three, so 29 is correct.

FedeGueli commented 1 year ago

As noted in #1089 (cc @AngieHinrichs), it seems that this entire sublineage is now placed under BA.5.2.24: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_8344_2f24c0.json?branchLabel=nuc%20mutations&c=gt-nuc_23707&label=nuc%20mutations:G22894T


and it would count 57 sequences as of today.

corneliusroemer commented 1 year ago

This may well still be BA.5.2.20. What makes you think it's BA.5.2.24? This looks like a tree-builder error caused by homoplasy?

AngieHinrichs commented 1 year ago

I don't think it's necessarily an error, just an ambiguous situation.

BA.5.2.20 is larger than BA.5.2.24 (~12k vs. ~280 excluding the new cluster) so arguably BA.5.2.20 is the more likely parent, assuming even sampling rates. One could also look at dates & geographical locations to try to determine which is more likely, although it's often still ambiguous. I thought usher/matOptimize had a tiebreaker based on number of descendants so I don't understand why matOptimize would move the cluster from BA.5.2.20 to BA.5.2.24.

Whichever you end up choosing between .20 and .24, it shouldn't affect lineage assignment for new sequences because the set of mutations in the new cluster is unambiguous.
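The tiebreaker idea mentioned above can be sketched roughly as follows. This is only an illustration of "among equally parsimonious placements, prefer the parent with more descendants"; it is not the actual UShER/matOptimize logic, and the `Placement` class, the `pick_parent` function and the parsimony cost of 1 are hypothetical. The descendant counts are the approximate figures quoted above.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    parent: str          # candidate parent lineage
    parsimony_cost: int  # hypothetical: extra mutations this placement needs
    descendants: int     # approximate number of sequences under the parent


def pick_parent(candidates):
    """Prefer the lowest parsimony cost; break ties by most descendants."""
    return min(candidates, key=lambda p: (p.parsimony_cost, -p.descendants))


# Both placements are assumed equally parsimonious (cost 1 is made up),
# so the descendant count decides.
candidates = [
    Placement("BA.5.2.24", parsimony_cost=1, descendants=280),
    Placement("BA.5.2.20", parsimony_cost=1, descendants=12000),
]
best = pick_parent(candidates)
print(best.parent)
```

Under these assumptions the larger lineage wins the tie, which is why a move of the cluster to BA.5.2.24 looks surprising.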

corneliusroemer commented 1 year ago

This one is actually already designated as BV.2! For some reason it seems to be missing from UShER? @AngieHinrichs

I had made a mistake and forgotten to add the sequences in the designation commit; I added them later as a fix.

AngieHinrichs commented 1 year ago

This one is actually already designated as BV.2! For some reason it seems to be missing from UShER?

Well crud, I just missed it. I need to automate checking that I have manually annotated all of the multi-letter lineages.

BV.2 missed the boat for pangolin-data v1.15.1 usher-mode (except the ones that will be caught by the designation hash). Looks like the sequences are assigned BA.5.2.24 (with S:K444N so at least that's flagged) by pangolin in usher mode.

corneliusroemer commented 1 year ago

So be it!

You could use this CSV to verify that the latest lineages are all included; that should be doable with a simple script.

Something else that may help: is it possible to disable assignment from the designation hash? It would be a simple test to check whether the designated sequences are correctly assigned with the designation hash off :)


AngieHinrichs commented 1 year ago

Yes, simple script, and yes, pangolin has a --skip-designation-cache option.
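A minimal sketch of what such a check could look like, assuming the CSV has `taxon` and `lineage` columns and that the set of lineages already annotated on the tree is available as a plain set of names. The function names, the sequence names and the example data are hypothetical; a real check would read the actual files instead of the inline strings.

```python
import csv
import io


def designated_lineages(csv_text):
    """Collect the distinct lineage names from a taxon,lineage CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["lineage"] for row in reader if row.get("lineage")}


def missing_lineages(designated, annotated):
    """Return designated lineages that have no annotation, sorted by name."""
    return sorted(designated - annotated)


# Hypothetical example data standing in for the real files.
example_csv = "taxon,lineage\nseqA,BA.5.2.20\nseqB,BA.5.2.24\nseqC,BV.2\n"
annotated = {"BA.5.2.20", "BA.5.2.24"}

missing = missing_lineages(designated_lineages(example_csv), annotated)
print(missing)  # ['BV.2']
```

Run before each release, a non-empty result would flag lineages such as BV.2 that were designated but not yet annotated.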