cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

Potential Sublineage of AY.39.1 with Reverse Mutation G13482A (~600 Seqs, ~4% in Australia) #341

Closed c19850727 closed 2 years ago

c19850727 commented 2 years ago

This is an effort to fine-tune AY.39.1, the currently predominant lineage in Australia (~90% prevalence).

According to NeherLab: image

There are 4 subclades that are visibly distinct:

Here I would like to submit the subclade with nuc T28921C, nuc G13482A, nuc G29179T and ORF1b:I527V. Notably, nuc G13482A reverts the lineage-defining mutation of AY.39.1 (A13482G), so these sequences are currently classified as AY.39 by Pango and UShER.

Description
Sub-lineage of: AY.39.1
Earliest sequence: 2021/8/10 (Australia-NSW)
Most recent sequence: 2021/11/12 (Australia-NSW)
Countries circulating: Australia (NSW)

Prevalence in each state as per Cov-spectrum is as follows: image

Transmission advantage as per Cov-spectrum: image

Usher tree (in green color): image https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_107a6_c80d50.json?branchLabel=Spike%20mutations&c=gt-nuc_13482&l=radial

corneliusroemer commented 2 years ago

Are you sure that the reversion isn't just a tree building error or sequencing artefact? So far, all reversions I've seen, and there are hundreds of them, have been tree building/sequencing artefacts.

AngieHinrichs commented 2 years ago

I agree... this is how the path to the cluster ends in the UCSC/UShER tree:

... > G27604A > G21372T > A13482G > T28921C > G13482A

The back-mutation at 13482 is almost immediate, separated from the forward mutation by only one other mutation, so it is probably an error. The numbers of descendants of those nodes in the 2021-11-19 tree are:

A13482G: 23613
T28921C:  5705
G13482A:   639

So there are >5000 sequences that have all the mutations up to G21372T, A13482G, and T28921C (or Ns), but there are also 639 sequences that have the mutations up to G21372T and T28921C (but not A13482G). From a maximum parsimony point of view, I believe G27604A > G21372T > A13482G > T28921C > G13482A scores the same as G27604A > G21372T > {T28921C , A13482G > T28921C} (5 mutations either way). But since some important inferences are made based on the presence of mutations in these paths, we need to find a way to make back-mutations less common than they are currently. Tagging @russcd and @yatisht: could this be helped by modifying the tie-breakers?
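To make the tie concrete, here is a minimal Python sketch that counts the mutations in the two alternative topologies and the back-mutations on the path to the 639-sequence cluster. It only covers the visible tail of the path (the elided `...` part is left out), and the helper names `parse`, `parsimony_score` and `count_reversions` are made up for illustration; this is not UShER code.

```python
def parse(mutation):
    """Split a mutation such as 'A13482G' into (ref, position, alt)."""
    return mutation[0], int(mutation[1:-1]), mutation[-1]

def parsimony_score(branches):
    """Total number of mutations over all branches of a (sub)tree."""
    return sum(len(branch) for branch in branches)

def count_reversions(path):
    """Count mutations on a root-to-tip path that restore the base
    a position had before its first mutation on that path."""
    original_base = {}
    reversions = 0
    for mutation in path:
        ref, pos, alt = parse(mutation)
        if pos in original_base and alt == original_base[pos]:
            reversions += 1
        original_base.setdefault(pos, ref)
    return reversions

# Topology A: one trunk, with the cluster hanging off a back-mutation.
tree_a = [["G27604A"], ["G21372T"], ["A13482G"], ["T28921C"], ["G13482A"]]
# Topology B: the cluster gains T28921C on its own, without A13482G.
tree_b = [["G27604A"], ["G21372T"], ["T28921C"], ["A13482G"], ["T28921C"]]

# Root-to-tip paths to the 639-sequence cluster under each topology.
path_a = ["G27604A", "G21372T", "A13482G", "T28921C", "G13482A"]
path_b = ["G27604A", "G21372T", "T28921C"]

score_a = parsimony_score(tree_a)
score_b = parsimony_score(tree_b)
reversions_a = count_reversions(path_a)
reversions_b = count_reversions(path_b)
print(score_a, score_b, reversions_a, reversions_b)
```

Both topologies cost the same number of mutations, but only the first places a back-mutation on the path to the cluster; the second pays for it with T28921C arising twice.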

yatisht commented 2 years ago

This is a straightforward tie-breaking strategy to incorporate. I'll add it in later this week. Thanks for the analysis.
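One way such a tie-breaker could look, as a rough Python sketch: among candidate placements, the parsimony score still decides first, and the number of back-mutations is only consulted when scores are equal. The `Placement` type, its fields and `choose_placement` are hypothetical names for illustration and do not reflect how UShER actually implements this.

```python
from typing import NamedTuple

class Placement(NamedTuple):
    """Hypothetical summary of one candidate placement."""
    label: str
    parsimony_score: int
    back_mutations: int

def choose_placement(candidates):
    """Pick the lowest parsimony score; break ties by fewest back-mutations."""
    return min(candidates, key=lambda p: (p.parsimony_score, p.back_mutations))

# The two equally parsimonious options discussed in this thread.
candidates = [
    Placement("A13482G > T28921C > G13482A", parsimony_score=5, back_mutations=1),
    Placement("T28921C without A13482G", parsimony_score=5, back_mutations=0),
]
best = choose_placement(candidates)
print(best.label)
```

Because the back-mutation count is only the second element of the sort key, a placement with a genuinely lower parsimony score would still win even if it involved a reversion.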

chrisruis commented 2 years ago

We haven't designated this at this stage due to the discussion above. Thanks @c19850727 for submitting.