cov-lineages / pangolin

Software package for assigning SARS-CoV-2 genome sequences to global lineages.
GNU General Public License v3.0

Pangolin assigning lineage BA.2.12 to different phylogenetic clades #455

Closed: gyebra-phs closed this issue 2 years ago.

gyebra-phs commented 2 years ago

Hi there,

I have noticed that in our data Pangolin is assigning the BA.2.12 lineage to two different clades:

[Image: phylogenetic tree in which BA.2.12 is assigned to two separate clades]

(Of note, the child lineage BA.2.12.1, descended from one of those clades, is monophyletic.)
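To check a monophyly observation like this on your own tree, you can test whether the sequences that share a lineage label are exactly the tips below their most recent common ancestor (MRCA). The sketch below assumes a rooted tree stored as a simple child-to-parent mapping. The toy tree and its node names (`n1`, `a`, etc.) are hypothetical and are not taken from the tree in this issue. Real trees would normally be loaded with a phylogenetics library, which this sketch deliberately avoids.

```python
# Minimal sketch: is a set of tips monophyletic in a rooted tree?
# The tree is a hypothetical toy given as child -> parent; it is NOT the
# tree from this issue.

def ancestors(node, parent):
    """Return the path from node up to the root, node included."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(tips, parent):
    """Most recent common ancestor of a non-empty collection of tips."""
    tips = list(tips)
    common = set(ancestors(tips[0], parent))
    for tip in tips[1:]:
        common &= set(ancestors(tip, parent))
    # The MRCA is the shared ancestor closest to any one of the tips.
    for node in ancestors(tips[0], parent):
        if node in common:
            return node
    return None


def descendant_tips(node, parent, all_tips):
    """All tips whose path to the root passes through `node`."""
    return {t for t in all_tips if node in ancestors(t, parent)}


def is_monophyletic(tips, parent, all_tips):
    """True if `tips` are exactly the tips below their MRCA (i.e. a clade)."""
    tips = set(tips)
    return descendant_tips(mrca(tips, parent), parent, all_tips) == tips


# Toy tree: root -> (n1 -> (a, b)), (n2 -> (c, n3 -> (d, e)))
PARENT = {
    "n1": "root", "n2": "root",
    "a": "n1", "b": "n1",
    "c": "n2", "n3": "n2",
    "d": "n3", "e": "n3",
}
ALL_TIPS = {"a", "b", "c", "d", "e"}

print(is_monophyletic({"a", "b"}, PARENT, ALL_TIPS))       # True
print(is_monophyletic({"a", "b", "d"}, PARENT, ALL_TIPS))  # False
```

In practice you would group tips by their assigned lineage and run `is_monophyletic` on each group. A label that fails, as BA.2.12 does in the tree above, points to either a homoplasy or a lineage the classifier does not yet know about.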

Both clades carry the S:S704L mutation. Presumably the "spurious" clade at the top acquired it independently? The sequence IDs for the top clade are:

QEUH-3DBC87E
QEUH-3DA4BF8
QEUH-3DD6683
QEUH-3DF0B57
QEUH-3DCE9D0
QEUH-3DE0808
QEUH-3DB4FEA
QEUH-3DC4AC8
QEUH-3DA3670
QEUH-3DA3B71
QEUH-3DA35CE
QEUH-3DA331F
QEUH-3DC5890
QEUH-3DE6D5E
QEUH-3DE6D7C
QEUH-3DF5730
QEUH-3DAF5FF
QEUH-3DF5123
QEUH-3DF4652
QEUH-3DF75BC
QEUH-3DC7728
QEUH-3DF419D
QEUH-3DF4FFD

Any thoughts would help, many thanks!

Gonzalo

AngieHinrichs commented 2 years ago

Yes, it does seem that S:S704L has occurred more than once in BA.2. If you paste your list of IDs into the UShER web interface, all of them are found in the brand-new lineage BA.2.37 (from cov-lineages/pango-designation#572) which is in pango-designation release v1.9. @aineniamh and I are working on updating the pangoLEARN model and UShER lineage tree to include the new v1.9 lineages, and hope to release those soon.

At the moment, the most recent pangolin-data release is v1.8, and the only BA.2 lineage with S:S704L as of v1.8 is BA.2.12, which probably explains why that lineage is assigned. But when pangolin-data v1.9 is released, you should see the top cluster's assignments change to BA.2.37.
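The reasoning above can be illustrated with a deliberately simplified toy: if the only known lineage carrying S:S704L is BA.2.12, any BA.2 sequence with S:S704L is pushed toward BA.2.12 until a more specific lineage is added. This is NOT how pangolin works (pangoLEARN is a trained model and UShER mode places sequences on a phylogeny). The "most specific matching definition" rule, the toy designation sets, and the marker names `BA2_marker` and `BA2.37_marker` are all hypothetical; only S:S704L comes from this thread.

```python
# Toy illustration only: NOT how pangoLEARN or UShER assign lineages.
# "BA2_marker" and "BA2.37_marker" are hypothetical placeholder mutations;
# only S:S704L comes from the discussion above.

TOY_V1_8 = {
    "BA.2": {"BA2_marker"},
    "BA.2.12": {"BA2_marker", "S:S704L"},
}

# Toy "v1.9": the v1.8 definitions plus a newly designated lineage.
TOY_V1_9 = {
    **TOY_V1_8,
    "BA.2.37": {"BA2_marker", "S:S704L", "BA2.37_marker"},
}


def assign(observed, designations):
    """Pick the lineage with the most defining mutations, all of which are observed."""
    candidates = [
        (len(defining), lineage)
        for lineage, defining in designations.items()
        if defining <= observed  # every defining mutation is present
    ]
    if not candidates:
        return None
    return max(candidates)[1]


# A sequence from the top clade, in this toy world.
top_clade_seq = {"BA2_marker", "S:S704L", "BA2.37_marker"}

print(assign(top_clade_seq, TOY_V1_8))  # BA.2.12
print(assign(top_clade_seq, TOY_V1_9))  # BA.2.37
```

The point of the toy is only that an assignment can change when new designations are added, even though the sequence itself has not changed.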

gyebra-phs commented 2 years ago

Hi Angie,

This is super helpful information, thanks for replying so quickly! That makes sense; we'll wait for v1.9 and compare.
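One way to do that comparison is to run pangolin on the same sequences with each pangolin-data version and diff the two lineage reports. The sketch below assumes the reports are CSV files with `taxon` and `lineage` columns, as in pangolin's default `lineage_report.csv`. The sequence IDs and lineage calls in the inline data are made up for illustration and are not results from this issue.

```python
import csv
import io


def load_assignments(handle):
    """Map taxon -> lineage from a pangolin-style CSV report."""
    return {row["taxon"]: row["lineage"] for row in csv.DictReader(handle)}


def changed_assignments(old, new):
    """Taxa present in both reports whose lineage call differs, as (old, new)."""
    return {
        taxon: (old[taxon], new[taxon])
        for taxon in sorted(old.keys() & new.keys())
        if old[taxon] != new[taxon]
    }


# In practice, open the two reports from disk instead, e.g.
# open("report_v1.8.csv", newline="")  (hypothetical file name).
# Inline, made-up data keeps the sketch runnable.
old_report = io.StringIO("taxon,lineage\nseq1,BA.2.12\nseq2,BA.2.12.1\n")
new_report = io.StringIO("taxon,lineage\nseq1,BA.2.37\nseq2,BA.2.12.1\n")

changes = changed_assignments(load_assignments(old_report), load_assignments(new_report))
for taxon, (before, after) in changes.items():
    print(f"{taxon}: {before} -> {after}")
```

Restricting the diff to taxa present in both reports avoids flagging sequences that failed QC in only one of the runs.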

Thanks a lot for the hard work!

Gonzalo