cov-lineages / scorpio

serious constellations of reoccurring phylogenetically-independent origin
GNU General Public License v3.0

hundred XM genomes called as 'Unassigned' or as BA.1 #46

Closed: akifoss closed 1 year ago

akifoss commented 2 years ago

Using the mutation profile from the pango-designation issue that designated lineage XM (C15240T, C2470T, A18163G, C19955T, A20055G), the genomes in the attached list were not assigned to XM, although they appear to belong to XM when inspected with sc2rf. Would retraining the model improve the lineage assignments, or is model training no longer used since the upgrade to pangolin version 4?

Here's a sc2rf screenshot: image

Many thanks!

XM-notCalled-2022-05-02-GISAID.csv
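
For readers who want to check the profile themselves, here is a minimal sketch of matching a sample's substitutions against the five mutations quoted above. It is not how scorpio or pangolin decide a call; the function names and the `min_fraction` threshold are hypothetical, and the input is assumed to be a list of nucleotide substitutions such as Nextclade reports.

```python
# Mutation profile quoted in the comment above (from the pango-designation issue).
XM_PROFILE = {"C15240T", "C2470T", "A18163G", "C19955T", "A20055G"}


def position(mutation):
    """Return the genome position of a substitution string such as 'C2470T'."""
    return int(mutation[1:-1])


def profile_match(sample_mutations, profile=XM_PROFILE):
    """Split the profile into mutations present in and missing from the sample."""
    present = profile & set(sample_mutations)
    missing = profile - present
    return sorted(present, key=position), sorted(missing, key=position)


def looks_like_profile(sample_mutations, profile=XM_PROFILE, min_fraction=1.0):
    """Hypothetical rule: flag a sample if enough of the profile is present."""
    present, _ = profile_match(sample_mutations, profile)
    return len(present) / len(profile) >= min_fraction
```

Calling `profile_match` on each genome in the attached list would show which of the five sites are missing (or masked by ambiguous bases) in the sequences that were not called XM.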

corneliusroemer commented 2 years ago

Thanks for reporting this. I ran the sequences through Nextclade and Usher and indeed, most of these are good XM.

Only a dozen or so are close but not quite XM (according to Usher).

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_ba86_149cd0.json?c=pango_lineage_usher

image image

corneliusroemer commented 2 years ago

In particular, all the German sequences are very likely real XM (XM is mostly German).

image

JosetteSchoenma commented 2 years ago

I noticed the same thing last week. Only 4 of the 11 Dutch samples that Usher called XM were also called XM by GISAID/pangolearn; the rest were either unassigned or n/a. The ones that were assigned correctly are from just after designation.

JosetteSchoenma commented 2 years ago

Screenshot_20220503-230816_Chrome.jpg

And indeed, worldwide as well, most of these are unassigned.
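
The comparison described above (what another tool assigned to samples that Usher called XM) can be tallied with a short sketch like the following. The function name and the dictionary inputs (sample ID mapped to lineage) are assumptions for illustration, not part of pangolin or GISAID.

```python
from collections import Counter


def tally_other_calls(usher_calls, other_calls, lineage="XM"):
    """Count what the other tool assigned to samples Usher called `lineage`.

    Both arguments are hypothetical dicts mapping sample ID to lineage;
    samples missing from `other_calls` are counted as 'n/a'.
    """
    tally = Counter()
    for sample, usher_lineage in usher_calls.items():
        if usher_lineage != lineage:
            continue
        tally[other_calls.get(sample, "n/a")] += 1
    return tally
```

The resulting counter gives the split between matching calls, `Unassigned`, and `n/a` in one pass.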

SVN-PhD commented 2 years ago

I ran your sequences through pangolin (v4.0.6) on the command line with pangolin-data v1.8. I think this is another case of scorpio overriding the Usher calls.

Screenshot from 2022-05-04 07-35-56