Closed by rambaut 3 years ago
I know this is a closed not-issue but I would like to point out that the authors of https://www.medrxiv.org/content/10.1101/2021.01.18.21249786v1.full.pdf also include ORF1a: I4205V and ORF1b:D1183Y ("Results: We detected a novel strain descended from cluster 20C and defined by five mutations (ORF1a: I4205V, ORF1b:D1183Y, S: S13I;W152C;L452R)(Figure 1)"). I wish they had asked you for a lineage assignment before making up "CAL.20C". :)
The Pangolin lineages assigned to samples with the three S changes (S13I, W152C, L452R) vary quite a bit. I downloaded 493 sequences with those three mutations from GISAID yesterday; although they share a set of 13 nucleotide SNVs, the lineages assigned, with counts, were as follows:
320 B.1
68 B.1.265
18 B.1.324
11 B.1.40
11 B.1.370
11 B.1.262
9 B.1.288
6 B.1.354
4 B.1.368
4 B.1.358
3 B.1.320
3 B.1.301
3 B.1.263
2 B.1.313
2 B.1.298
2 B.1.293
2 B.1.292
2 B.1.275
2 B.1.2
1 B.1.5
1 B.1.361
1 B.1.343
1 B.1.336
1 B.1.304
1 B.1.300
1 B.1.296
1 B.1.289
1 B.1.283
1 B.1.266
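To put a number on how scattered these assignments are, here is a minimal Python sketch that re-tallies the counts listed above. The counts are copied from this comment; the names `RAW` and `parse_tally` are just illustrative, and nothing here calls Pangolin itself.

```python
from collections import Counter

# Counts copied verbatim from the list above: "<count> <lineage>" per line.
RAW = """
320 B.1
68 B.1.265
18 B.1.324
11 B.1.40
11 B.1.370
11 B.1.262
9 B.1.288
6 B.1.354
4 B.1.368
4 B.1.358
3 B.1.320
3 B.1.301
3 B.1.263
2 B.1.313
2 B.1.298
2 B.1.293
2 B.1.292
2 B.1.275
2 B.1.2
1 B.1.5
1 B.1.361
1 B.1.343
1 B.1.336
1 B.1.304
1 B.1.300
1 B.1.296
1 B.1.289
1 B.1.283
1 B.1.266
"""


def parse_tally(text):
    """Parse '<count> <lineage>' lines into a Counter keyed by lineage."""
    counts = Counter()
    for line in text.strip().splitlines():
        n, lineage = line.split()
        counts[lineage] += int(n)
    return counts


counts = parse_tally(RAW)
total = sum(counts.values())
top_lineage, top_n = counts.most_common(1)[0]

print(f"{total} sequences across {len(counts)} lineages")
print(f"{top_lineage}: {top_n} ({top_n / total:.1%}); "
      f"{total - top_n} sequences spread over {len(counts) - 1} other lineages")
```

The total comes back to the 493 sequences mentioned above, so the list is complete. Roughly a third of the samples land outside plain B.1 even though they carry the same three S changes.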
Have you considered making Pangolin more closely tied to a phylogeny? Could I interest you in a regularly updated tree of COG-UK and GenBank/INSDC sequences that can be shared publicly, based on sarscov2phylo, with newer sequences added incrementally? :) It has 229,528 sequences as of today. In that tree, the mutations assigned by parsimony along the path from the root (NC_045512, Wuhan/Hu-1) to the new lineage, including the ORF1 changes, are: C241T > C14408T > A23403G > C3037T > G25563T > C1059T > C28887T > G17014T, G21600T, G22018T, T22917G, C26681T, A28272T, C29362T > C2395T, A12878G, T24349C, G27890T
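In case it helps to read that path, here is a rough Python sketch that splits it into per-branch mutation lists and reports which nucleotide changes fall in the spike gene and at which codon. It assumes `>` separates successive branches and commas separate mutations on the same branch, and it assumes the S gene spans 21563..25384 in NC_045512.2 coordinates. The helper names (`parse_path`, `spike_codon`) are made up for illustration and are not part of Pangolin or sarscov2phylo.

```python
import re

# Mutation path copied from the comment above.
PATH = (
    "C241T > C14408T > A23403G > C3037T > G25563T > C1059T > C28887T > "
    "G17014T, G21600T, G22018T, T22917G, C26681T, A28272T, C29362T > "
    "C2395T, A12878G, T24349C, G27890T"
)

# Assumption: S gene coordinates in NC_045512.2 (1-based, inclusive).
S_START, S_END = 21563, 25384

MUTATION = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_path(path):
    """Return a list of branches; each branch is a list of (ref, pos, alt)."""
    branches = []
    for node in path.split(">"):
        mutations = []
        for token in node.split(","):
            match = MUTATION.match(token.strip())
            if match is None:
                raise ValueError(f"unrecognised mutation: {token!r}")
            ref, pos, alt = match.groups()
            mutations.append((ref, int(pos), alt))
        branches.append(mutations)
    return branches


def spike_codon(pos):
    """Return the 1-based S codon containing pos, or None if outside S."""
    if S_START <= pos <= S_END:
        return (pos - S_START) // 3 + 1
    return None


branches = parse_path(PATH)
spike_hits = {
    f"{ref}{pos}{alt}": spike_codon(pos)
    for branch in branches
    for ref, pos, alt in branch
    if spike_codon(pos) is not None
}

for name, codon in spike_hits.items():
    print(f"{name} -> S codon {codon}")
```

Under those assumptions G21600T, G22018T and T22917G fall in S codons 13, 152 and 452, matching S13I, W152C and L452R, and A23403G falls in codon 614. The sketch only locates codons; it does not translate them, so the amino acid changes themselves are taken from the text above.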
This is not an issue but I can't work out how to delete it.