cov-lineages / lineages

Resources for calling and describing the circulating lineages of SARS-CoV-2

This is not an issue #14

Closed rambaut closed 3 years ago

rambaut commented 3 years ago

This is not an issue but I can't work out how to delete it.

AngieHinrichs commented 3 years ago

I know this is a closed not-issue, but I would like to point out that the authors of https://www.medrxiv.org/content/10.1101/2021.01.18.21249786v1.full.pdf also include ORF1a:I4205V and ORF1b:D1183Y ("Results: We detected a novel strain descended from cluster 20C and defined by five mutations (ORF1a: I4205V, ORF1b:D1183Y, S: S13I;W152C;L452R)(Figure 1)"). I wish they had asked you for a lineage assignment before making up "CAL.20C". :)

Pangolin lineages assigned to samples with the three S changes (S13I, W152C, L452R) vary quite a bit. Of 493 sequences with those 3 mutations downloaded yesterday from GISAID, despite a shared set of 13 nucleotide SNVs, the lineages assigned and counts were as follows:

320 B.1
 68 B.1.265
 18 B.1.324
 11 B.1.40
 11 B.1.370
 11 B.1.262
  9 B.1.288
  6 B.1.354
  4 B.1.368
  4 B.1.358
  3 B.1.320
  3 B.1.301
  3 B.1.263
  2 B.1.313
  2 B.1.298
  2 B.1.293
  2 B.1.292
  2 B.1.275
  2 B.1.2
  1 B.1.5
  1 B.1.361
  1 B.1.343
  1 B.1.336
  1 B.1.304
  1 B.1.300
  1 B.1.296
  1 B.1.289
  1 B.1.283
  1 B.1.266
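
A tally like the one above can be produced from a Pangolin report in a few lines. This is only a minimal sketch, assuming the report is a CSV with a `lineage` column; the helper name `tally_lineages` and the four sample rows are made up for illustration and are not the GISAID data.

```python
import csv
import io
from collections import Counter


def tally_lineages(report_csv, column="lineage"):
    """Count how many sequences were assigned to each lineage.

    `report_csv` is the text of a CSV report; `column` is the name of the
    column holding the lineage call (assumed here to be "lineage").
    """
    reader = csv.DictReader(io.StringIO(report_csv))
    return Counter(row[column] for row in reader)


# Tiny made-up report for illustration only (not the real sequences).
example_report = """taxon,lineage
seq1,B.1
seq2,B.1.265
seq3,B.1
seq4,B.1.324
"""

counts = tally_lineages(example_report)

# Print in the same "count lineage" layout as the list above,
# most frequent lineage first.
for lineage, n in counts.most_common():
    print(f"{n:4d} {lineage}")
```

For the counts listed above, the 493 sequences are spread over 29 distinct lineages, and 320 of them (about 65%) are assigned plain B.1.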

Have you considered tying Pangolin more closely to a phylogeny? Could I interest you in a regularly updated tree of COG-UK and GenBank/INSDC sequences that can be shared publicly, based on sarscov2phylo, with newer sequences added incrementally? :)

The tree has 229,528 sequences as of today. In that tree, the mutations along the path from the root (NC_045512 Wuhan/Hu-1) to the new lineage, as assigned by parsimony and including the ORF1 changes, are:

C241T > C14408T > A23403G > C3037T > G25563T > C1059T > C28887T > G17014T, G21600T, G22018T, T22917G, C26681T, A28272T, C29362T > C2395T, A12878G, T24349C, G27890T
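
The root-to-lineage path above uses `>` between successive branches and commas between mutations on the same branch. As a rough sketch of how that notation could be read programmatically (the function name `parse_path` and the tuple layout are my own assumptions, not part of Pangolin or sarscov2phylo):

```python
import re

# Path copied from the comment above: ">" separates successive branches,
# "," separates mutations that occur on the same branch.
PATH = (
    "C241T > C14408T > A23403G > C3037T > G25563T > C1059T > C28887T > "
    "G17014T, G21600T, G22018T, T22917G, C26681T, A28272T, C29362T > "
    "C2395T, A12878G, T24349C, G27890T"
)

# A nucleotide substitution: reference base, 1-based position, new base.
MUTATION = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_path(path):
    """Split a path string into branches, each a list of (ref, pos, alt)."""
    steps = []
    for step in path.split(">"):
        mutations = []
        for token in step.split(","):
            match = MUTATION.match(token.strip())
            if match is None:
                raise ValueError(f"not a substitution: {token!r}")
            ref, pos, alt = match.groups()
            mutations.append((ref, int(pos), alt))
        steps.append(mutations)
    return steps


steps = parse_path(PATH)
total = sum(len(branch) for branch in steps)
print(f"{len(steps)} branches, {total} mutations")
```

Keeping the branches as separate lists preserves the order in which the mutations were acquired, which is the information a flat lineage label does not carry.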