cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

Proposal to split B.1.1.529 in multiple sublineages #364

Closed 08011990 closed 2 years ago

08011990 commented 2 years ago

Submitted by Rakesh Sarkar (Senior Research Fellow, Division of Virology, ICMR-NICED, Kolkata, West Bengal, India; email: rakeshsarkar133@gmail.com)

I have analyzed the S glycoprotein mutations of 554 genome sequences of the Omicron variant that were deposited in GISAID from 33 different countries up to 5 December 2021 (Table 1). My analysis revealed 37 dominant mutations (A67V, ∆H69, ∆V70, T95I, G142D, ∆V143, ∆Y144, ∆Y145, ∆N211, L212I, ins214EPE, G339D, S371L, S373P, S375F, K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, Y505H, T547K, D614G, H655Y, N679K, P681H, N764K, D796Y, N856K, Q954H, N969K, and L981F) spread across the S glycoprotein, with frequencies ranging from 44.22% to 100% among the 554 sequences (Table 2). However, these 37 mutations occur in different combinations in different groups of sequences. On the basis of the co-occurring S glycoprotein mutations, I have classified the 554 sequences into 95 groups, each representing a distinct set of S glycoprotein mutations (Table 3). Around 75% (412/554) of the sequences fall into three groups (Group 1, N=159; Group 2, N=114; Group 3, N=109). Group 1 contains all 37 mutations; Group 2 contains all of them except K417N, N440K, and G446S; and Group 3 contains all the mutations of Group 2 except N764K. The remaining 92 groups together account for only 142 sequences (Table 3). The sequence names of all strains belonging to each group, together with their S glycoprotein mutations, are listed in Supplementary File 1.
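The grouping step described above can be sketched as follows. This is a minimal illustration, not the script used for Tables 2 and 3: the function name `group_by_spike_profile`, the input layout (a dict from sequence name to its list of S mutations), and the toy sequence names are all assumptions made for the example.

```python
def group_by_spike_profile(seq_mutations):
    """Group sequences that carry exactly the same set of Spike mutations.

    seq_mutations: dict mapping a sequence name to an iterable of mutation
    labels such as "N440K" (hypothetical input layout).
    Returns a list of (mutation_set, [sequence names]) pairs, largest first.
    """
    groups = {}
    for name, muts in seq_mutations.items():
        # frozenset ignores the order in which mutations are listed
        groups.setdefault(frozenset(muts), []).append(name)
    return sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)


# Toy data only; these are not real GISAID records.
toy = {
    "seqA": ["K417N", "N440K", "G446S", "N764K"],
    "seqB": ["N440K", "K417N", "G446S", "N764K"],
    "seqC": ["N764K"],
    "seqD": [],
}
ranked = group_by_spike_profile(toy)
for rank, (profile, names) in enumerate(ranked, start=1):
    print(f"Group {rank}: N={len(names)} mutations={sorted(profile)}")
```

Note that a grouping like this treats a mutation that is merely uncalled (for example, because of missing coverage at that site) the same as a mutation that is truly absent, so each such gap creates a separate group.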

I request that you review the details provided in the attached files and assign new lineage names to the different groups accordingly.

Table 1.docx Table 2.docx Table 3.docx Supplementary File 1.xlsx

08011990 commented 2 years ago

Please give priority to Groups 1 to 14, each of which contains at least 5 sequences, with special attention to Groups 1, 2, and 3, which include 159, 114, and 109 sequences respectively.

corneliusroemer commented 2 years ago

The differences between these groups are most likely sequencing artefacts due to amplicon dropout.

I do not see evidence for a split along those lines. If there was a split, we'd expect some correlated mutations in non-Spike areas.

This tree was built by masking all sites that seemed to have quality problems. If the S:440 split were real, the two genotypes should cluster separately. But they don't; they're spread all over the tree.


https://nextstrain.org/groups/neherlab/ncov/21K-diversity/unmasked/?c=gt-S_440&gt=S.440N
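The expectation that a real split would come with correlated non-Spike mutations can be checked roughly like this. It is only a sketch under stated assumptions: the `"GENE:mutation"` label format, the function names, the 0.8 threshold, and the toy records are hypothetical, and sequences with no coverage at the Spike site should be dropped beforehand rather than counted as lacking the mutation.

```python
import math


def phi_coefficient(a, b):
    """Phi correlation between two equal-length lists of booleans."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # If either variable is constant, the correlation is undefined; report 0.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def non_spike_correlates(records, spike_mut, threshold=0.8):
    """Return non-Spike mutations whose presence tracks `spike_mut`.

    records: dict mapping sequence name to a set of "GENE:mutation" labels
    (hypothetical layout). threshold is an arbitrary illustrative cut-off.
    """
    names = sorted(records)
    has_spike = [spike_mut in records[n] for n in names]
    candidates = {m for muts in records.values() for m in muts
                  if not m.startswith("S:")}
    hits = {}
    for mut in sorted(candidates):
        phi = phi_coefficient(has_spike, [mut in records[n] for n in names])
        if abs(phi) >= threshold:
            hits[mut] = phi
    return hits


# Toy records with made-up non-Spike labels.
toy = {
    "seq1": {"S:N440K", "N:toy1"},
    "seq2": {"S:N440K", "N:toy1", "ORF1a:toy2"},
    "seq3": {"N:toy1"},
    "seq4": {"ORF1a:toy2", "N:toy1"},
}
print(non_spike_correlates(toy, "S:N440K"))
```

In the toy data no non-Spike mutation tracks S:N440K, so the result is empty, which is the pattern that argues against a genuine split.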