cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

Sublineage(s) of XBB* with S:H146K (~135 seq, ~19 countries) #1330

Closed ryhisner closed 1 year ago

ryhisner commented 1 year ago

Description

Sub-lineage of: XBB
Earliest sequence: 2022-9-9, India, Tamil Nadu, EPI_ISL_15146813
Most recent sequence: 2022-11-4, Denmark & Iceland, EPI_ISL_15716055, EPI_ISL_15740717
Countries circulating: About 19 countries
Number of Sequences: About 135
GISAID Query: Spike_H146K, Spike_Q183E, Spike_V213E
CovSpectrum Query: Nextcladepangolineage:XBB* & S:H146K
Substitutions on top of XBB:
Spike: H146K
Nucleotide: C21998A
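To make the link between the nucleotide coordinate and the spike residue explicit, here is a minimal sketch that maps a genome position to a spike codon. It assumes 1-based coordinates on the Wuhan-Hu-1 reference (NC_045512.2), where the spike coding sequence starts at position 21563; the function name `spike_codon` is hypothetical and not part of any tool mentioned in this issue.

```python
from typing import Tuple

# Assumption: 1-based coordinates on Wuhan-Hu-1 (NC_045512.2),
# where the spike (S) coding sequence begins at nucleotide 21563.
S_START = 21563


def spike_codon(nuc_pos: int) -> Tuple[int, int]:
    """Return (codon number, position within codon) for a spike nucleotide.

    Both values are 1-based. Deletions and insertions are ignored, so this
    only applies to reference coordinates.
    """
    offset = nuc_pos - S_START
    if offset < 0:
        raise ValueError("position lies upstream of the spike start")
    return offset // 3 + 1, offset % 3 + 1


if __name__ == "__main__":
    # C21998A falls on the first base of spike codon 146.
    print(spike_codon(21998))
```

Under these assumptions, position 21998 is the first base of codon 146, which is why a change there can alter S:146.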

USHER Tree

All of the sequences in black on the tree pictured below have S:H146K according to GISAID (and according to Nextclade for the majority).
https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons/main/XBB%20%2B%20H146K%20-%20subtreeAuspice1_genome_248f7_55ec0.json

image

Evidence

A search for the two-nucleotide mutation S:H146K along with other XBB mutations returns about 135 sequences. NextClade confirms most of these, though a red line runs through about one-third of them. S:H146K appears to be entirely blocked by Usher, so it appears nowhere on the XBB tree. I know that a lot of sequencing pipelines don’t deal with deletions particularly well, and S:Y144del is common in XBB and located close to the mutations S:G142D and S:H146K. I’m guessing this is why NextClade puts a red line through almost the entire spike of so many XBB* + H146K sequences, though I don't know for sure.

image

Perhaps all the sequences that appear to have S:H146K are really sequencing artifacts that for some reason happen in different laboratories all over the world. I don't know enough to say for sure. @AngieHinrichs, perhaps you could look into this and determine what's really going on here?

Genomes EPI_ISL_15146813, EPI_ISL_15231913, EPI_ISL_15250343, EPI_ISL_15250538, EPI_ISL_15297641, EPI_ISL_15299955, EPI_ISL_15312872, EPI_ISL_15341114, EPI_ISL_15363991, EPI_ISL_15401657, EPI_ISL_15452687, EPI_ISL_15452794, EPI_ISL_15465060, EPI_ISL_15469817, EPI_ISL_15469849-15469850, EPI_ISL_15469853, EPI_ISL_15469890, EPI_ISL_15469939, EPI_ISL_15469947, EPI_ISL_15469965, EPI_ISL_15476137, EPI_ISL_15494799, EPI_ISL_15494822, EPI_ISL_15509864, EPI_ISL_15517434, EPI_ISL_15553030, EPI_ISL_15573095, EPI_ISL_15573107, EPI_ISL_15573123, EPI_ISL_15573170, EPI_ISL_15579578, EPI_ISL_15580307, EPI_ISL_15602402, EPI_ISL_15613293, EPI_ISL_15617404, EPI_ISL_15626942, EPI_ISL_15629123, EPI_ISL_15634417, EPI_ISL_15634733, EPI_ISL_15634815, EPI_ISL_15634879, EPI_ISL_15635865, EPI_ISL_15641234, EPI_ISL_15641241, EPI_ISL_15641625, EPI_ISL_15641667, EPI_ISL_15648075, EPI_ISL_15649951, EPI_ISL_15654601, EPI_ISL_15665312, EPI_ISL_15665346, EPI_ISL_15665779, EPI_ISL_15668141, EPI_ISL_15669040, EPI_ISL_15671554, EPI_ISL_15671741, EPI_ISL_15674203, EPI_ISL_15677629, EPI_ISL_15677732, EPI_ISL_15684760, EPI_ISL_15684765, EPI_ISL_15686577, EPI_ISL_15695769, EPI_ISL_15696796, EPI_ISL_15696825, EPI_ISL_15697700, EPI_ISL_15697761, EPI_ISL_15707108, EPI_ISL_15707546, EPI_ISL_15707592, EPI_ISL_15713724, EPI_ISL_15716055, EPI_ISL_15716301, EPI_ISL_15716502, EPI_ISL_15720017, EPI_ISL_15722254, EPI_ISL_15724331, EPI_ISL_15725037, EPI_ISL_15726756, EPI_ISL_15726947, EPI_ISL_15732266, EPI_ISL_15732784, EPI_ISL_15736003, EPI_ISL_15736432, EPI_ISL_15736776-15736777, EPI_ISL_15740717, EPI_ISL_15740924, EPI_ISL_15744615, EPI_ISL_15745901, EPI_ISL_15746919, EPI_ISL_15747049, EPI_ISL_15748998, EPI_ISL_15749000, EPI_ISL_15749011, EPI_ISL_15749033, EPI_ISL_15749037, EPI_ISL_15749123, EPI_ISL_15749198, EPI_ISL_15749207, EPI_ISL_15749211, EPI_ISL_15749219, EPI_ISL_15749361, EPI_ISL_15749428, EPI_ISL_15749547, EPI_ISL_15749549, EPI_ISL_15749554, EPI_ISL_15749563, EPI_ISL_15749566, EPI_ISL_15749586, 
EPI_ISL_15749610, EPI_ISL_15749670, EPI_ISL_15749678, EPI_ISL_15749683, EPI_ISL_15749691, EPI_ISL_15749765, EPI_ISL_15749768, EPI_ISL_15749787, EPI_ISL_15749801, EPI_ISL_15749815, EPI_ISL_15749853-15749854, EPI_ISL_15749883, EPI_ISL_15749885, EPI_ISL_15749896, EPI_ISL_15749902, EPI_ISL_15749914, EPI_ISL_15749949, EPI_ISL_15749958, EPI_ISL_15749971, EPI_ISL_15749984, EPI_ISL_15749995, EPI_ISL_15750008, EPI_ISL_15754381, EPI_ISL_15754431
AngieHinrichs commented 1 year ago

> S:H146K appears to be entirely blocked by Usher

Sorry, this is a bug in the UShER web interface's conversion of nucleotide mutations to amino acid mutations. It considers only the most recent mutation in a codon rather than all mutations in that codon, so it incorrectly shows the amino acid change for C21998A as N (ignoring that the codon already has C22000A): https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons/main/XBB%20%2B%20H146K%20-%20subtreeAuspice1_genome_248f7_55ec0.json?c=gt-S_146&label=nuc%20mutations:G22895C,T22896C,T23031C
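The effect of translating only the latest mutation can be illustrated with a small sketch. This is not UShER's code; it is a toy reconstruction under stated assumptions: spike codon 146 spans positions 21998-22000 (consistent with the C21998A given in the issue description), the reference codon is CAC (inferred from both mutations having C as their reference base and H as the reference residue), and the helper `apply_mutations` and the four-entry codon table are hypothetical.

```python
# Only the four codons this example needs (standard genetic code).
CODON_TO_AA = {"CAC": "H", "CAA": "Q", "AAA": "K", "AAC": "N"}

CODON_START = 21998      # assumed first base of spike codon 146
REF_CODON_146 = "CAC"    # assumed reference codon (His)


def apply_mutations(codon, codon_start, mutations):
    """Apply (ref_base, position, alt_base) mutations to a codon in order."""
    bases = list(codon)
    for ref, pos, alt in mutations:
        idx = pos - codon_start
        if bases[idx] != ref:
            raise ValueError(f"expected {ref} at {pos}, found {bases[idx]}")
        bases[idx] = alt
    return "".join(bases)


older = ("C", 22000, "A")   # already present on the XBB branch (H146Q)
newer = ("C", 21998, "A")   # the additional mutation discussed here

# Correct: both mutations in the codon are taken into account.
both = apply_mutations(REF_CODON_146, CODON_START, [older, newer])
# Buggy: only the most recent mutation is applied to the reference codon.
latest_only = apply_mutations(REF_CODON_146, CODON_START, [newer])

print(both, CODON_TO_AA[both])                # AAA K
print(latest_only, CODON_TO_AA[latest_only])  # AAC N
```

With both mutations applied the codon reads AAA (K), whereas applying only C21998A to the reference codon gives AAC (N), matching the N shown in the web interface.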

> Perhaps all the sequences that appear to have S:H146K are really sequencing artifacts that for some reason happen in different laboratories all over the world.

Unfortunately, I get the impression that genome assembly pipelines that have suboptimal handling of indels, especially in the presence of even very low levels of contamination, have infected labs all over the world. 🙁 By personal communication with one of the labs that contributed some of those sequences in your list, as of spring 2021, they were aware that they might not be calling deletions properly, but just did not have time to deal with it (high volume of sequences, not a lot of staff).

These sequences are spread all over XBB and its sublineages. This is what it looks like in Taxonium with the full tree's XBB branch, with red circles around your sequences and samples colored by S:146 (cyan=Q, darker blue=K, light green=H (reversion)):

image

Since the sequences look pretty evenly distributed throughout XBB, I doubt this is a lineage -- I think it is more likely to be a symptom of genome assembly issues / artefacts. But if someone who knows more about genome sequencing and assembly has time to look at raw data for some of those sequences, I'll defer to whatever they find. 🙂

FedeGueli commented 1 year ago

@ryhisner, is this the same lineage that has been designated XBB.1.6? https://github.com/cov-lineages/pango-designation/commit/e5d63041b9a893d244ae90a32900dc39ecadb41b

silcn commented 1 year ago

@FedeGueli it's looking increasingly clear, at least to me, that S:H146K is not an artefact but just extremely homoplasic in XBB*. XBB.1.6 is just one of many branches with this mutation.

FedeGueli commented 1 year ago

Thx @silcn !