alurqu closed this issue 2 years ago.
Here is the tree with some of the latest WA (Washington State) sequences: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/singleSubtreeAuspice_genome_1e871_67bc80.json?branchLabel=Spike%20mutations&c=gt-S_27&label=nuc%20mutations:T670G,C2790T,G4184A,C4321T,C9344T,A9424G,C9534T,C9866T,C10198T,G10447A,C12880T,T15240C,C15714T,C17410T,C19955T,A20055G,C21618T,T21762C,T21846C,T22200G,C22673T,A22688G,G22775A,A22786C,A24130C,C26060T,C26858T,G27382C,A27383T,T27384C,A29510C
Weird, I cannot find S:27P.
Is this masked in UShER?
@AngieHinrichs @corneliusroemer @chrisruis @tompeacock
Addendum: while analyzing this issue, I noticed that in Denmark 1/6 of BA.2 sequences have S:27A (reverted to WT). Is this real, or is it just backfilling to the reference?
I just had a look to check that this wasn't an artefact caused by different alignments of the BA.2 deletion, which does not start on a codon boundary. There does appear to be an additional nucleotide change compared to normal BA.2 (T21632C or G21641C), but I think analysis software is going to struggle to pick this up: because it is adjacent to the (lineage-defining) deletion, its position is ambiguous.
Running the sequences through UShER, a few do cluster together, but overall they are quite scattered throughout BA.2. Because of this lack of clustering, I do wonder whether this might still be some sort of sequencing/bioinformatics artefact rather than a real mutation. https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/singleSubtreeAuspice_genome_3419f_6a4240.json?c=userOrOld&label=nuc%20mutations:T670G,C2790T,G4184A,C4321T,C9344T,A9424G,C9534T,C9866T,C10198T,G10447A,C12880T,T15240C,C15714T,C17410T,C19955T,A20055G,C21618T,T21762C,T21846C,T22200G,C22673T,A22688G,G22775A,A22786C,A24130C,C26060T,C26858T,G27382C,A27383T,T27384C,A29510C
> Weird, I cannot find S:27P.
> Is this masked in UShER?
Search for nuc:21632C instead and you'll find it. The BA.2 deletion spans 21633-21641, so the substitution (if it's not an artefact) is definitely 21632C and not 21641C. But Nextclade and CoV-Spectrum don't know this when they see the sequences, so when they see BA.2 with 21632C they instead call it as a deletion of 21632-21640 plus 21641C.
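A toy sketch of why both calls are possible: applying UShER's call (21632C plus deletion of 21633-21641) and Nextclade's call (deletion of 21632-21640 plus 21641C) to the same reference yields an identical query sequence, so nothing in the sequence itself distinguishes them. Only the reference bases at 21632 (T) and 21641 (G) come from this thread; every other base in the window is a hypothetical placeholder, and `apply_variants` is an illustrative helper, not part of either tool.

```python
# Toy model of the BA.2 alignment ambiguity.
# Reference window covers genome positions 21632..21643 (1-based).
# 21632 = T and 21641 = G as stated in the thread; all other bases are HYPOTHETICAL.
REF_START = 21632
REF_WINDOW = "T" + "CTCCTCCT" + "G" + "CA"


def apply_variants(ref: str, start: int, deleted: range, subs: dict[int, str]) -> str:
    """Hypothetical helper: build the query implied by a deletion plus substitutions."""
    query = []
    for offset, ref_base in enumerate(ref):
        pos = start + offset
        if pos in deleted:
            continue  # a deleted reference position contributes nothing to the query
        query.append(subs.get(pos, ref_base))
    return "".join(query)


# Plain BA.2: deletion 21633-21641 only.
ba2_call = apply_variants(REF_WINDOW, REF_START, range(21633, 21642), {})
# UShER-style call: 21632C plus the BA.2 deletion 21633-21641.
usher_call = apply_variants(REF_WINDOW, REF_START, range(21633, 21642), {21632: "C"})
# Nextclade-style call: deletion 21632-21640 plus 21641C.
nextclade_call = apply_variants(REF_WINDOW, REF_START, range(21632, 21641), {21641: "C"})

print(ba2_call, usher_call, nextclade_call)  # TCA CCA CCA
```

Both calls imply one substitution and one 9-nt deletion, so which one a tool reports presumably comes down to its alignment conventions rather than to the data.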
UShER correctly calls this as 21632C, but because this is in codon S:24 rather than S:27, it isn't interpreted correctly; you can't find it by searching for S:24 either, because of the deletion. But the nucleotide change is still there!
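The codon assignment follows from simple arithmetic on reference coordinates. A minimal sketch, assuming the S open reading frame starts at nucleotide 21563 of the Wuhan-Hu-1 reference (NC_045512.2); `spike_codon` is a hypothetical helper name:

```python
S_START = 21563  # first base of the S ORF in Wuhan-Hu-1 (NC_045512.2) coordinates


def spike_codon(pos: int) -> tuple[int, int]:
    """Map a 1-based genome position inside S to (codon number, position in codon 1-3)."""
    offset = pos - S_START
    return offset // 3 + 1, offset % 3 + 1


for pos in (21632, 21633, 21641):
    codon, frame = spike_codon(pos)
    print(f"{pos}: S codon {codon}, position {frame}")
# 21632: S codon 24, position 1
# 21633: S codon 24, position 2
# 21641: S codon 27, position 1
```

Under this assumption the deletion 21633-21641 starts at the second base of codon 24 and ends at the first base of codon 27, so it is not codon-aligned, and a substitution at 21632 lands in S:24 rather than S:27.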
This is what normal BA.2 shows:
And this is what Nextclade incorrectly shows for these sequences:
> Addendum: while analyzing this issue, I noticed that in Denmark 1/6 of BA.2 sequences have S:27A (reverted to WT). Is this real, or is it just backfilling to the reference?
Those sequences are missing the 9-nucleotide deletion, which has the effect of reverting S:27 to WT. They're spread across multiple sublineages, so clearly not real.
Thank you very much @silcn for the double explanation. Now it makes sense!
Good explanation @silcn. I'll close this for now unless we have strong evidence this is not an artefact.
Thanks @thomasppeacock and @silcn. And just to confirm:
> Weird, I cannot find S:27P.
> Is this masked in UShER?
Yes, nucleotides 21633-21641 are masked in BA.2 in the UShER tree because of the BA.2 deletion and the general problem of some genome assembly pipelines reporting "substitutions" at deleted sites.
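For illustration only, masking a site range in a reference-aligned sequence amounts to overwriting those positions with N. This is a hypothetical sketch, not UShER's actual masking code; it assumes position i of the string corresponds to genome coordinate i, and the full-length genome below is a placeholder.

```python
def mask_range(seq: str, first: int, last: int, mask_char: str = "N") -> str:
    """Replace 1-based inclusive positions first..last of seq with mask_char."""
    if not 1 <= first <= last <= len(seq):
        raise ValueError(f"range {first}-{last} outside sequence of length {len(seq)}")
    return seq[: first - 1] + mask_char * (last - first + 1) + seq[last:]


# Toy example on a made-up sequence:
print(mask_range("ACGTACGT", 3, 5))  # ACNNNCGT

# Masking the BA.2 deletion sites 21633-21641 in a placeholder full-length genome
# (29903 nt, the length of the Wuhan-Hu-1 reference):
genome = "A" * 29903
masked = mask_range(genome, 21633, 21641)
print(masked.count("N"))  # 9
```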
BA.2 sequences with the additional mutation S:A27P (nuc G21641C) have appeared mostly in Washington State, USA, with the first sequence from 2022-02-07 and the most recent from 2022-03-23. On CoV-Spectrum at the time of issue creation, 112 sequences with lineage BA.2 show this mutation, plus 34 additional BA.2.3 sequences and 1 additional BA.2.2 sequence. Of the strictly-BA.2 sequences, 106 are from Washington State, 5 are from Finland, and 1 is from Denmark. Of the 35 additional BA.2.3/BA.2.2 sequences in the broader BA.2* set, 32 are from Washington State and 3 are from Finland.
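A quick arithmetic check that the per-location breakdowns add up to the lineage totals; all numbers are copied from this issue, and the totals are consistent with reading the second breakdown as covering only the 35 BA.2.3/BA.2.2 sequences:

```python
# Counts reported in this issue (CoV-Spectrum, at issue creation).
strict_ba2_by_location = {"Washington State": 106, "Finland": 5, "Denmark": 1}
sublineage_counts = {"BA.2.3": 34, "BA.2.2": 1}
sublineage_by_location = {"Washington State": 32, "Finland": 3}

strict_total = sum(strict_ba2_by_location.values())
sublineage_total = sum(sublineage_counts.values())

assert strict_total == 112
assert sublineage_total == sum(sublineage_by_location.values()) == 35
print(f"BA.2* with S:A27P: {strict_total + sublineage_total} sequences")  # 147
```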
The CoV-Spectrum URL for the narrower set is https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?aaMutations=S%3A27P&pangoLineage=BA.2&, and the URL for the broader set is https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?aaMutations=S%3A27P&pangoLineage=BA.2*&
The S:A27P mutation has occurred 46 times scattered across other lineages but never at this frequency.
Across all NCBI GenBank sequences accessible through CoV-Spectrum, S:A27P occurs in 0.0037% of sequences. However, among sequences since 01 March 2022, it occurs in 0.24%, a more than 64-fold increase in frequency in recent sequences.
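The fold increase quoted above is simply the ratio of the two percentages:

```python
overall_pct = 0.0037  # % of all GenBank sequences on CoV-Spectrum with S:A27P
recent_pct = 0.24     # % of sequences since 2022-03-01 with S:A27P

fold_increase = recent_pct / overall_pct
print(f"{fold_increase:.1f}-fold")  # 64.9-fold
```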
Note: any coaching regarding additional vetting of this and other potential sublineages, such as how to properly check for a monophyletic clade, will be appreciated.
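On the monophyly question: in a rooted tree, a set of sequences is monophyletic if some clade contains exactly those sequences and nothing else. A minimal sketch on a toy tree written as nested tuples; the tree, tip names, and helper functions are all hypothetical, and a real check would first need to parse the UShER or Nextstrain tree into this form.

```python
from typing import Union

Tree = Union[str, tuple]  # a tip name, or a tuple of child subtrees


def collect_clades(tree: Tree, clades: list) -> frozenset:
    """Post-order walk: record the tip set of every clade and return this subtree's tips."""
    if isinstance(tree, str):
        tips = frozenset([tree])
    else:
        tips = frozenset().union(*(collect_clades(child, clades) for child in tree))
    clades.append(tips)
    return tips


def is_monophyletic(tree: Tree, taxa: set) -> bool:
    """True if some clade of `tree` contains exactly the tips in `taxa`."""
    clades: list = []
    collect_clades(tree, clades)
    return frozenset(taxa) in clades


toy_tree = (("seq1", "seq2"), ("seq3", ("seq4", "seq5")))
print(is_monophyletic(toy_tree, {"seq4", "seq5"}))  # True
print(is_monophyletic(toy_tree, {"seq2", "seq3"}))  # False
```

A set that fails this test can reflect either an artefact or genuine recurrent mutation, so the check is one piece of evidence rather than a verdict.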
CoV-Spectrum sequence lists for the narrower and broader sets are attached. BA.2+S_A27P-cov-spectrum-contributors.csv BA.2star+S_A27P-cov-spectrum-contributors.csv