Closed agamedilab closed 2 years ago
Haven't looked at this in detail, but the high clade diversity points to an artefact
Please always add a covSPECTRUM query to facilitate monitoring and analysis of the proposal
@corneliusroemer many thanks! I usually add covSPECTRUM queries when possible (I admittedly forgot to add the link in #724)...
ORF1ab:A3436D is NSP5_A173D, which is adjacent to a residue whose mutation (NSP5_H172Y) is known to confer very high resistance to Paxlovid treatment (233-fold reduced nirmatrelvir activity in cell culture). https://www.fda.gov/media/155050/download
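The coordinate conversion used in that comment can be sketched in a few lines. This is a minimal sketch, assuming NSP5 begins at ORF1ab polyprotein position 3264 (which is what the A3436D to A173D mapping implies); the function name and the lookup table are illustrative and not taken from any tool mentioned in this thread.

```python
# Assumed 1-based start of NSP5 in ORF1ab polyprotein coordinates.
# Consistent with the mapping quoted in the thread: ORF1ab 3436 -> NSP5 173.
NSP_START = {"NSP5": 3264}


def orf1ab_to_nsp(position: int, nsp: str = "NSP5") -> int:
    """Convert a 1-based ORF1ab residue position to a 1-based position within one nsp."""
    start = NSP_START[nsp]
    if position < start:
        raise ValueError(f"position {position} lies upstream of {nsp}")
    return position - start + 1


print(orf1ab_to_nsp(3436))  # position of A3436D within NSP5
print(orf1ab_to_nsp(3435))  # the neighbouring residue
```

Under that assumption, ORF1ab position 3435 maps to NSP5 position 172, i.e. the H172 residue mentioned above sits directly next to A173.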
@corneliusroemer artefacts in German sequences would be nothing new - reminds me of those ones with a bunch of S mutations around 1007-1014 which still keep showing up
Hm... I have now found a batch of 22 seqs from the same RKI upload (13.06.22) carrying very much the same set of mutations but belonging to BA.2:
I now also think this is probably too much of a coincidence. Having that set of mutations appear in two lineages uploaded within the same batch is very suspicious...
As a final nail in the coffin, note that many of the AA mutations within the apparent clade are either reversions of BE.1 mutations or are associated with other BA.5 lineages (N:D136E, ORF1a:K556Q, ORF10:L37F...). It looks like sequences from all across the tree have been grouped together by Usher because of a shared set of artefactual mutations.
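The reasoning in the last few comments (the same mutations turning up in unrelated lineages, but only within one upload batch) can be turned into a simple screen. The sketch below is only an illustration: the records are made-up toy data, not real sequences from this issue, and the function name, record layout and threshold are assumptions.

```python
from collections import defaultdict

# Hypothetical toy records: (sequence id, assigned lineage, upload batch, mutations).
# These are NOT real observations; they only mimic the pattern discussed above.
records = [
    ("seq1", "BE.1", "batch_A", {"S:P384H", "S:T604N", "N:A90D"}),
    ("seq2", "BE.1", "batch_A", {"S:P384H", "S:T604N", "N:D136E"}),
    ("seq3", "BA.2", "batch_A", {"S:P384H", "S:T604N", "N:A90D"}),
    ("seq4", "BA.2", "batch_B", {"ORF10:L37F"}),
]


def suspicious_mutations(records, min_lineages=2):
    """Return mutations seen in at least `min_lineages` lineages but in a single batch."""
    lineages = defaultdict(set)
    batches = defaultdict(set)
    for _seq_id, lineage, batch, mutations in records:
        for mutation in mutations:
            lineages[mutation].add(lineage)
            batches[mutation].add(batch)
    return sorted(
        m for m in lineages
        if len(lineages[m]) >= min_lineages and len(batches[m]) == 1
    )


flagged = suspicious_mutations(records)
print(flagged)
```

A mutation flagged this way is not proven to be an artefact; it is merely a candidate worth checking against the raw reads or with the submitting lab.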
Many thanks, I think I can close this.
I noticed that the bigger branch with ORF1a:L3829F now appears in Usher as BE.1.1. The Usher tree you posted here is now unavailable, so I can't see whether this newly designated lineage includes these probably artefactual sequences or not. Maybe this has to be checked @InfrPopGen
@FedeGueli Hi, yes, BE.1.1 now apparently includes those probably artefactual sequences:
Thx @agamedilab
I noticed the other day when looking into sequences whose pangolin/UShER lineage assignment will change in the next release that many sequences from Germany have nt:T11524C and A11537G (ORF1a:I3758V) together despite appearing in very different branches of BA.2 / BA.5. Since that only seems to be the case for sequences from Germany I suspect some kind of systematic error. [In case A11537G (ORF1a:I3758V) sounds familiar, it is found in BA.1, but not T11524C.]
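For readers who want to check how the nucleotide positions relate to the amino acid label, here is a minimal sketch. It assumes Wuhan-Hu-1 reference coordinates with ORF1a starting at nucleotide 266 and is only meant for positions within ORF1a; the function name is illustrative.

```python
# Assumed 1-based genome position of the first nucleotide of ORF1a
# in the Wuhan-Hu-1 reference.
ORF1A_START = 266


def nt_to_orf1a_codon(nt_pos: int) -> tuple[int, int]:
    """Map a 1-based genome position to (codon number, position within codon) in ORF1a."""
    offset = nt_pos - ORF1A_START
    if offset < 0:
        raise ValueError(f"position {nt_pos} lies upstream of ORF1a")
    return offset // 3 + 1, offset % 3 + 1


print(nt_to_orf1a_codon(11537))  # (3758, 1)
print(nt_to_orf1a_codon(11524))  # (3753, 3)
```

Under these assumptions A11537G hits the first base of codon 3758, matching the ORF1a:I3758V label, while T11524C falls on the third base of codon 3753.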
Proposal for a sublineage of BE.1 Earliest sequence: 25.05.2022 (Germany) Countries detected: Germany
Defining mutations: S:P384H, S:T604N, ORF1ab:Q1003K, ORF1ab:Q2510K, ORF1ab:A3436D, ORF1ab:I3758V, ORF1ab:L3829F, ORF6:T10N, ORF7b:I27F, N:A90D
This is apparently a highly divergent sublineage of BE.1 carrying a total of 10 additional aa mutations compared to BE.1:
Two spike mutations (S:P384H, S:T604N), five ORF1ab mutations (ORF1ab:Q1003K, ORF1ab:Q2510K, ORF1ab:A3436D, ORF1ab:I3758V, ORF1ab:L3829F) and one mutation each in ORF6, ORF7b and N (ORF6:T10N, ORF7b:I27F, N:A90D). This cluster of sequences (c. 30) has so far only been detected in Germany.
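The per-gene tally can be cross-checked against the defining-mutations list with a short script. This is just a sketch; the helper name is illustrative, and the input string is the list given in the proposal.

```python
from collections import Counter

# Defining mutations as listed in the proposal.
defining = (
    "S:P384H, S:T604N, ORF1ab:Q1003K, ORF1ab:Q2510K, ORF1ab:A3436D, "
    "ORF1ab:I3758V, ORF1ab:L3829F, ORF6:T10N, ORF7b:I27F, N:A90D"
)


def count_by_gene(mutation_list: str) -> Counter:
    """Count mutations per gene in a comma-separated 'GENE:change' list."""
    counts = Counter()
    for item in mutation_list.split(","):
        gene, _, _change = item.partition(":")
        counts[gene.strip()] += 1
    return counts


counts = count_by_gene(defining)
print(dict(counts), "total:", sum(counts.values()))
```

The list yields two S, five ORF1ab and one each of ORF6, ORF7b and N mutations, ten in total.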
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_1008d_c38c70.json?branchLabel=aa%20mutations&c=gt-S_384&label=nuc%20mutations:G16935A
The intra-clade diversity is also very high, with almost every sequence carrying additional aa mutations:
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_1008d_c38c70.json?branchLabel=aa%20mutations&c=country&label=nuc%20mutations:C3272A,C7793A,C10572A,C16611A,C22713A,C27230A,C28542A
Sequence quality is apparently ok:
EPI_ISL_13241291, EPI_ISL_13241313, EPI_ISL_13241342, EPI_ISL_13241298, EPI_ISL_13241302, EPI_ISL_13241303, EPI_ISL_13241255, EPI_ISL_13241273, EPI_ISL_13241343, EPI_ISL_13241316, EPI_ISL_13241276, EPI_ISL_13241260, EPI_ISL_13241269, EPI_ISL_13241238, EPI_ISL_13241246, EPI_ISL_13241268, EPI_ISL_13241271, EPI_ISL_13241253, EPI_ISL_13241293, EPI_ISL_13241286, EPI_ISL_13241262, EPI_ISL_13241344, EPI_ISL_13241240, EPI_ISL_13241325, EPI_ISL_13241337, EPI_ISL_13241264, EPI_ISL_13241305, EPI_ISL_13241261, EPI_ISL_13241245, EPI_ISL_13241314