cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

BA.5 sublineage with ORF1a:I3758V mainly in genomes from Germany #841

Closed by akifoss 1 year ago

akifoss commented 2 years ago

Sub-lineage of: Potential new sublineage of BA.5
Earliest sequence: EPI_ISL_13351624 (06.04.2022, Indonesia)
Most recent sequences: EPI_ISL_13662398, EPI_ISL_13662402, EPI_ISL_13662397, EPI_ISL_13662399, EPI_ISL_13662401 (25.06.2022, Netherlands)
Defining mutation: ORF1a:I3758V

Countries circulating
393 Germany
84 Netherlands
51 France
3 Denmark
1 South Africa
1 Singapore
1 Italy
1 Indonesia
1 Eswatini
1 Belgium

Sublineage composition
208 BA.5.1
142 BE.1
46 BA.5.2.1
43 BA.5.3.2
34 BA.5.3
29 BA.5
20 BA.5.2
7 BF.1
4 BA.5.5
4 BA.5.3.1

Genomes

BA5sublin-GISAID-ORF1a_I3758V-2022-07-13.txt

Evidence

Genomes with this mutation sit on a long branch in a BA.5* phylogeny built as a one-off (top left corner):

image

https://cov-spectrum.org/explore/Europe/AllSamples/Past6M/variants?aaMutations=ORF1a%3AI3758V&pangoLineage=BA.5*&aaMutations1=ORF1a%3AI3758V&pangoLineage1=BA.5*&

image

image

AngieHinrichs commented 2 years ago

Caution... I've noticed that almost exclusively in Germany/RKI sequences, A11537G (ORF1a:I3758V) and T11524C occur together really frequently, but in sequences whose other mutations map them to many different branches of BA.2 and BA.5. In other words, many different BA.2 and BA.5 lineages (and undesignated branches of BA.2) include a cluster of German sequences with A11537G and T11524C. I wonder if there's some kind of systematic error behind that.

To illustrate a few of the many places I see this in the tree, here are Taxonium views, colored by country (darker green = Germany, bright green = Netherlands, purple = France), with blue circles around nodes with a change at 11537 (min 10 samples) and green circles around nodes with a change at 11524 (min 10 samples):

BA.5.3.2 (the second green circle is a reversion on 11524):

image

BE.1.1 (BA.5.3.1.1.1):

image

BA.5.1:

image

BA.5.2.1:

image

BA.2 + C25416T + C8092T + C2062T + C29420T:

image

BA.2 + C22792T + G3692T + A25927G + C952T:

image

... and many more, you get the idea.
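The pattern described in this comment (the same pair of substitutions recurring on many unrelated branches, mostly in sequences from one country) can be screened for with a simple tally. The following is only a minimal sketch: the sample records, the function names, and the thresholds are all hypothetical and chosen for illustration, not taken from the tree or from any real tool.

```python
from collections import defaultdict

# Hypothetical records: (lineage, country, set of nucleotide substitutions).
# These rows are illustrative only; they are not real sequence data.
SAMPLES = [
    ("BA.5.1", "Germany", {"A11537G", "T11524C"}),
    ("BA.5.2.1", "Germany", {"A11537G", "T11524C"}),
    ("BE.1.1", "Germany", {"A11537G", "T11524C"}),
    ("BA.5.3.2", "Netherlands", {"A11537G"}),
    ("BA.5.1", "France", set()),
]


def summarize_pair(samples, mut_a, mut_b):
    """Count samples carrying both mutations, grouped by lineage and by country."""
    by_lineage = defaultdict(int)
    by_country = defaultdict(int)
    for lineage, country, muts in samples:
        if mut_a in muts and mut_b in muts:
            by_lineage[lineage] += 1
            by_country[country] += 1
    return dict(by_lineage), dict(by_country)


def looks_suspicious(by_lineage, by_country, min_lineages=3, min_country_share=0.9):
    """Flag a pair seen on many lineages but dominated by a single country.

    Both thresholds are arbitrary assumptions for this sketch.
    """
    total = sum(by_country.values())
    if total == 0:
        return False
    top_share = max(by_country.values()) / total
    return len(by_lineage) >= min_lineages and top_share >= min_country_share


by_lineage, by_country = summarize_pair(SAMPLES, "A11537G", "T11524C")
print(by_lineage)
print(by_country)
print(looks_suspicious(by_lineage, by_country))
```

A flag from a tally like this is only a prompt to look closer. As the comment goes on to note, one country may simply have contributed most of the sequences, and some of the A11537G calls could be real.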

I'd be tempted to mask those two positions, except that A11537G is a BA.1 mutation and matters when detecting BA.1 / BA.2 recombinants, and some of those A11537G calls could certainly be real. I just think it's suspicious that A11537G and T11524C appear together in so many different branches, almost always from the same country (although, to be fair, Germany does seem to have contributed an outsized share of BA.2 and BA.5 sequences, especially since UK testing dropped off). Here's a zoomed-out country-colored view of BA.2 (and nested BA.5):

image

FedeGueli commented 2 years ago

@AngieHinrichs I think ORF1a:I3758V is a known sequencing issue in Germany. If I recall correctly, @JosetteSchoenma dug into it because it also appeared in the Netherlands, and they solved it.

AngieHinrichs commented 2 years ago

Wow, that's great, @JosetteSchoenma maybe you can help the RKI folks solve it too! 🙂

JosetteSchoenma commented 2 years ago

@AngieHinrichs @JordyCoolen actually fixed this after we discussed it. He mentions it in the release notes here: https://github.com/JordyCoolen/easyseq_covid19/releases/tag/v0.9

Are you maybe in contact with the RKI?

JosetteSchoenma commented 2 years ago

Most Dutch sequences are from Microvida. I will try and contact somebody from that lab. Note for me: ORF1a:I3758V = NSP6_I189V on GISAID.
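The naming equivalence in the note above, and its link to the nucleotide change A11537G mentioned earlier in the thread, can be checked with a little coordinate arithmetic. This is a minimal sketch that assumes 1-based coordinates on the Wuhan-Hu-1 reference (NC_045512.2), with ORF1a starting at nucleotide 266 and nsp6 starting at ORF1a residue 3570. The function names are made up for illustration.

```python
# Assumed coordinates on the Wuhan-Hu-1 reference (NC_045512.2), 1-based.
ORF1A_START_NT = 266         # first nucleotide of the ORF1a start codon
NSP6_FIRST_ORF1A_AA = 3570   # ORF1a residue that corresponds to nsp6 residue 1


def orf1a_codon(nt_pos):
    """Map a genome position to (ORF1a codon number, position within codon 1..3)."""
    offset = nt_pos - ORF1A_START_NT
    if offset < 0:
        raise ValueError("position lies upstream of ORF1a")
    return offset // 3 + 1, offset % 3 + 1


def orf1a_to_nsp6(aa_pos):
    """Convert an ORF1a residue number to nsp6 numbering (no upper-bound check)."""
    if aa_pos < NSP6_FIRST_ORF1A_AA:
        raise ValueError("residue lies upstream of nsp6")
    return aa_pos - NSP6_FIRST_ORF1A_AA + 1


print(orf1a_codon(11537))   # (3758, 1)
print(orf1a_codon(11524))   # (3753, 3)
print(orf1a_to_nsp6(3758))  # 189
```

Under these assumptions, position 11537 is the first base of ORF1a codon 3758, which is nsp6 residue 189, while position 11524 is the third base of codon 3753.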

AngieHinrichs commented 2 years ago

Ah, thanks @JosetteSchoenma (and @JordyCoolen)! Looks like this commit changed a primer trimming region to start at 11520 instead of 11525, which would affect position 11524 too I suppose: https://github.com/JordyCoolen/easyseq_covid19/commit/e2412313ddaaf39a0b8014e4209d01c32a4d3245 Please do pass that on to anyone you know at RKI and I'll see if I can find contact info for someone there too! Thanks!
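As a rough check of that reasoning, the positions that become part of the trimming region when its start moves from 11525 to 11520 can be listed directly. This sketch assumes the start coordinate is inclusive and uses the same numbering as the mutation positions, and that the end of the region is unchanged. The helper name is hypothetical and not part of easyseq_covid19.

```python
def newly_covered(old_start, new_start):
    """Positions covered only after an inclusive region start moves earlier."""
    return list(range(new_start, old_start))


added = newly_covered(11525, 11520)
print(added)            # [11520, 11521, 11522, 11523, 11524]
print(11524 in added)   # True
```

Under those assumptions, position 11524 falls just outside the old region and inside the new one.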

MarieLataretu commented 2 years ago

Thanks to all for the input!

The RKI is in contact with the labs, see https://github.com/robert-koch-institut/SARS-CoV-2-Sequenzdaten_aus_Deutschland/issues/27#issuecomment-1201352032

We were able to solve that with https://github.com/JordyCoolen/easyseq_covid19/commit/e2412313ddaaf39a0b8014e4209d01c32a4d3245!