cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

Proposal for a delta lineage with ORF7a:P45L (~56k sequences, ~33% prevalence in Russia), and 2 related sub-lineages #268

Closed c19850727 closed 2 years ago

c19850727 commented 2 years ago

Description

Sub-lineage of: B.1.617.2, Clade 21J (~56K sequences, including a ~10K sub-clade where ORF7a:45L mutates back to 45P)

Earliest sequence: 2021/4/15 (the UK)

Most recent sequence: 2021/10/14 (Italy)

Countries circulating: widespread, with higher prevalence in Russia and Eastern Europe

Cumulative prevalence and number of samples sequenced as per Outbreak.info: image https://outbreak.info/situation-reports?pango&muts=S%3AP681R&muts=ORF7a%3AP45L&muts=ORF1a%3AK261N

Mutations in addition to B.1.617.2, Clade 21J: ORF1a:K261N then ORF7a:P45L

Genomes: Delta P45L.csv

Evidence

Downsized tree as per NeherLab (shown in yellow): image https://nextstrain.org/groups/neherlab/ncov/europe?c=gt-ORF7a_45&f_clade_membership=21J%20%28Delta%29&label=spike_mutations:I95T&p=grid&r=division

Transmission advantage as per CoV-Spectrum: image https://cov-spectrum.ethz.ch/explore/World/AllSamples/AllTimes/variants/json=%7B%22variant%22%3A%7B%22mutations%22%3A[%22ORF1a%3AK261N%22%2C%22ORF7a%3AP45%22]%7D%2C%22matchPercentage%22%3A1%7D

image

Interestingly, a big chunk of one of its sub-branches has ORF7a:45L mutated back to 45P:

Sub-lineage of: B.1.617.2, Clade 21J (~9.2K sequences)

Earliest sequence: 2021/4/15 (the UK)

Most recent sequence: 2021/10/12 (Denmark)

Countries circulating: widespread, with higher prevalence in France, Italy and Monaco

Cumulative prevalence and number of samples sequenced as per Outbreak.info: image https://outbreak.info/situation-reports?pango&muts=S%3AP681R&muts=ORF7a%3AR118G&muts=ORF1a%3AK261N

Mutations in addition to B.1.617.2, Clade 21J: ORF1a:K261N then ORF7a:P45L then ORF7a:L45P, ORF7a:R118G and ORF1a:A498V

Genomes: Delta R118G.csv

Evidence

Downsized tree as per NeherLab (shown in light green): image https://nextstrain.org/groups/neherlab/ncov/europe?c=gt-ORF7a_45,118&f_clade_membership=21J%20%28Delta%29&label=spike_mutations:I95T&p=grid&r=division

Transmission advantage as per CoV-Spectrum: image https://cov-spectrum.ethz.ch/explore/World/AllSamples/AllTimes/variants/json=%7B%22variant%22%3A%7B%22mutations%22%3A[%22ORF1a%3AK261N%22%2C%22ORF7a%3AR118G%22%2C%22ORF1a%3AA498V%22]%7D%2C%22matchPercentage%22%3A1%7D

image

There is another much smaller sub-branch:

Sub-lineage of: B.1.617.2, Clade 21J (~913 sequences)

Earliest sequence: 2021/4/23 (Russia)

Most recent sequence: 2021/10/10 (Denmark)

Countries circulating: mainly Estonia (~16% prevalence), and a few other European countries

Cumulative prevalence and number of samples sequenced as per Outbreak.info: image https://outbreak.info/situation-reports?pango&muts=S%3AP681R&muts=S%3AS680P&muts=ORF1a%3AK261N

Mutations in addition to B.1.617.2, Clade 21J: ORF1a:K261N then ORF7a:P45L then S:S680P. More than half of the sequences from Estonia have the further substitutions ORF1a:K851E, ORF1a:A903V, ORF1b:D815Y, ORF3a:S40P and nuc C6781T (524 sequences, shown in blue).

Genomes: Delta P45L+S680P.csv

Evidence

Downsized tree as per NeherLab (shown in blue): image https://nextstrain.org/groups/neherlab/ncov/estonia?c=gt-S_680&f_clade_membership=21J%20%28Delta%29&p=grid&r=division

Transmission advantage as per CoV-Spectrum: image https://cov-spectrum.ethz.ch/explore/World/AllSamples/AllTimes/variants/json=%7B%22variant%22%3A%7B%22mutations%22%3A[%22ORF1a%3AK261N%22%2C%22S%3AS680P%22]%7D%2C%22matchPercentage%22%3A1%7D
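For readers who want to check which mutations a CoV-Spectrum link like the ones above actually queries, the part after `json=` appears to be a percent-encoded JSON object. A minimal sketch of decoding it, assuming that URL layout holds (the helper name `decode_variant_query` is made up for illustration and is not part of any CoV-Spectrum tooling):

```python
import json
from urllib.parse import unquote

# CoV-Spectrum link copied from the S:S680P proposal above.
URL = (
    "https://cov-spectrum.ethz.ch/explore/World/AllSamples/AllTimes/variants/"
    "json=%7B%22variant%22%3A%7B%22mutations%22%3A[%22ORF1a%3AK261N%22%2C"
    "%22S%3AS680P%22]%7D%2C%22matchPercentage%22%3A1%7D"
)


def decode_variant_query(url: str) -> dict:
    """Return the JSON query embedded after 'json=' (hypothetical helper).

    Assumes the link contains a single 'json=' segment holding
    percent-encoded JSON, as the links in this issue do.
    """
    _, _, encoded = url.partition("json=")
    return json.loads(unquote(encoded))


query = decode_variant_query(URL)
print(query["variant"]["mutations"])
print(query["matchPercentage"])
```

Running the same helper on the other two links is a quick way to confirm that each one lists the mutations named in its proposal.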

corneliusroemer commented 2 years ago

Can I suggest we make a parent sub-lineage with ORF1a:261N that catches the 2-3 proposals made by you @c19850727 in that branch? Otherwise we're screwing up the hierarchy and it looks like these related lineages are unrelated.

Let's call this lineage AY.N

Here you can see the ORF1a:261N branch on a global Neherlab build: image https://nextstrain.org/groups/neherlab/ncov/global?branchLabel=aa&c=gt-nuc_1048,27527,1758,2560&m=div&p=grid&r=division

I've colored by your proposed lineages.

  1. I agree that the biggest branch with ORF7a:P45L should be a sublineage, say AY.N.1 [dark green, top]
  2. Your second proposal is good too, with ORF7a:R118G and ORF1a:A498V [middle light-yellowish-green]. The reversion at ORF7a:45 is just a tree building error, most likely not biological, just ignore. This would be AY.N.2
  3. The third sub lineage of AY.N should be the branch with ORF3a:238Y, see screenshot below. This branch is discussed in #267. It's the orange one at the bottom. Should be AY.N.3.

I'm not sure whether the S:680P one should already be designated a sublineage of AY.N.3 as AY.N.3.1, one could do it I guess, if you're keen. Or wait.
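To make the proposed hierarchy concrete, here is a toy sketch that treats each placeholder lineage as a set of defining amino-acid changes and picks the most specific match for a genome. The table only restates the list above in "gene:position+residue" form. The names `PROPOSED` and `most_specific` are made up for illustration, and real Pango assignment does not work by simple rule matching like this.

```python
# Hypothetical defining-mutation table for the placeholder names used above.
# Every sub-lineage inherits ORF1a:261N from the proposed parent AY.N.
PROPOSED = {
    "AY.N": {"ORF1a:261N"},
    "AY.N.1": {"ORF1a:261N", "ORF7a:45L"},
    "AY.N.2": {"ORF1a:261N", "ORF7a:118G", "ORF1a:498V"},
    "AY.N.3": {"ORF1a:261N", "ORF3a:238Y"},
}


def most_specific(mutations):
    """Return the proposed lineage with the most defining mutations that are
    all present in `mutations`, or None if even the parent does not match.
    Ties are resolved by the table order above (an arbitrary choice)."""
    matches = [name for name, defining in PROPOSED.items() if defining <= mutations]
    if not matches:
        return None
    return max(matches, key=lambda name: len(PROPOSED[name]))


# A genome with only the parent mutation stays in AY.N; adding ORF7a:45L
# moves it to AY.N.1, so the sub-lineages remain visibly related.
print(most_specific({"ORF1a:261N", "S:681R"}))
print(most_specific({"ORF1a:261N", "ORF7a:45L", "S:681R"}))
```

The point of the parent entry is that genomes with ORF1a:261N which fit none of the sub-branches still fall into AY.N instead of plain B.1.617.2.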

Here's a zoom view of AY.N.3: image

PS @c19850727 general suggestion: the growth advantage is sometimes useful, but if you post so many screenshots your issues get too long to be easily reviewed. The same applies to the outbreak.info country screenshot. To my mind there are too many tall images in your issues. A bit more structure and fewer screenshots would be good :)

MCB6 commented 2 years ago

Up to 1/3rd of sequences from Russia have already lost ORF7a:P45L, reverting to P (the uppermost branch on the screenshot, which is filtered to Country -> Russia). Nt 1758C is still the same. In this week's uploads from 4 regions of Russia, 1758C accounts for 85-95% of the sequences (and it's similar with the most recent batches from Ukraine and Kazakhstan), but the reverted sub-branch has up to 30% incidence. I understand that the sub-branches have many pending issues, but wouldn't it, at the very least, make sense to designate the higher-level branch, given how predominant it is in several large countries? ORF7a_P45

corneliusroemer commented 2 years ago

The reversion is in all likelihood a tree building error not biological. Actual reversions are much much rarer than tree building or sequencing artefacts. 1758C is part of Delta.

MCB6 commented 2 years ago

> The reversion is in all likelihood a tree building error not biological. Actual reversions are much much rarer than tree building or sequencing artefacts.

There is a good reason to suspect a reversion because both sub-branches share NSP2:K81N which otherwise seems to be fairly rare, and has been mentioned in the pango issues just once, and both share geographies and get sequenced in the same labs. But of course it is possible that a sequencing artifact affects it selectively, possibly in a way related to sample uptake and preservation like with the recently-famous S:Y145H where the "proper" ARTIC primer fails, but the higher-quality RNAs still get some sequences from the next primer over...

Outside of Russia, say, in the UK, it is the same pattern: most ORF7a:P45L sequences also have NSP2:K81N, but some NSP2:K81N genomes have P rather than L.

corneliusroemer commented 2 years ago

This looks like totally normal tree building issues, nothing else

image

https://nextstrain.org/groups/neherlab/ncov/russia?branchLabel=aa&c=gt-nuc_1048,27527&m=div

MCB6 commented 2 years ago

> normal tree building issues, nothing else

Would it help with tree-building uncertainties if this substrain is designated? It gets a little bewildering now. In the new uploads from Ukraine, 7 genomes with NSP2:K81N + ORF7a:P45L are assigned to 4 different Pango lineages... Ukraine-NSP2_K81N

FedeGueli commented 2 years ago

I suggest designating the parental lineage together with its sublineages. Estonia is actually the worst-hit country in Europe, and sublineage 3 proposed here accounts for about 1/4 of the sequences there.

https://cov-spectrum.ethz.ch/explore/Estonia/AllSamples/Past3M/variants?aaMutations=S%3AS680P%2COrf7a%3AP45L&pangoLineage=B.1.617.2*

@chrisruis @corneliusroemer

MCB6 commented 2 years ago

Exactly the same lineage ambiguity is observed in today's upload from Russia. 40/42 samples collected in mid-October have NSP2:K81N + ORF7a:P45L, but they are assigned to 4 different lineages. Mostly AY.43 and B.1.617.2, but neither accounts for even half of the total... Orenburg_NSP2-K87N

FedeGueli commented 2 years ago

Designated as AY.122 in #320

FedeGueli commented 2 years ago

Hi @c19850727, the second proposed sublineage of the newly designated AY.122 is still very interesting and should be proposed in a dedicated issue.

It is still growing in Denmark, where it has reached 3.3% of the sequences from the last twenty days.

VariantTimeDistributionChart (6)

Core mutations: ORF1a:3483F, ORF7a:R118G, ORF1a:A498V

First sequences from Denmark: week 36

Total sequences: 1304 (1274 in Denmark)

Covspectrum

NexstNeher Lab Tree

@corneliusroemer