cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

BA.2.10.1 sublineage with S:V83A, 144del, H146Q, Q183E, V213E, G339H, R346T, L368I, V445P, G446S, V483A, F490V, S1003I (11 seq, West Bengal and 1 in New York) #915

Closed silcn closed 2 years ago

silcn commented 2 years ago

Proposal for a sublineage of BA.2.10

Earliest sequence: 2022-07-02

Countries detected: India (9 seq, all from West Bengal)

Mutations in addition to BA.2.10:
S:V83A, H146Q, Q183E, V213E, G339H, R346T, L368I, V445P, G446S, V483A, F490V, G798D, S1003I
ORF1a:K47R
ORF1b:G662S
M:D3Y
ORF7a:I110T
N:T282I
nuc:C15738T, T15939C, T17859C, G28079T

S:V445P is a 2-nucleotide mutation (GTT -> CCT).
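To make the "2-nucleotide mutation" point concrete, here is a minimal sketch that lists which positions differ between two codons. The helper name `codon_changes` is made up for illustration; the codons are the ones quoted above.

```python
def codon_changes(ref: str, alt: str) -> list[tuple[int, str, str]]:
    """Return (position_in_codon, ref_base, alt_base) for every differing base."""
    if len(ref) != 3 or len(alt) != 3:
        raise ValueError("codons must be exactly 3 nucleotides long")
    pairs = zip(ref.upper(), alt.upper())
    return [(i + 1, r, a) for i, (r, a) in enumerate(pairs) if r != a]


# S:V445P as described above: GTT (Val) -> CCT (Pro)
changes = codon_changes("GTT", "CCT")
print(len(changes), changes)  # 2 [(1, 'G', 'C'), (2, 'T', 'C')]
```

A substitution that needs two simultaneous base changes is harder to reach by chance than a single-base one, which is part of why it stands out.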

S:G798D is shared with BA.2.10.1, but Usher places this on a branch of BA.2.10 with C15738T. I think the order may actually be ambiguous, but the fact that the BA.2.10+C15738T sequences are mostly from India makes me think that Usher has probably got it right.

This lineage has so far only been detected in the state of West Bengal, India, but I think it deserves a proposal based on its exceptional total of 13 spike mutations, including 7 in the RBD. 346, 445, 446 and 490 are significant escape locations in the Bloom lab calculator, in particular knocking out a large fraction of the antibodies elicited by pre-Omicron virus that still neutralise BA.2. Somewhat surprisingly, S:Q493R is not reverted!

Also note that this is yet another highly divergent BA.2 with ORF1b:G662S, after BA.2.75 and the BA.2.10 sublineage in #898 (see the discussion in that issue).

Usher tree has some reversions in the RBD which are most likely artefacts. The two sequences before the S:L368I branch have NNNs there so I assume they also have S:L368I.

India_V445P

https://nextstrain.org/fetch/github.com/silcn/subtreeAuspice1/raw/main/auspice/subtreeAuspice1_genome_2019f_10ae20.json?branchLabel=Spike%20mutations&c=gt-S_1003&label=nuc%20mutations:C15738T

Genomes: EPI_ISL_14166909 EPI_ISL_14167044 EPI_ISL_14302968 EPI_ISL_14303108 EPI_ISL_14303176 EPI_ISL_14303181 EPI_ISL_14303196 EPI_ISL_14303277 EPI_ISL_14303283

Improved cov-spectrum query from @corneliusroemer: https://cov-spectrum.org/explore/World/AllSamples/Past6M/variants?variantQuery=%5B4-of%3A+ORF1a%3A47R%2C+S%3A83A%2C+S%3A146Q%2C+S%3A213E%2C+S%3A339H%2C+S%3A445P%2C+S%3A483A%2C+S%3A1003I%2C+M%3A3Y%2C+ORF7a%3A110T%2C+N%3A282I%2C+15738T%2C+15939C%5D&
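For readers who want to see what the URL-encoded `variantQuery` above actually asks for, here is a small sketch that decodes it and applies the same kind of rule locally. It assumes `[4-of: ...]` means "at least 4 of the listed mutations"; `parse_n_of` and `matches` are hypothetical helpers, not part of cov-spectrum.

```python
from urllib.parse import unquote_plus

# The variantQuery parameter from the URL above, split only for readability.
encoded = (
    "%5B4-of%3A+ORF1a%3A47R%2C+S%3A83A%2C+S%3A146Q%2C+S%3A213E%2C+S%3A339H%2C+"
    "S%3A445P%2C+S%3A483A%2C+S%3A1003I%2C+M%3A3Y%2C+ORF7a%3A110T%2C+N%3A282I%2C+"
    "15738T%2C+15939C%5D"
)


def parse_n_of(query: str) -> tuple[int, list[str]]:
    """Split '[N-of: a, b, c]' into (N, ['a', 'b', 'c'])."""
    inner = query.strip().strip("[]")
    head, _, tail = inner.partition(":")  # head is 'N-of'
    n = int(head.split("-")[0])
    return n, [m.strip() for m in tail.split(",")]


def matches(sample_mutations: set[str], n: int, wanted: list[str]) -> bool:
    """True if the sample carries at least n of the wanted mutations (assumed semantics)."""
    return sum(m in sample_mutations for m in wanted) >= n


n, wanted = parse_n_of(unquote_plus(encoded))
print(n, len(wanted))  # 4 13
print(matches({"S:83A", "M:3Y", "N:282I", "15738T"}, n, wanted))  # True
```

Requiring only 4 of 13 markers is what makes the query tolerant of partial sequences where most positions are covered by Ns.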

silcn commented 2 years ago

Now found in the USA (New York): EPI_ISL_14305185

cvejris commented 2 years ago

This looks really bad... In addition to the most striking mutations that @silcn mentioned (ORF1b:G662S, S:V445P), I would like to point out two more:

S:G339H - thus far only present in BA.2.75; it is a double substitution (the default in Omicron is G339D).

M:D3Y - this position is strongly parallel in Omicron: BA.1 has D3G, BA.5 has D3N.

FedeGueli commented 2 years ago

Agree @cvejris, M:3Y is very "intriguing": we know that position can have some effect, but we definitely don't know anything about this particular substitution.

thomasppeacock commented 2 years ago

Going to recommend this gets assigned along with #898 due to the mutational profiles, wide geographical spread, apparent rapid expansion (at least as far as can be concluded from a dozen sequences) and second-generation nature of these.

corneliusroemer commented 2 years ago

I agree that this is worthy of immediate designation based on the Spike profile and given that it has already been sequenced in New York, far away from West Bengal.

To analyse these sequences, I suggest using the dataset "SARS-CoV-2 relative to BA.2" on master.clades.nextstrain.org (this is an experimental dataset, hence not available on main Nextclade). Because only mutations relative to BA.2 are shown, it's much easier to spot what's going on in the RBD in the mutation view (unfortunately, deletions present in BA.2 do show in this view as well, since otherwise the reference sequence would be in different coordinates).

Here you can see the new mutations with respect to BA.2 - there are a lot.

Ignore the S:27 mutation - this is because of the BA.2 deletion. The "redness" (apparent bad quality) is because there are so many private mutations. The sequences are clean - yes, there may be a few reversions, but overall this is not an artefact, especially given that the NY sequence is such a good fit for the WB ones.

image

This is for the NY sequence:

image

I suggest to use the following covSpectrum query that is a bit more sensitive for partial sequences (it catches all the 3 that are already on covSpectrum, 7 of the 10 sequences were uploaded to GISAID just today so it will take up to 3 more days for them to show up on covSpectrum): https://cov-spectrum.org/explore/World/AllSamples/Past6M/variants?variantQuery=%5B4-of%3A+ORF1a%3A47R%2C+S%3A83A%2C+S%3A146Q%2C+S%3A213E%2C+S%3A339H%2C+S%3A445P%2C+S%3A483A%2C+S%3A1003I%2C+M%3A3Y%2C+ORF7a%3A110T%2C+N%3A282I%2C+15738T%2C+15939C%5D&

In order to query on GISAID, I suggest to use: M_D3Y,N_T282I

This gives me all the 10 sequences that I got through a broader search using just M_D3Y then filtering using Nextclade clade assignments.

silcn commented 2 years ago

@corneliusroemer oh excellent spot of EPI_ISL_14175734 with your cov-spectrum query. I'd completely missed it because almost all the mutations are covered by NNNs, but it looks like it belongs. So there are 11 sequences including that one. "M_D3Y, Spike_V83A" is currently the only GISAID query that finds them all, but "M_D3Y, N_T282I" will probably be better in the long term.
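As a small convenience, the Pango-style names used in this thread (`M:D3Y`, `S:V83A`) can be turned into the GISAID-style search strings quoted above. This is only a sketch: `to_gisaid` and `gisaid_query` are hypothetical helpers, and the prefix table contains only the gene names that actually appear in the GISAID queries in this thread; other genes are left out rather than guessed.

```python
# Gene prefixes as they appear in the GISAID queries quoted in this thread.
GISAID_PREFIX = {"S": "Spike", "M": "M", "N": "N"}


def to_gisaid(mutation: str) -> str:
    """Convert 'S:V83A' to 'Spike_V83A'; raise for genes not in the table."""
    gene, change = mutation.split(":", 1)
    if gene not in GISAID_PREFIX:
        raise KeyError(f"no known GISAID prefix for gene {gene!r}")
    return f"{GISAID_PREFIX[gene]}_{change}"


def gisaid_query(mutations: list[str]) -> str:
    """Join several mutations into one comma-separated search string."""
    return ",".join(to_gisaid(m) for m in mutations)


print(gisaid_query(["M:D3Y", "S:V83A"]))   # M_D3Y,Spike_V83A
print(gisaid_query(["M:D3Y", "N:T282I"]))  # M_D3Y,N_T282I
```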

corneliusroemer commented 2 years ago

Oh I must have missed one - not sure how since I used just M_D3Y at some point.

I guess the very best method to be up to date and not days behind due to GISAID -> covSpectrum/Usher delay is to query GISAID for recent M_D3Y (plus maybe try another mutation in case that one is dropout), then run through Nextclade and pick the ones that are in BA.2.10 [this is reminiscent of filtration in approximate pattern matching]
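The workflow described here (broad mutation search, then keep only what Nextclade places in BA.2.10) could be sketched roughly as below. The column names `seqName`, `Nextclade_pango` and `aaSubstitutions` are assumed to match a Nextclade TSV export, and the rows are made-up placeholders, not real sequences.

```python
import csv
import io

# Made-up rows standing in for a Nextclade TSV export (illustration only).
EXAMPLE_TSV = (
    "seqName\tNextclade_pango\taaSubstitutions\n"
    "sample_A\tBA.2.10.1\tM:D3Y,N:T282I,S:V83A\n"
    "sample_B\tBA.5.2\tM:D3N,S:F486V\n"
    "sample_C\tAY.45\tM:D3Y\n"
    "sample_D\tBA.2.10\tN:T282I,S:V83A\n"
)


def candidates(tsv_text, lineage="BA.2.10", any_of=("M:D3Y", "N:T282I")):
    """Keep sequences placed in `lineage` or a sublineage that carry at least one marker."""
    keep = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        called = row["Nextclade_pango"]
        in_lineage = called == lineage or called.startswith(lineage + ".")
        mutations = set(row["aaSubstitutions"].split(","))
        if in_lineage and mutations & set(any_of):
            keep.append(row["seqName"])
    return keep


print(candidates(EXAMPLE_TSV))  # ['sample_A', 'sample_D']
```

Using a second marker alongside M:D3Y mirrors the suggestion to try another mutation in case of dropout: `sample_D` lacks M:D3Y in its call list but is still caught via N:T282I, while the AY.45 row with M:D3Y is excluded by the lineage filter.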

ryhisner commented 2 years ago

Worth noting that N:T282I is also found in BG.2. I had a bad feeling about this one when the first two were uploaded on the same day from separate patients last week. The next few months are going to be very, very interesting.

ryhisner commented 2 years ago

M:D3Y was also found in this recent AY.45 from South Africa.

image

c19850727 commented 2 years ago

don't know if someone here already noticed it but that NY sequence has an additional S:Y144del.

FedeGueli commented 2 years ago

Really poor coverage, but I think EPI_ISL_14175734 is part of this sublineage too.

Edited: it seems Usher places it correctly in the tree of this sublineage:

image

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_822c_1f9150.json?c=pango_lineage_usher&label=nuc%20mutations:C15738T

Edit: I see now this was already spotted by @corneliusroemer.

corneliusroemer commented 2 years ago

I'm designating this now but I've just stumbled over the question of whether this should be BA.2.10 sublineage or BA.2.10.1 sublineage.

On Nextclade - and also in the designated sequences, BA.2.10.1 differs from BA.2.10 just by a single mutation S:G798D that this lineage has.

So in that sense BA.2.10.1 would make sense. Giving it an alias would make pronouncing the lineage easier.

The only situation in which calling this a BA.2.10.1 lineage would be problematic would be if ancestors of this pop up that lack S:G798D - but in the past we've rarely seen ancestors of such 2nd gen lineages appear - so the question is mostly theoretical I think.

BA.2.10 reached 50% at its peak in India, and only spread to other countries much later. So it's not surprising that we see a BA.2.10 sublineage now in India.

On Usher, the samples attach to a Japanese BA.2.10 (?) that has C15738T but also S:798D - there are also 2 additional such samples from the US that are missing on Usher - or attach elsewhere.

image

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_15178_26aa40.json?c=pango_lineage_usher&label=nuc%20mutations:C15738T

image

So overall it seems quite unclear which came first: C15738T or S:798D.

But I'll go with BA.2.10.4 in this PR - @chrisruis @InfrPopGen any thoughts? This is easy to change before we merge.

FedeGueli commented 2 years ago

@corneliusroemer Nextclade places it in BA.2.10.1, and likewise the other one from Japan:

image

https://cov-spectrum.org/explore/World/AllSamples/Past6M/variants?aaMutations=S%3A798D&nucMutations=C15738T&aaMutations1=S%3A153I%2CS%3A1258Q%2CN%3A151L&

c19850727 commented 2 years ago

@corneliusroemer My 2 cents: for the sake of easier sci-comms, designating it as a BA.2.10.1 sublineage (and thus using a new prefix) might be a better idea, considering what a mouthful BA.2.12.1 used to be...

FedeGueli commented 2 years ago

Yeah agree with @c19850727 but to me it is really a BA.2.10.1 not a BA.2.10

FedeGueli commented 2 years ago

Btw the Japanese one basal to this lineage is travel related (airport quarantine) from Thailand.

Sinickle commented 2 years ago

C15738T is not a defining mutation of BA.2.10. Although it is more common in BA.2.10 than BA.2.10.1, I'm seeing the question as... "Which of C15738T and S:G798D was inherited, and which was acquired?" in which case, the most likely answer is that the more common circulating mutation was inherited, and the more rare one was acquired.

S:G798D is much more common than C15738T. Most likely the parent is BA.2.10.1, not BA.2.10.

silcn commented 2 years ago

Despite what I said in the first post, I'm actually more inclined towards BA.2.10.1 rather than BA.2.10, though not for the reasons that anyone has stated so far. I don't buy the argument "C15738T is rare so it was probably acquired" - if it was truly rare, then the fact BA.2.10+C15738T already existed would surely be a strong hint that this was descended from that.

The thing is that C15738T isn't rare! It's highly homoplasic and has appeared in loads of Omicron lineages. S:G798D is actually less common in Omicron as a whole, especially when counted by number of emergences rather than number of sequences. So it's more likely that this inherited S:G798D and independently gained C15738T.

btw @FedeGueli Nextclade only places sequences relative to a tree of designated lineages, not the full tree like Usher, so it will place any BA.2.10+S:G798D within BA.2.10.1, since BA.2.10.1 is just defined by BA.2.10+S:G798D. So that isn't much of a guide here.

Sinickle commented 2 years ago

I guess it's the product of the inherent likelihood to acquire a mutation, and the number of opportunities to acquire it.

I'm seeing that there's around 100x as many BA.2.10* with S:G798D compared to C15738T. If S:G798D is less than 100x as inherently likely to be acquired as C15738T then the math/probability favors S:G798D being inherited.

silcn commented 2 years ago

> I'm seeing that there's around 100x as many BA.2.10* with S:G798D compared to C15738T. If C15738T is less than 100x as inherently likely to be acquired as S:G798D then the math/probability favors S:G798D being inherited.

Think that's the wrong way round. If S:G798D is <100x as likely to be acquired as C15738T then the probability favours S:G798D being inherited.

Sinickle commented 2 years ago

> > I'm seeing that there's around 100x as many BA.2.10* with S:G798D compared to C15738T. If C15738T is less than 100x as inherently likely to be acquired as S:G798D then the math/probability favors S:G798D being inherited.
>
> Think that's the wrong way round. If S:G798D is <100x as likely to be acquired as C15738T then the probability favours S:G798D being inherited.

Oof you're right, I got the order reversed here. Thanks for the correction!
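The corrected version of the argument can be written out as a small calculation. This is only a sketch of the reasoning in the comments above, not an established method: each scenario is scored as (number of possible parent sequences) x (relative chance of newly acquiring the other mutation), and all numbers below are illustrative.

```python
def scenario_scores(n_798d, n_15738, r_798d, r_15738):
    """
    n_798d, n_15738: counts of BA.2.10* sequences carrying each mutation.
    r_798d, r_15738: relative chance that a lineage newly acquires each mutation.

    Scenario A: S:G798D inherited, C15738T acquired -> n_798d * r_15738
    Scenario B: C15738T inherited, S:G798D acquired -> n_15738 * r_798d
    """
    return n_798d * r_15738, n_15738 * r_798d


def inherited_798d_favoured(n_798d, n_15738, r_798d, r_15738):
    """A beats B exactly when r_798d / r_15738 < n_798d / n_15738."""
    a, b = scenario_scores(n_798d, n_15738, r_798d, r_15738)
    return a > b


# ~100x more sequences with S:G798D; C15738T assumed 10x easier to acquire.
print(inherited_798d_favoured(100, 1, r_798d=1, r_15738=10))   # True
# Only if S:G798D were more than 100x easier to acquire does the answer flip.
print(inherited_798d_favoured(100, 1, r_798d=200, r_15738=1))  # False
```

With a roughly 100x difference in counts, S:G798D would have to be more than 100x as easy to acquire as C15738T for the "C15738T inherited" scenario to win, which matches the correction in the quoted reply.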

corneliusroemer commented 2 years ago

@silcn @thomasppeacock & co - I'm just making a new Nextclade reference tree for all the new lineages and I stumbled upon this: Both BJ.1 (this issue) and BA.2.10.4 (#898) share not only S:446S (reasonable convergence) but also ORF1b:G662S (15451A) which is rather strange as this mutation is not common in BA.2.10* at all.

Just 72 sequences, and about half of them are now part of either BJ.1 or BA.2.10.4. https://cov-spectrum.org/explore/World/AllSamples/Past6M/variants?variantQuery=15451A+%26+nextcladePangoLineage%3ABA.2.10*&

Could they after all derive from a common ancestor - as in, could they really be BA.2.10.5.1 and BA.2.10.5.2?

You can see the current version of my relative-to-BA.2-Nextclade-ref-tree here: https://nextstrain.org/staging/nextclade/sars-cov-2/21L?branchLabel=aa&c=gt-nuc_15451

It would be great if someone could investigate - maybe @AngieHinrichs can help too? After all this is quite intriguing!

silcn commented 2 years ago

@corneliusroemer as @shay671 pointed out in #898, ORF1b:G662S may well have been acquired before BA.2.10.4's big mutational leap: there is a 30-sequence branch, including a handful of sequences from India, defined by BA.2.10+ORF1b:G662S, before the jump to BA.2.10.4. This would explain one part of the convergence. But it seems very unlikely that two separate highly divergent lineages would both develop from such a small branch (unless they were in the same person, but then I'd expect more common mutations between them).

So I still tend towards BJ.1 being a BA.2.10.1 sublineage that independently picked up ORF1b:G662S. After all, BA.2.75 has it too, so convergence would be plausible. And if it really is commonly selected for in 2nd-generation BA.2s, then BA.2.10.4 might have acquired it independently too rather than being derived from that BA.2.10+ORF1b:G662S branch as Usher suggests.

thomasppeacock commented 2 years ago

I agree with @silcn and @shay671: it's very hard to say whether this is ancestral or convergent. NSP12 G671S clearly shows (and has previously shown) multiple independent emergences in other variants, and it's clearly favoured in BA.2 sublineages as well.
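Since the thread switches between ORF1b:G662S and NSP12 G671S for the same change, a tiny conversion sketch may help. It assumes the two numberings differ by a constant offset of 9, which is consistent with the G662S / G671S pair used here; `orf1b_to_nsp12` is a hypothetical helper and only makes sense for residues that lie within nsp12.

```python
# nsp12 residue number = ORF1b residue number + 9 (assumed constant offset).
NSP12_OFFSET = 9


def orf1b_to_nsp12(mutation: str) -> str:
    """Convert a substitution such as 'G662S' from ORF1b to nsp12 numbering."""
    ref, position, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
    return f"{ref}{position + NSP12_OFFSET}{alt}"


print(orf1b_to_nsp12("G662S"))  # G671S
```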

cvejris commented 2 years ago

Just realized another fun fact: V213E is a double substitution in the 2nd nucleotide of the codon (V213G is default in BA.2)
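To check which genome position the "2nd nucleotide of the codon" corresponds to, spike codon numbers can be mapped to reference coordinates. This sketch assumes Wuhan-Hu-1 (NC_045512.2) coordinates with the spike ORF starting at nucleotide 21563; `codon_positions` is a hypothetical helper.

```python
S_START = 21563  # first nucleotide of the spike ORF in reference coordinates (assumption stated above)


def codon_positions(aa_position: int, gene_start: int = S_START) -> tuple[int, int, int]:
    """Return the three reference nucleotide positions encoding the given residue."""
    first = gene_start + 3 * (aa_position - 1)
    return first, first + 1, first + 2


print(codon_positions(213))  # (22199, 22200, 22201)
print(codon_positions(339))  # (22577, 22578, 22579)
```

The middle position of codon 213 is 22200, which lines up with the G22200A label that appears among the nucleotide mutations in a tree link later in this thread; likewise the first position of codon 339 is 22577, matching G22577C in the same link.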

shay671 commented 2 years ago

The NSP12 mutation is convergent with B.1.617.2, with XB (formerly known as B.1.628, which had success in mid-2021 in Mexico), and with BA.2.75. From my perspective, this convergence shows a selection pressure as strong as (if not stronger than) that on the known RBD mutations. Now think what we would have thought if this were not this NSP12 mutation but a mutation at 452 or 484 etc. Suppose it was this mutation + some synonymous or non-converging ORF1ab mutations, OK. But if it's this mutation only, from my POV, it's much more reasonable that it evolved in parallel.

But - I also agree with Saka that if this becomes a variant of impact, it had better have its own unique letter combination.

silcn commented 2 years ago

Sequence from Singapore: EPI_ISL_14459008. Local case, not a returning traveller.

silcn commented 2 years ago

Two sequences from Karnataka, the first non-West Bengal Indian sequences: EPI_ISL_14586822, EPI_ISL_14586975. Still no upload from West Bengal in the last two weeks.

FedeGueli commented 2 years ago

New sequence EPI_ISL_14625448 from West Bengal uploaded today, collected 1st of August.

FedeGueli commented 2 years ago

2 from Karnataka and 22 samples from West Bengal uploaded today (GISAID query for them: M_D3Y, Spike_V83A)

EPI_ISL_14724097 EPI_ISL_14724124 EPI_ISL_14733713 EPI_ISL_14733718 EPI_ISL_14733726 EPI_ISL_14733728 EPI_ISL_14733812 EPI_ISL_14733814 EPI_ISL_14733822 EPI_ISL_14733827 EPI_ISL_14733828 EPI_ISL_14733833 EPI_ISL_14733834 EPI_ISL_14733978 EPI_ISL_14734011 EPI_ISL_14734111 EPI_ISL_14734112 EPI_ISL_14734114 EPI_ISL_14734133 EPI_ISL_14734138 EPI_ISL_14734141 EPI_ISL_14734143 EPI_ISL_14734149 EPI_ISL_14734153

agamedilab commented 2 years ago

BJ.1 expanded into Austria with today's upload (first European sequences).

image

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_1de2c_f496e0.json?branchLabel=aa%20mutations&c=pango_lineage_usher&label=nuc%20mutations:A405G,T17859C,C22109G,G22200A,G22577C,G22599C,G24570T,T27722C

EPI_ISL_14750046 EPI_ISL_14750054

corneliusroemer commented 2 years ago

7 sequences have now picked up S2 mutation S:S1170Y - could be candidate for sublineage if it keeps expanding

image

FedeGueli commented 2 years ago

One more sequence intercepted by South Korean surveillance: EPI_ISL_14749204 from India collected back 12/08/22

FedeGueli commented 2 years ago

Two more sequences, Singapore (via India) and Karnataka: 14810786 14810203

FedeGueli commented 2 years ago

Two more sequences from Austria, collected on 29 and 30 August.

UlrichElling commented 2 years ago

We just found another 2 cases including the 144 deletion in the random surveillance of Austria.

UlrichElling commented 2 years ago

will be uploaded asap

corneliusroemer commented 2 years ago

Thanks @UlrichElling - that's very helpful and interesting.

UlrichElling commented 2 years ago

image

The mutations are really a collection from hell!

FedeGueli commented 2 years ago

Big upload from India, 21 sequences, and one from Singapore, this last one worryingly with travel history from Bangladesh. @ryhisner pointed to Bangladesh quite early as a possible area of origin or spread, I think.

UlrichElling commented 2 years ago

Thank you! We are trying to identify an active case and take it into culture.

FedeGueli commented 2 years ago

One more sequence from the Netherlands, collected on 31 August:

EPI_ISL_14913530

silcn commented 2 years ago

> One more sequence intercepted by South Korean surveillance: EPI_ISL_14749204 from India collected back 12/08/22

Just to note that this one has S:F486S, which is something to keep an eye on. When I first spotted this lineage, I noticed the Bloom lab calculator suggested it ought to be particularly susceptible to mutations at S:486, as they have the potential to knock out a substantial proportion of remaining antibodies.

silcn commented 2 years ago

Belatedly corrected the title - I had missed 144del originally, because my screen resolution isn't high enough for me to spot it between G142D and H146Q on Nextclade, and I didn't think to check deletions separately - and of course it doesn't show on Usher.

FedeGueli commented 2 years ago

@silcn I spotted it yesterday too, but I didn't know it was in every sequence! Thanks! 9 samples were uploaded today (I didn't check whether they overlap with the recombinant Cornelius proposed), so it seems to be circulating and growing.

FedeGueli commented 2 years ago

One new sequence from England

FedeGueli commented 2 years ago

2 from Bangladesh spotted by @josetteshoenma and 1 from Massachusetts. Total is 82.

UlrichElling commented 2 years ago

image

Three from Bangladesh even, no? I guess we must await the next upload from West Bengal to know more; the rest is a bit stochastic for now, with no robustly detected community transmission yet. As BA.2.75 is on the decline in India, comparing BJ.1 to BA.75.2 will likely only be predictive elsewhere.