Closed silcn closed 2 years ago
Now found in the USA (New York): EPI_ISL_14305185
This looks really bad... In addition to the most striking mutations that @silcn mentioned (ORF1b:G662S, S:V445P), I would like to point out two more. S:G339H, thus far only present in BA.2.75, is a double substitution (the default in Omicron lineages is G339D). M:D3Y: this position shows strong parallel evolution in Omicron lineages (BA.1 has D3G, BA.5 has D3N).
Agree, @cvejris. M:D3Y is very "intriguing": we know that position can have some effect, but we definitely don't know anything about this particular substitution yet.
Going to recommend this gets assigned along with #898 due to the mutational profiles, widespread geographical spread, apparent rapid expansion (at least as far as can be concluded from a dozen sequences) and second-generation nature of these.
I agree that this is worthy of immediate designation based on the Spike profile, and given that it has already been sequenced in New York, far away from West Bengal.
To analyse these sequences, I suggest using the dataset "SARS-CoV-2 relative to BA.2" on master.clades.nextstrain.org (this is an experimental dataset, hence not available on main Nextclade). Because only mutations relative to BA.2 are shown, it's much easier to spot what's going on in the RBD in the mutation view (unfortunately, deletions present in BA.2 still show up in this view, since otherwise the reference sequence would be in different coordinates).
Here you can see the new mutations with respect to BA.2 - there are a lot.
Ignore the S:27 mutation, which is due to the BA.2 deletion. The "redness" (apparent bad quality) is because there are so many private mutations. The sequences are clean: yes, there may be a few reversions, but overall this is not an artefact, especially given that the NY sequence is such a good fit for the WB ones.
This is for the NY sequence:
I suggest to use the following covSpectrum query that is a bit more sensitive for partial sequences (it catches all the 3 that are already on covSpectrum, 7 of the 10 sequences were uploaded to GISAID just today so it will take up to 3 more days for them to show up on covSpectrum): https://cov-spectrum.org/explore/World/AllSamples/Past6M/variants?variantQuery=%5B4-of%3A+ORF1a%3A47R%2C+S%3A83A%2C+S%3A146Q%2C+S%3A213E%2C+S%3A339H%2C+S%3A445P%2C+S%3A483A%2C+S%3A1003I%2C+M%3A3Y%2C+ORF7a%3A110T%2C+N%3A282I%2C+15738T%2C+15939C%5D&
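For anyone who wants to adapt the query, here is a minimal Python sketch of how the "k-of" query string above can be assembled and URL-encoded, plus a local equivalent of the same k-of-n test. The function names are made up for illustration, and the local matcher is only an approximation: it does not model how covSpectrum itself treats ambiguous or uncovered positions.

```python
from urllib.parse import quote_plus

# Marker mutations copied from the query above (covSpectrum notation;
# the last two entries are nucleotide mutations).
MARKERS = [
    "ORF1a:47R", "S:83A", "S:146Q", "S:213E", "S:339H", "S:445P", "S:483A",
    "S:1003I", "M:3Y", "ORF7a:110T", "N:282I", "15738T", "15939C",
]


def build_variant_query(markers, k):
    """Return a '[k-of: a, b, ...]' query string (hypothetical helper)."""
    return f"[{k}-of: {', '.join(markers)}]"


def matches_k_of(observed, markers, k):
    """True if at least k of the marker mutations are in the observed set."""
    return len(set(observed) & set(markers)) >= k


query = build_variant_query(MARKERS, 4)
# quote_plus turns spaces into '+' and escapes '[', ']', ':' and ','.
encoded = quote_plus(query)
print("variantQuery=" + encoded)
```

Requiring only 4 of the 13 markers is what makes the query tolerant of partial sequences: a genome with most sites masked by Ns can still match as long as a handful of the markers are covered.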
In order to query on GISAID, I suggest to use: M_D3Y,N_T282I
This gives me all 10 sequences that I got through a broader search using just M_D3Y and then filtering by Nextclade clade assignments.
@corneliusroemer oh excellent spot of EPI_ISL_14175734 with your cov-spectrum query. I'd completely missed it because almost all the mutations are covered by NNNs, but it looks like it belongs. So there are 11 sequences including that one. "M_D3Y, Spike_V83A" is currently the only GISAID query that finds them all, but "M_D3Y, N_T282I" will probably be better in the long term.
Oh, I must have missed one. Not sure how, since I used just M_D3Y at some point.
I guess the best way to stay up to date, and not be days behind due to the GISAID -> covSpectrum/Usher delay, is to query GISAID for recent M_D3Y (plus maybe try another mutation in case that one is a dropout), then run the results through Nextclade and pick the ones that are in BA.2.10 [this is reminiscent of filtration in approximate pattern matching]
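The second half of that workflow (filtering Nextclade output down to BA.2.10 candidates) could look roughly like the sketch below. It assumes the Nextclade TSV has `seqName`, `Nextclade_pango` and `aaSubstitutions` columns, with substitutions comma-separated in `M:D3Y` style; please check the column names against your Nextclade version. The sample rows are invented purely for the demo.

```python
import csv
import io


def in_lineage(lineage, parent):
    """True for the parent itself or any dotted sublineage of it."""
    return lineage == parent or lineage.startswith(parent + ".")


def candidate_rows(tsv_text, marker="M:D3Y", parent="BA.2.10",
                   lineage_col="Nextclade_pango", subs_col="aaSubstitutions"):
    """Return sequence names that carry the marker and sit under the parent lineage.

    Column names are assumptions about the Nextclade TSV layout.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        subs = set(filter(None, (row.get(subs_col) or "").split(",")))
        if marker in subs and in_lineage(row.get(lineage_col) or "", parent):
            hits.append(row["seqName"])
    return hits


# Hypothetical rows, not real sequences.
demo = (
    "seqName\tNextclade_pango\taaSubstitutions\n"
    "sample_A\tBA.2.10.1\tM:D3Y,N:T282I\n"
    "sample_B\tBA.5.2\tM:D3N\n"
    "sample_C\tAY.45\tM:D3Y\n"
)
print(candidate_rows(demo))
```

The cheap mutation query over-collects, and the lineage check then discards unrelated hits such as an M:D3Y sequence from a different lineage. Swapping `marker` for another mutation covers the case where M:D3Y drops out.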
Worth noting that N:T282I is also found in BG.2. I had a bad feeling about this one when the first two were uploaded on the same day from separate patients last week. The next few months are going to be very, very interesting.
M:D3Y was also found in this recent AY.45 from South Africa.
don't know if someone here already noticed it but that NY sequence has an additional S:Y144del.
really poor coverage but i think EPI_ISL_14175734 is part of this sublineage too.
Edit: it seems Usher places it correctly in the tree of this sublineage: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_822c_1f9150.json?c=pango_lineage_usher&label=nuc%20mutations:C15738T
Edit: I see this was already spotted by @corneliusroemer.
I'm designating this now but I've just stumbled over the question of whether this should be BA.2.10 sublineage or BA.2.10.1 sublineage.
On Nextclade, and also in the designated sequences, BA.2.10.1 differs from BA.2.10 by just a single mutation, S:G798D, which this lineage has.
So in that sense BA.2.10.1 would make sense. Giving it an alias would make pronouncing the lineage easier.
The only situation in which calling this a BA.2.10.1 lineage would be problematic is if ancestors of this pop up that lack S:G798D. In the past we've rarely seen ancestors of such 2nd-gen lineages appear, so the question is mostly theoretical, I think.
BA.2.10 reached 50% at its peak in India, and only spread to other countries much later. So it's not surprising that we see a BA.2.10 sublineage now in India.
On Usher, the samples attach to a Japanese BA.2.10 (?) that has C15738T but also S:798D - there are also 2 additional such samples from the US that are missing on Usher - or attach elsewhere.
So overall it seems quite unclear which came first: C15738T or S:798D.
But I'll go with BA.2.10.4 in this PR - @chrisruis @InfrPopGen any thoughts? This is easy to change before we merge.
@corneliusroemer Nextclade assigns it to BA.2.10.1, as well as the other one from Japan:
@corneliusroemer My 2 cents: for the sake of easier sci-comms, designating it as a BA.2.10.1 sublineage (and thus using a new prefix) might be a better idea, considering how BA.2.12.1 used to be a mouthful...
Yeah agree with @c19850727 but to me it is really a BA.2.10.1 not a BA.2.10
Btw the Japanese one basal to this lineage is travel related (airport quarantine) from Thailand.
C15738T is not a defining mutation of BA.2.10. Although it is more common in BA.2.10 than BA.2.10.1, I'm seeing the question as... "Which of C15738T and S:G798D was inherited, and which was acquired?" in which case, the most likely answer is that the more common circulating mutation was inherited, and the more rare one was acquired.
S:G798D is much more common than C15738T. Most likely the parent is BA.2.10.1, not BA.2.10.
Despite what I said in the first post, I'm actually more inclined towards BA.2.10.1 rather than BA.2.10, though not for the reasons that anyone has stated so far. I don't buy the argument "C15738T is rare so it was probably acquired" - if it was truly rare, then the fact BA.2.10+C15738T already existed would surely be a strong hint that this was descended from that.
The thing is that C15738T isn't rare! It's highly homoplasic and has appeared in loads of Omicron lineages. S:G798D is actually less common in Omicron as a whole, especially when counted by number of emergences rather than number of sequences. So it's more likely that this inherited S:G798D and independently gained C15738T.
btw @FedeGueli Nextclade only places sequences relative to a tree of designated lineages, not the full tree like Usher, so it will place any BA.2.10+S:G798D within BA.2.10.1, since BA.2.10.1 is just defined by BA.2.10+S:G798D. So that isn't much of a guide here.
I guess it's the product of the inherent likelihood to acquire a mutation, and the number of opportunities to acquire it.
I'm seeing that there's around 100x as many BA.2.10* with S:G798D compared to C15738T. If S:G798D is less than 100x as inherently likely to be acquired as C15738T then the math/probability favors S:G798D being inherited.
I'm seeing that there's around 100x as many BA.2.10* with S:G798D compared to C15738T. If C15738T is less than 100x as inherently likely to be acquired as S:G798D then the math/probability favors S:G798D being inherited.
Think that's the wrong way round. If S:G798D is <100x as likely to be acquired as C15738T then the probability favours S:G798D being inherited.
Oof you're right, I got the order reversed here. Thanks for the correction!
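To make the corrected version of the argument concrete, here is a small sketch of the "opportunities times inherent likelihood" reasoning. It is a toy model, not a phylogenetic method: it treats every BA.2.10* sequence carrying one mutation as an equally likely ancestor that then has to acquire the other, and the function name and rate values are made up for illustration.

```python
def inherited_798_odds(count_ratio, rate_ratio):
    """Odds that S:G798D was inherited and C15738T acquired, versus the reverse.

    count_ratio: (# BA.2.10* with S:G798D) / (# BA.2.10* with C15738T)
    rate_ratio:  (chance of acquiring S:G798D) / (chance of acquiring C15738T)

    Scenario A (798D inherited):   weight = N_798   * r_15738
    Scenario B (15738T inherited): weight = N_15738 * r_798
    Odds A:B = (N_798 / N_15738) / (r_798 / r_15738)
    """
    return count_ratio / rate_ratio


# With roughly 100x as many S:G798D sequences (the figure quoted above),
# inheritance of S:G798D is favoured whenever S:G798D is less than 100x
# as likely to be acquired as C15738T.
for rate_ratio in (0.1, 10, 100, 1000):
    print(rate_ratio, inherited_798_odds(100, rate_ratio))
```

If C15738T really is the more homoplasic of the two, as argued earlier in the thread, then `rate_ratio` is below 1 and the odds tilt even further towards S:G798D being the inherited mutation.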
@silcn @thomasppeacock & co - I'm just making a new Nextclade reference tree for all the new lineages and I stumbled upon this: Both BJ.1 (this issue) and BA.2.10.4 (#898) share not only S:446S (reasonable convergence) but also ORF1b:G662S (15451A) which is rather strange as this mutation is not common in BA.2.10* at all.
Just 72 sequences, and about half of them are now part of either BJ.1 and BA.2.10.4. https://cov-spectrum.org/explore/World/AllSamples/Past6M/variants?variantQuery=15451A+%26+nextcladePangoLineage%3ABA.2.10*&
Could they after all derive from a common ancestor - as in, could they really be BA.2.10.5.1 and BA.2.10.5.2?
You can see the current version of my relative-to-BA.2-Nextclade-ref-tree here: https://nextstrain.org/staging/nextclade/sars-cov-2/21L?branchLabel=aa&c=gt-nuc_15451
It would be great if someone could investigate - maybe @AngieHinrichs can help too? After all this is quite intriguing!
@corneliusroemer as @shay671 pointed out in #898, ORF1b:G662S may well have been acquired before BA.2.10.4's big mutational leap: there is a 30-sequence branch, including a handful of sequences from India, defined by BA.2.10+ORF1b:G662S, before the jump to BA.2.10.4. This would explain one part of the convergence. But it seems very unlikely that two separate highly divergent lineages would both develop from such a small branch (unless they were in the same person, but then I'd expect more common mutations between them).
So I still tend towards BJ.1 being a BA.2.10.1 sublineage that independently picked up ORF1b:G662S. After all, BA.2.75 has it too, so convergence would be plausible. And if it really is commonly selected for in 2nd-generation BA.2s, then BA.2.10.4 might have acquired it independently too rather than being derived from that BA.2.10+ORF1b:G662S branch as Usher suggests.
I agree with @silcn and @shay671: it's very hard to say whether this is ancestral or convergent. NSP12:G671S (i.e. ORF1b:G662S) clearly shows, and has previously shown, multiple independent emergences in other variants, and it's clearly favoured in BA.2 sublineages as well.
Just realized another fun fact: V213E is a double substitution in the 2nd nucleotide of the codon (V213G is default in BA.2)
The NSP12 mutation is convergent with B.1.617.2, with XB (formerly known as B.1.628, which had success in mid-2021 in Mexico), and with BA.2.75. From my perspective, this convergence shows a selection pressure at least as strong as that on the known RBD mutations. Now think what we would have thought if this were not this NSP12 mutation but a mutation at 452 or 484 etc. If it were this mutation plus some synonymous or non-converging ORF1ab mutations, OK. But if it's this mutation only, from my POV, it's much more reasonable that it evolved in parallel.
But I also agree with Saka that if this turns out to be a variant of impact, it had better have its own unique letter combination.
Sequence from Singapore: EPI_ISL_14459008. Local case, not a returning traveller.
Two sequences from Karnataka, the first non-West Bengal Indian sequences: EPI_ISL_14586822, EPI_ISL_14586975. Still no upload from West Bengal in the last two weeks.
New sequence EPI_ISL_14625448 from West Bengal uploaded today, collected on 1 August.
2 from Karnataka and 22 samples from West Bengal uploaded today (GISAID query for them: M_D3Y, Spike_V83A)
EPI_ISL_14724097 EPI_ISL_14724124 EPI_ISL_14733713 EPI_ISL_14733718 EPI_ISL_14733726 EPI_ISL_14733728 EPI_ISL_14733812 EPI_ISL_14733814 EPI_ISL_14733822 EPI_ISL_14733827 EPI_ISL_14733828 EPI_ISL_14733833 EPI_ISL_14733834 EPI_ISL_14733978 EPI_ISL_14734011 EPI_ISL_14734111 EPI_ISL_14734112 EPI_ISL_14734114 EPI_ISL_14734133 EPI_ISL_14734138 EPI_ISL_14734141 EPI_ISL_14734143 EPI_ISL_14734149 EPI_ISL_14734153
BJ.1 expanded into Austria with today's upload (first European sequences).
EPI_ISL_14750046 EPI_ISL_14750054
7 sequences have now picked up S2 mutation S:S1170Y - could be candidate for sublineage if it keeps expanding
One more sequence intercepted by South Korean surveillance: EPI_ISL_14749204 from India collected back 12/08/22
Two more sequences, from Singapore (via India) and Karnataka: 14810786 14810203
Two more sequences from Austria, collected on 29 and 30 August.
We just found another 2 cases, including the 144 deletion, in Austria's random surveillance. They will be uploaded ASAP.
Thanks @UlrichElling - that's very helpful and interesting.
The mutations are really a collection from hell!
Big upload from India (21 sequences) and one from Singapore, this last one worryingly with travel history from Bangladesh. @ryhisner pointed to Bangladesh quite early as a possible area of origin or spread, I think.
Thank you! We are trying to identify an active case and take it into culture.
One more sequence from the Netherlands, collected on 31 August: EPI_ISL_14913530
One more sequence intercepted by South Korean surveillance: EPI_ISL_14749204 from India collected back 12/08/22
Just to note that this one has S:F486S, which is something to keep an eye on. When I first spotted this lineage, I noticed the Bloom lab calculator suggested it ought to be particularly susceptible to mutations at S:486, as they have the potential to knock out a substantial proportion of the remaining antibodies.
Belatedly corrected the title - I had missed 144del originally, because my screen resolution isn't high enough for me to spot it between G142D and H146Q on Nextclade, and I didn't think to check deletions separately - and of course it doesn't show on Usher.
@silcn I spotted it yesterday too, but I didn't know it was in every sequence! Thanks! 9 samples were uploaded today (I didn't check whether they overlap with the recombinant proposed by Cornelius), just to say it seems to be circulating and growing.
One new sequence from England
2 from Bangladesh spotted by @josetteshoenma and 1 from Massachusetts. Total is 82.
Three from Bangladesh even, no? I guess we must await the next upload from West Bengal to know more; the rest is a bit stochastic for now, with no robustly detected community transmission yet. As BA.2.75 is on the decline in India, comparing BJ.1 to BA.2.75.2 will likely only be predictive elsewhere.
Proposal for a sublineage of BA.2.10
Earliest sequence: 2022-07-02
Countries detected: India (9 seq, all from West Bengal)
Mutations in addition to BA.2.10:
S:V83A, H146Q, Q183E, V213E, G339H, R346T, L368I, V445P, G446S, V483A, F490V, G798D, S1003I
ORF1a:K47R
ORF1b:G662S
M:D3Y
ORF7a:I110T
N:T282I
nuc:C15738T, T15939C, T17859C, G28079T
S:V445P is a 2-nucleotide mutation (GTT -> CCT).
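The codon change quoted above is easy to check by hand, but here is a short sketch using the standard genetic code for anyone who wants to repeat it for other positions. The helper names are hypothetical; only the GTT -> CCT pair comes from this post.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nucleotide_changes(codon_from, codon_to):
    """Number of positions at which two codons differ."""
    return sum(a != b for a, b in zip(codon_from, codon_to))


def describe(codon_from, codon_to):
    """Return (amino acid before, amino acid after, nucleotide changes)."""
    return (
        CODON_TABLE[codon_from],
        CODON_TABLE[codon_to],
        nucleotide_changes(codon_from, codon_to),
    )


print(describe("GTT", "CCT"))
```

Both the first and second codon positions differ, which is why V -> P cannot be reached from GTT with a single substitution.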
S:G798D is shared with BA.2.10.1, but Usher places this on a branch of BA.2.10 with C15738T. I think the order may actually be ambiguous, but the fact that the BA.2.10+C15738T sequences are mostly from India makes me think that Usher has probably got it right.
This lineage has so far only been detected in the state of West Bengal, India, but I think it deserves a proposal based on its exceptional total of 13 spike mutations, including 7 in the RBD. 346, 445, 446 and 490 are significant escape locations in the Bloom lab calculator, in particular knocking out a large fraction of the antibodies elicited by pre-Omicron virus that still neutralise BA.2. Somewhat surprisingly, S:Q493R is not reverted!
Also note that this is yet another highly divergent BA.2 with ORF1b:G662S, after BA.2.75 and the BA.2.10 sublineage in #898 (see the discussion in that issue).
Usher tree has some reversions in the RBD which are most likely artefacts. The two sequences before the S:L368I branch have NNNs there so I assume they also have S:L368I.
https://nextstrain.org/fetch/github.com/silcn/subtreeAuspice1/raw/main/auspice/subtreeAuspice1_genome_2019f_10ae20.json?branchLabel=Spike%20mutations&c=gt-S_1003&label=nuc%20mutations:C15738T
Genomes: EPI_ISL_14166909 EPI_ISL_14167044 EPI_ISL_14302968 EPI_ISL_14303108 EPI_ISL_14303176 EPI_ISL_14303181 EPI_ISL_14303196 EPI_ISL_14303277 EPI_ISL_14303283
Improved cov-spectrum query from @corneliusroemer: https://cov-spectrum.org/explore/World/AllSamples/Past6M/variants?variantQuery=%5B4-of%3A+ORF1a%3A47R%2C+S%3A83A%2C+S%3A146Q%2C+S%3A213E%2C+S%3A339H%2C+S%3A445P%2C+S%3A483A%2C+S%3A1003I%2C+M%3A3Y%2C+ORF7a%3A110T%2C+N%3A282I%2C+15738T%2C+15939C%5D&