Subclade of B.1.1.1 (resulting from a discrete evolutionary sprint in December/January?) widespread in Chile

mkedwards commented 3 years ago

New lineage proposal

by Michael K. Edwards m.k.edwards@gmail.com

Description

Sub-lineage of: B.1.1.1
Earliest sequence: collected 2021-01-17
Most recent sequence: widespread in Chile (as of early April, latest nextstrain.org update)
Countries circulating: widespread in Chile; formerly common in Peru, current status uncertain; spotted in Spain, Australia, Germany, USA/NY, Brazil

It actually appears to have originated in Peru, and split into two subclades. One has deletions in the S NTD (Δ246–252 plus N253D) and in nsp6 (ORF1a Δ3675–3677, shared with B.1.1.7/B.1.351/P.1 — which doesn't mean shared ancestry); this one traveled to Chile and took off. The one without the deletions doesn't appear to have spread beyond Peru, except maybe to Brazil; Nextstrain's South American subsample is a bit of a moving target, and the Brazilian sequence in the screenshot isn't there right now.

The closest thing to the founding strain in Nextstrain's South American subsample appears to be GISAID EPI_ISL_1111128. Most of the cases outside Peru lack the N:T366I mutation, so let's subtract that. Here are the visible changes specific to the variant, then:

N: P13L, G214C
ORF1a: P2287S, F2387V, L3201P, T3255I
ORF1b: P314L
ORF9b: P10S
S: G75V, T76I, L452Q, F490S, T859N

(I haven't yet dug down to the base-sequence level to look at what may be beneath the waterline, in terms of non-coding regions and template-switching effects associated with RNA secondary structure.)

Genomes

I don't know how to get nextstrain.org to dump a CSV. S:L452Q,F490S is currently a pretty good filter (https://nextstrain.org/ncov/south-america?branchLabel=aa&c=country&gt=S.452Q,490S), but RBD point-substitution tweaks are pretty short putts, so some of the probably-fitness-neutral changes like S:T859N and the ORF1a hits outside nsp6 are probably better markers. If I had to guess I'd say that N:P13L (which is also ORF9b:P10S) is functionally interesting, but it has been invented repeatedly in other clades so it's not a good lineage marker.
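To make the idea of filtering on marker mutations concrete, here is a minimal Python sketch. It assumes each sample's amino-acid changes are already available as strings such as "S:L452Q"; the helper name `matches_signature`, the `min_hits` parameter, and the sample data are all hypothetical and not part of any Nextstrain or GISAID tool. The signature combines the RBD pair with S:T859N, which the paragraph above suggests is the more reliable marker.

```python
# Hypothetical helper; names and sample data are illustrative only.
SIGNATURE = frozenset({"S:L452Q", "S:F490S", "S:T859N"})

def matches_signature(mutations, signature=SIGNATURE, min_hits=None):
    """Return True if a sample carries enough of the signature mutations.

    mutations: iterable of strings such as "S:L452Q".
    min_hits:  number of signature mutations required (default: all of them).
    """
    required = len(signature) if min_hits is None else min_hits
    return len(signature & set(mutations)) >= required

# Made-up samples: one carries the full signature, one carries a single marker.
samples = {
    "sample_A": ["S:G75V", "S:T76I", "S:L452Q", "S:F490S", "S:T859N", "N:P13L"],
    "sample_B": ["S:L452Q", "N:P13L"],
}
hits = [name for name, muts in samples.items() if matches_signature(muts)]
print(hits)
```

Requiring several markers at once is a guard against mutations that recur independently in other clades; lowering `min_hits` would tolerate sequences with missing coverage at one site, at the cost of more false positives.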

Evidence

Proposed lineage name

I have no idea what the conventions are around this. Presumably B.1.1.something?

mkedwards commented 3 years ago

For what it's worth, F490S is documented antibody-escape: https://www.biorxiv.org/content/10.1101/2020.07.21.214759v1.full.pdf

AngieHinrichs commented 3 years ago

That is an interesting branch of B.1.1.1. A caveat about Nextstrain, though: its wonderful interactive display limits it to about 5k sequences per build. At this point, GISAID has over a million sequences and is growing by tens of thousands of sequences every day. So Nextstrain can only show you a downsampled view of currently available sequences. Even though they do a good job of downsampling, there are unavoidable side effects: the structure of the tree can change significantly from one day to the next, and you can't get a complete view of all of the sequences that may belong to a particular branch.

If you simply want a list of the sequences in your current Nextstrain view, click the "Download Data" link near the bottom of the page and click "Metadata (TSV)".
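Once the metadata is downloaded, a per-country count takes only a few lines of Python. This is a sketch under assumptions: it assumes the TSV has a header row with a column named `country` (check the header of the actual download), and the rows below are made up for illustration.

```python
import csv
import io
from collections import Counter

def count_by_column(tsv_text, column="country"):
    """Count non-empty values of one column in tab-separated metadata."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[column] for row in reader if row.get(column))

# Made-up rows standing in for a downloaded metadata file.
example_tsv = (
    "strain\tcountry\tdate\n"
    "Chile/EXAMPLE-1/2021\tChile\t2021-03-01\n"
    "Chile/EXAMPLE-2/2021\tChile\t2021-03-08\n"
    "Peru/EXAMPLE-3/2021\tPeru\t2021-02-15\n"
)
country_counts = count_by_column(example_tsv)
for country, n in country_counts.most_common():
    print(f"{n:4d} {country}")
```

For a real file, read its contents with `open(path, encoding="utf-8").read()` and pass the text to `count_by_column`.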

If you have access to GISAID's EpiCoV database, then in the Search tab's Substitutions input you can paste in "Spike_L452Q,Spike_F490S,Spike_T859N" and get a list of sequences with those mutations. At the moment there are 263. If you download FASTA for them, you can do a quick tally of counts per country like this:

grep ^\> gisaid_....fa | sort | cut -d / -f 2 | uniq -c | sort -nr
 108 Chile
  93 USA
  35 Peru
  19 Germany
   3 England
   2 Spain
   1 Brazil
   1 Australia
   1 Argentina
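The same tally can be sketched in Python for anyone who prefers it to a shell pipeline. Like the `cut -d / -f 2` step above, it assumes the country is the second slash-delimited field of each FASTA header; the headers in the example are invented, and `tally_countries` is a hypothetical helper name.

```python
from collections import Counter

def tally_countries(fasta_lines):
    """Count FASTA records by the second '/'-delimited field of the header."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line.strip().split("/")
            if len(fields) > 1:
                counts[fields[1]] += 1
    return counts

# Invented headers and sequences, for illustration only.
example_fasta = [
    ">hCoV-19/Chile/EXAMPLE-1/2021\n",
    "ACGT\n",
    ">hCoV-19/Peru/EXAMPLE-2/2021\n",
    "ACGT\n",
    ">hCoV-19/Chile/EXAMPLE-3/2021\n",
    "ACGT\n",
]
tally = tally_countries(example_fasta)
for country, n in tally.most_common():
    print(f"{n:4d} {country}")
```

To run it on a downloaded file, pass an open file handle instead of the list, since iterating over a file yields its lines.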

And of course you can align them to the SARS-CoV-2 reference genome, build your own tree of downloaded sequences, etc. etc. GISAID also provides a way to contact submitters of sequences, so you could contact the folks in Chile and Peru and see if they are already studying this sub-lineage.

[I am not a member of the cov-lineages team; I just weigh in here with my opinions because github.]

ptsukayama commented 3 years ago

Hello from Peru.

We have been following this sublineage since March. Just uploaded 26 new genomes from Lima (capital city) collected between Jan-31 and March-18, and 16 (61.5%) belong to this sublineage of B.1.1.1.

Using a multiplexed RT-qPCR protocol to identify VOC-associated deletions, Peru's National Institute of Health screened 579 samples from 10/25 regions in Mar-2021 and found 195 (33.7%) that had a result consistent with P.1 or this B.1.1.1 sublineage.

[Image: INS qPCR VOC]
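The percentages quoted above come from fairly small samples, so it can help to attach a rough uncertainty range. The sketch below uses the Wilson score interval, which is a choice made here for illustration and not something used in this thread. It treats each sample as a simple random draw, which screening samples generally are not, and the qPCR count covers results consistent with either P.1 or this sublineage, so it is an upper bound for this sublineage alone.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Return (proportion, low, high) using the Wilson score interval."""
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return p, center - half, center + half

screened = wilson_interval(195, 579)  # qPCR screen, March 2021
lima = wilson_interval(16, 26)        # newly uploaded Lima genomes
for label, (p, low, high) in [("qPCR screen", screened), ("Lima genomes", lima)]:
    print(f"{label}: {100 * p:.1f}% "
          f"(approx. 95% interval {100 * low:.1f}% to {100 * high:.1f}%)")
```

The interval for the 26 Lima genomes is much wider than the one for the 579 screened samples, which is the main point: the 61.5% figure is suggestive, not precise.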

So, we believe this clade has been widespread and expanding in Peru since Dec-2020 / Jan-2021. It is also circulating in several parts of Chile.

New sequences sharing S:L452Q,F490S,T859N and S:Δ246–252 from Ecuador, Chile, Germany, Spain, and USA have been uploaded since last week, totaling 356 genomes on GISAID. Outside South America, onward transmission seems to occur in Germany and USA.

We would like to propose its designation as C.37.

Peru Nextstrain build here: https://nextstrain.org/community/quipupe/Nextstrain_Peru

[Image: Nextstrain Peru]

amniewiadomska commented 3 years ago

I'd like to also draw attention to this lineage for some other reasons. I first noticed it when looking at circulating genomes in Chile, which is currently experiencing a huge surge in infections despite high numbers of vaccinated people. Some concerning features of this lineage:

1) Prevalence of this lineage has increased dramatically since January in Peru, Chile and the USA (as mentioned above).
2) It contains a relatively large deletion in S not seen before (del 246-252). However, B.1.351 has a deletion in a similar region (del 241-243). While this region is particularly susceptible to deletions, it's worrying when deletions start occurring there in multiple lineages, especially those already in VoC/LoC.
3) It contains 2 established Ab escape mutations that fall within multiple Ab epitopes: L452Q and F490S.
   452 = Serum Ab escape: Yes; mAb escape: LY-CoV555; Ab epitopes: IEDB.1075135, IEDB.1310989, IEDB.1311244, IEDB.1314087, IEDB.1314090, IEDB.1329039
   490 = Serum Ab escape: Yes; mAb escape: LY-CoV555; Ab epitopes: IEDB.1075135, IEDB.1075136, IEDB.1087140, IEDB.1181325, IEDB.1310989, IEDB.1311243, IEDB.1311244, IEDB.1314085, IEDB.1314087, IEDB.1314090, IEDB.1329039
4) T859N also occurs in B.1.526, another LoC.
5) A subset of the sequences also contains a deletion from S:63-75. This region is also susceptible to deletions, and a 69-70 deletion also occurs in B.1.1.7.

trvrb commented 3 years ago

I also believe this lineage should get a designation. A couple other Nextstrain views to surface:

Highlighting S:76I viruses in this view flags this clade. If we look at logistic growth across lineages, we see that this lineage currently ranks highest in South America. As Angie says above, this is subsampled, but the subsampling aims to be equitable across space and time and should help smooth out geographic sampling bias.

https://nextstrain.org/ncov/south-america?c=gt-S_76&f_region=South%20America&l=scatter&scatterY=logistic_growth

[Image: logistic-growth]
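For readers unfamiliar with the term, a logistic growth rate can be estimated by fitting a straight line to the logit of a lineage's frequency over time. The sketch below is a generic least-squares version on synthetic data. It is not Nextstrain's implementation, whose details are not described in this thread, and the function name and numbers are hypothetical.

```python
import math

def logistic_growth_rate(days, freqs, eps=1e-6):
    """Least-squares slope of logit(frequency) against time, per day."""
    logits = []
    for f in freqs:
        f = min(max(f, eps), 1 - eps)  # keep the logit finite at 0 or 1
        logits.append(math.log(f / (1 - f)))
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logits) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logits))
    return sxy / sxx

# Synthetic weekly frequencies generated from a known rate of 0.1 per day.
days = [0, 7, 14, 21, 28]
true_rate = 0.1
freqs = [1 / (1 + math.exp(-(true_rate * d - 3.0))) for d in days]
rate = logistic_growth_rate(days, freqs)
print(f"estimated growth rate: {rate:.3f} per day")
```

With real data the frequencies would come from counts of sequences per time window, and the estimate inherits whatever sampling bias those counts carry.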

Looking at S1 mutations gives a similar readout where this lineage has a relative abundance of S1 mutations.

https://nextstrain.org/ncov/south-america?branchLabel=none&c=gt-S_76&f_region=South%20America&l=scatter&scatterY=S1_mutations

chrisruis commented 3 years ago

Hi All, thanks for submitting this and for your comments. We've designated this as lineage C.37. As @ptsukayama says, it looks like there is onward transmission in multiple countries, so I'll keep an eye on this to see if there are clear sublineages as more data comes in. For now, all of these sequences are designated C.37 in v1.1.20.

cov-lineages / pango-designation