cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

Potential BA.2.3 Sublineage with ORF1a:K798N, ORF1b:A88V, and E:S55F (USA, Canada; 409 seqs as of 3 Jun 2022) #677

Closed alurqu closed 2 years ago

alurqu commented 2 years ago

There may be a BA.2.3.2 sublineage with ORF1a:K798N (nucleotide G2659T) and ORF1b:A88V (aka ORF1ab:A4489V, nucleotide C13730T) observed in the Northern Mariana Islands, Hawaii, the mainland USA, and Canada, with one sequence each in Mexico and Japan. All 93 GenBank sequences (acquired 26 May 2022 via Nextstrain) are from USA states or territories. As of 27 May 2022, Cov-Spectrum reports 329 BA.2.3.2+ORF1a:798N+ORF1b:88V sequences, with 300 from the USA and 27 from Canada.

Cov-Spectrum calculates growth advantages of 54% compared to BA.2.3.2, 44% compared to BA.2, and 51% compared to B.1.1.529*.
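For readers unfamiliar with how a percentage like this can arise, here is a minimal sketch. It assumes a logistic growth model in which the variant's share grows at a daily rate `a` relative to the baseline, and that the advantage per generation is `exp(a * g) - 1`. The generation time `g = 4.8` days and the function names are assumptions for illustration, not something stated in this thread, and Cov-Spectrum's actual computation may differ.

```python
import math

GENERATION_TIME_DAYS = 4.8  # assumed value, not taken from this thread


def advantage_from_rate(a, g=GENERATION_TIME_DAYS):
    """Relative growth advantage per generation for a daily logistic rate a."""
    return math.exp(a * g) - 1.0


def rate_from_advantage(f, g=GENERATION_TIME_DAYS):
    """Inverse: daily logistic rate implied by an advantage f (0.54 means 54%)."""
    return math.log(1.0 + f) / g


def doubling_time_of_odds(a):
    """Days for the variant-to-baseline odds to double at daily rate a."""
    return math.log(2.0) / a


# The three advantages quoted above, converted back to daily rates.
for baseline, f in [("BA.2.3.2", 0.54), ("BA.2", 0.44), ("B.1.1.529*", 0.51)]:
    a = rate_from_advantage(f)
    print(f"{baseline}: a = {a:.3f}/day, "
          f"odds double every {doubling_time_of_odds(a):.1f} days")
```

Under these assumptions the reported percentages are just a rescaling of a daily growth rate, so they depend on the assumed generation time as much as on the sequence counts.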

This is similar to and possibly related to the primary undesignated lineage from closed Issue 587 (https://github.com/cov-lineages/pango-designation/issues/587), but this proposed lineage includes BA.2.3.2's E:S55F mutation that was mostly absent in Issue 587. However, this proposed lineage may well represent growth of the group of 34 BA.2.3.2 sequences that was mentioned in my 27 Apr 2022 Issue 587 comment.

First GenBank sequence: Northern Mariana Islands 7 March 2022

Most recent GenBank sequence: California USA 14 May 2022

A zip archive of GenBank-formatted and derived metadata and FASTA files for these sequences is available at BA.2.3.2+ORF1a_798N+ORF1b_88V.zip. All of these sequences and metadata are from GenBank public sequences via Nextstrain.

Outbreak.info's Situation Report for this potential lineage is at https://outbreak.info/situation-reports?pango=BA.2.3.2&muts=ORF1a%3AK798N&muts=ORF1b%3AA88V
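The query string in that link follows a simple pattern: one `pango` parameter and one `muts` parameter per mutation, with the colon percent-encoded. A minimal sketch of building it is below; the helper name `situation_report_url` is hypothetical, and only the parameters visible in the link above are used.

```python
from urllib.parse import urlencode

BASE = "https://outbreak.info/situation-reports"


def situation_report_url(pango, mutations):
    """Build a situation-report URL for a lineage plus a list of mutations."""
    # A list of pairs lets the 'muts' key repeat once per mutation.
    params = [("pango", pango)] + [("muts", m) for m in mutations]
    return f"{BASE}?{urlencode(params)}"


url = situation_report_url("BA.2.3.2", ["ORF1a:K798N", "ORF1b:A88V"])
print(url)
```

Passing a list of pairs rather than a dict is what allows the repeated `muts` key.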

Counts from Cov-Spectrum as of 27 May 2022: [image]. Source: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?aaMutations=ORF1b%3A88V%2CORF1a%3A798N&pangoLineage=BA.2.3.2*&

Cov-Spectrum Growth Advantage over BA.2.3.2 (54% as of May 27, 2022): [image]. Source: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?pangoLineage=BA.2.3.2&aaMutations1=ORF1b%3A88V%2CORF1a%3A798N&pangoLineage1=BA.2.3.2*&analysisMode=CompareToBaseline&

Cov-Spectrum Growth Advantage over BA.2 (44% as of May 27, 2022): [image]. Source: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?pangoLineage=BA.2&aaMutations1=ORF1b%3A88V%2CORF1a%3A798N&pangoLineage1=BA.2.3.2*&analysisMode=CompareToBaseline&

Advantage over B.1.1.529 (51% as of May 27, 2022): [image]. Source: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?pangoLineage=B.1.1.529&aaMutations1=ORF1b%3A88V%2CORF1a%3A798N&pangoLineage1=BA.2.3.2*&analysisMode=CompareToBaseline&

silcn commented 2 years ago

I don't think this is actually BA.2.3.2. I think it's BA.2.3 - in fact, a sublineage of your previous proposal #587 - that has independently picked up E:S55F, and that's leading to it being misclassified by pangoLEARN as BA.2.3.2. Usher shows the order as ORF1a:K798N (blue) then ORF1b:A88V (orange) and finally E:S55F (green).

[image: pango677]

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice2_genome_2e9c7_185480.json?c=gt-nuc_2659,13730,26408&label=nuc%20mutations:C13730T
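The three nucleotide positions coloured in that link (2659, 13730, 26408) can be mapped back to the amino-acid positions named in this thread. The sketch below assumes 1-based gene start coordinates on the Wuhan-Hu-1 reference (MN908947.3) and the usual convention that ORF1b numbering restarts at the ribosomal frameshift; those coordinates and the helper name `codon_of` are supplied for illustration and are not given in the thread.

```python
# Gene start coordinates (1-based) on the Wuhan-Hu-1 reference (MN908947.3).
# These values are supplied for illustration; they are not given in the thread.
GENE_START = {"ORF1a": 266, "ORF1b": 13468, "E": 26245}

# ORF1b residue n corresponds to ORF1ab residue n + 4401 under this convention.
ORF1B_TO_ORF1AB_OFFSET = 4401


def codon_of(gene, nuc_pos):
    """Return (codon number, position within the codon 1..3) for a nucleotide."""
    offset = nuc_pos - GENE_START[gene]
    if offset < 0:
        raise ValueError(f"{nuc_pos} lies before the start of {gene}")
    return offset // 3 + 1, offset % 3 + 1


for gene, pos in [("ORF1a", 2659), ("ORF1b", 13730), ("E", 26408)]:
    codon, within = codon_of(gene, pos)
    print(f"{gene}: nucleotide {pos} -> codon {codon}, codon position {within}")

print("ORF1b:88 as ORF1ab residue:", 88 + ORF1B_TO_ORF1AB_OFFSET)
```

With these coordinates the positions land on ORF1a codon 798, ORF1b codon 88, and E codon 55, and the ORF1b-to-ORF1ab offset reproduces the ORF1ab:A4489V alias from the opening comment.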

alurqu commented 2 years ago

Regarding E:S55F, I'll also note that, according to a comment in Mou et al "Emerging mutations in envelope protein of SARS-CoV-2 and their effect on thermodynamic properties" (Informatics in Medicine Unlocked, 2021, https://doi.org/10.1016/j.imu.2021.100675), the E:S55F mutation might have an effect on disease severity. If true, this might be significant when combined with this proposed sublineage's apparent growth advantage. What I don't know is whether this mutation would increase or decrease disease severity, or whether in the end it would have no impact at all. The specific Mou et al quote to which I refer is in Section 3.1.2: "Mutations in the CTD, namely, S55F (128), V62F (129), and R69I (159) may affect the virus pathogenesis, altering the binding of the E protein to a tight junction." And no, I haven't figured out the meaning of the numbers in parentheses after the mutations.

Other publications I've found that in general relate to coronavirus envelope proteins and disease severity include:

Jimenez-Guardeño et al, "The PDZ-Binding Motif of Severe Acute Respiratory Syndrome Coronavirus Envelope Protein Is a Determinant of Viral Pathogenesis", PLOS Pathogens, 2014 https://doi.org/10.1371/journal.ppat.1004320

Xia et al, "SARS-CoV-2 envelope protein causes acute respiratory distress syndrome (ARDS)-like pathological damages and constitutes an antiviral target", Cell Research, 2021 https://doi.org/10.1038/s41422-021-00519-4

Nieto-Torres et al, "Severe Acute Respiratory Syndrome Coronavirus Envelope Protein Ion Channel Activity Promotes Virus Fitness and Pathogenesis", PLOS Pathogens, 2014 https://doi.org/10.1371/journal.ppat.1004077

Rahman et al, "Mutational insights into the envelope protein of SARS-CoV-2", Gene Reports, 2020 https://doi.org/10.1016/j.genrep.2020.100997

And then preprints:

Schoeman et al, "Comparative studies of the seven human coronavirus envelope proteins using topology prediction and molecular modelling to understand their pathogenicity", bioRxiv, 2021 https://doi.org/10.1101/2021.03.08.434384

Xia et al, "Why SARS-CoV-2 Omicron variant is milder? A single high-frequency mutation of structural envelope protein matters", bioRxiv, 2022 https://doi.org/10.1101/2022.02.01.478647

alurqu commented 2 years ago

> I don't think this is actually BA.2.3.2. I think it's BA.2.3 - in fact, a sublineage of your previous proposal #587 - that has independently picked up E:S55F, and that's leading to it being misclassified by pangoLEARN as BA.2.3.2. Usher shows the order as ORF1a:K798N (blue) then ORF1b:A88V (orange) and finally E:S55F (green).

I will defer to your superior Usher skills. I'm mostly noting the high growth advantages.

FedeGueli commented 2 years ago

E:55F seems homoplasic: it has popped up in several BA.1, BA.2, and also BA.4 sublineages. The same goes for E:18 and E:61, which recently emerged in several proposed and unproposed lineages.

alurqu commented 2 years ago

I've updated the title to reflect @silcn and @FedeGueli's comments. I'll defer to the lineage designation committee as to the proper designation for this sublineage should it be accepted for designation.

Also, for clarification, I've filtered the GenBank sequences for "good" overall quality control status. There may be other matching GenBank sequences which failed to pass this filter.
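A filter of this kind could look like the minimal sketch below. The column name `QC_overall_status`, the status values other than "good", and the sample rows are hypothetical placeholders; the actual metadata file and tooling used for this issue are not shown in the thread.

```python
import csv
import io

# Tiny made-up metadata table; strain names and the column name are hypothetical.
SAMPLE_TSV = (
    "strain\tQC_overall_status\n"
    "example/seq-1/2022\tgood\n"
    "example/seq-2/2022\tmediocre\n"
    "example/seq-3/2022\tbad\n"
    "example/seq-4/2022\tgood\n"
)


def keep_good(tsv_text, status_column="QC_overall_status"):
    """Return the rows whose overall QC status is exactly 'good'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row[status_column].strip().lower() == "good"]


kept = keep_good(SAMPLE_TSV)
print([row["strain"] for row in kept])
```

Rows with any other status are dropped, which is why a strict filter like this can undercount matching sequences.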

chrisruis commented 2 years ago

Thanks @alurqu. Agree that this looks like a sublineage of BA.2.3. We've added this as BA.2.3.17, starting on the branch with C13730T (ORF1b:A88V), which correlates with the epidemiological event of introduction(s) into the Northern Mariana Islands/USA.