cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

Lineages B.1.526, B.1.526.1 and B.1.526.2 #45

Closed andersonbrito closed 3 years ago

andersonbrito commented 3 years ago

By Anderson Brito & Nathan Grubaugh Lab.

Description

Sub-lineage of: B.1.526

Earliest sequence: B.1.526 (2020-11-23); B.1.526.1 (2020-09-07); B.1.526.2 (2021-02-10)

Most recent sequence: 2020-09-07

Countries circulating: USA

These B.1.526 lineages and sub-lineages were recently reassigned, and their classifications are currently mixed up, as can be seen in the tree below (see image and link).

Genomes
B.1.526: metadata of 230 genomes (download here)
B.1.526.1: metadata of 49 genomes (download here)
B.1.526.2: metadata of 89 genomes (download here)

Evidence
Image: available here
Build: available here

Proposed lineage name
Same as already proposed by the Pango team. We only want to report the need to update the B.1.526 lineage-group assignments.

chrisruis commented 3 years ago

Hi @andersonbrito, thanks for letting us know. I've updated the classifications for B.1.526, B.1.526.1 and B.1.526.2 based on your genomes; the update should appear in release 1.1.14.

rambaut commented 3 years ago

It seems the Auspice tree may be problematic:

[image]

There are some conflicting homoplasies.

rambaut commented 3 years ago

So the two unassigned sequences that split these branches are probably the problematic ones. That long branch also looks odd.

rambaut commented 3 years ago

@andersonbrito Can you send the alignment to me offline?

AngieHinrichs commented 3 years ago

FWIW S:L5F (C21575T) is in the Problematic Sites list as highly homoplasic, so we mask it out when building our trees.
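The two names in that comment refer to the same change: C21575T is the nucleotide position on the reference genome, and S:L5F is the resulting amino-acid change in spike. A minimal sketch of the conversion, assuming the S coding sequence begins at nucleotide 21563 of the Wuhan-Hu-1 reference (MN908947.3); the helper name `spike_codon` is hypothetical:

```python
# Assumption: the spike (S) coding sequence starts at nucleotide 21563
# of the Wuhan-Hu-1 reference genome (MN908947.3).
S_CDS_START = 21563


def spike_codon(genome_pos, cds_start=S_CDS_START):
    """Map a 1-based genome position inside S to (codon number, position in codon)."""
    offset = genome_pos - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the S coding sequence")
    return offset // 3 + 1, offset % 3 + 1


# C21575T falls on the first base of spike codon 5, i.e. the codon behind S:L5F.
print(spike_codon(21575))  # (5, 1)
```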

Initial Problematic Sites report on Virological (there have been several updates, but 21575 has been listed from the beginning): https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473

Problematic Sites VCF file: https://raw.githubusercontent.com/W-L/ProblematicSites_SARS-CoV2/master/problematic_sites_sarsCov2.vcf (the 7th column contains either 'caution' or 'mask'; we mask only the positions marked 'mask')
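A minimal sketch of the masking rule described above: keep only the VCF records whose 7th (FILTER) column is `mask`, then overwrite those positions with `N`. It assumes the sequence is already in reference coordinates (aligned to the reference with insertions removed) and uses a tiny inline toy VCF rather than downloading the real file; the function names are hypothetical.

```python
def masked_positions(vcf_lines):
    """Collect 1-based POS values whose FILTER (7th) column is exactly 'mask'."""
    positions = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 7 and fields[6] == "mask":
            positions.add(int(fields[1]))  # POS is the 2nd VCF column
    return positions


def apply_mask(sequence, positions, mask_char="N"):
    """Overwrite the given 1-based positions of a reference-coordinate sequence."""
    chars = list(sequence)
    for pos in positions:
        if 1 <= pos <= len(chars):
            chars[pos - 1] = mask_char
    return "".join(chars)


# Toy VCF with one 'mask' record and one 'caution' record (not real data).
toy_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "ref\t3\t.\tG\tT\t.\tmask\t.\n"
    "ref\t5\t.\tA\tC\t.\tcaution\t.\n"
)
to_mask = masked_positions(toy_vcf.splitlines())
print(sorted(to_mask))                  # [3]
print(apply_mask("ACGTACGT", to_mask))  # ACNTACGT
```

Positions flagged only as 'caution' are left untouched, matching the practice of masking just the 'mask' entries.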

rambaut commented 3 years ago

I think it could be pertinent here because it is associated with Q1011H and G1946S. I suspect all three need to be pushed back to the common ancestor of B.1.526 and B.1.526.2, with the two unassigned sequences (the thin branches) being the problem (presumably they are missing the corresponding pairs). Homoplasic sites can simply be neutral sites that flip-flop, and so can actually provide useful fine-scale information (I don't know the pattern at L5F; I will look into it).

rambaut commented 3 years ago

Doing this would make B.1.526.2 less distinguishable from B.1.526 and would probably warrant withdrawing B.1.526.2 as a sublineage. It is possible that S477N should be pushed back too, and then either the top clade has a 477S reversion or this part of the tree needs to be re-rooted with the bottom clade as more of an outgroup. This would probably also help the root-to-tip plot make more sense:

[image]

rambaut commented 3 years ago

C21575T (L5F) has also occurred within patients, suggesting that it is a true homoplasy and therefore likely phylogenetically informative.

rambaut commented 3 years ago

I think these are the two that are causing the problems:

[image] [image]

rambaut commented 3 years ago

That last one shares many of the same mutations as the stem of the B.1.526.1 lineage:

[image]

rambaut commented 3 years ago

Possibly a recombinant but more likely an artifactual mosaic. @andersonbrito - worth checking these two out for mixtures or contamination issues.

andersonbrito commented 3 years ago

Thank you for letting us know, Andrew. We will check the raw data of those genomes.

If those genomes are excluded, is the remaining data enough to distinguish the three lineages? I'll send you the alignment offline in a minute.

Anderson Brito

(Sent by email in reply to Andrew Rambaut's comment of Tue, 13 Apr 2021, 18:03.)

andersonbrito commented 3 years ago

@rambaut, I have just sent you an email with alignment and tree files.

trvrb commented 3 years ago

@rambaut: I'm not sure what version of pangolin is being run by GISAID, but here's the latest I see:

[Screenshot, 2021-04-18 11:59:32 AM]

On GISAID, B.1.526.1 currently appears as a clear sister lineage to B.1.526. From https://cov-lineages.org/lineages/lineage_B.1.526.html, I believe that B.1.526 is supposed to capture S:D253G and its descendants. However, B.1.526.1 sits as a sister to the D253G clade.

Take a look at: https://nextstrain.org/groups/blab/ncov/ny/B.1.526?c=gt-S_253

I have noticed that the homoplasies do complicate the structure within B.1.526, but I believe that 253G consistently partitions the lineages in question into sister lineages.

andersonbrito commented 3 years ago

This link has the phylogeny with bootstrap values mentioned in my previous message (PDF view here).

Purple = B.1.526
Blue = B.1.526.1
Light Blue = B.1.526.2

andersonbrito commented 3 years ago

There is something odd with B.1.526 and its sublineages: B.1.526 often shows up as a sister clade of B.1.526.1 and B.1.526.2, and B.1.526.3 is not being assigned. I would expect this structure: (B.1.526(B.1.526.1, B.1.526.2, B.1.526.3)). What we observe instead is (B.1.526, B.1.526.1, B.1.526.2), with B.1.526.3 not being assigned properly.

https://nextstrain.org/community/grubaughlab/CT-SARS-CoV-2/connecticut?c=pango_lineage&d=tree&f_division=Connecticut&p=full

It seems that what is being called B.1.526 in the link above is actually B.1.526.3. Given that, I would expect the designation B.1.526 to be assigned only to genomes that lack the signatures found in sublineages .1, .2, and .3. So far, however, B.1.526 appears as a sister group of all the other sublineages, when it should be a parental lineage (see below).

[Screenshot, 2021-06-11 9:39:10 AM]

aineniamh commented 3 years ago

Looking at the designated sequences (https://raw.githubusercontent.com/cov-lineages/pango-designation/master/lineages.csv), B.1.526, B.1.526.1 and B.1.526.2 look very similar on the surface from an epi point of view: they all have mostly sequences from CT, with few other locations designated.

Currently, the only sequences designated B.1.526.3 are:

Luxembourg/LNS9461736/2021,B.1.526.3
Luxembourg/LNS4948053/2021,B.1.526.3
Luxembourg/LNS5785410/2021,B.1.526.3
Belgium/CHUNamur13191626/2021,B.1.526.3
Luxembourg/LNS6053410/2021,B.1.526.3
Belgium/ULG-12466/2021,B.1.526.3
Luxembourg/LNS4693077/2021,B.1.526.3
Belgium/CHUNamur13191713/2021,B.1.526.3
Belgium/CHUNamur13191658/2021,B.1.526.3
Luxembourg/LNS1421682/2021,B.1.526.3

Any sequences beyond those already in the designation list can be added, provided they are >95% complete, to help with these assignments.
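A minimal sketch of that criterion, assuming "complete" means the fraction of the 29,903-nt Wuhan-Hu-1 reference length covered by unambiguous bases (A, C, G, T), and that lineages.csv holds `taxon,lineage` rows like the excerpt above. The helper names, the `taxon` header check and the example sequences are hypothetical.

```python
import csv


def designated_lineages(csv_lines):
    """Map taxon -> lineage from 'taxon,lineage' rows; an assumed 'taxon' header is skipped."""
    mapping = {}
    for row in csv.reader(csv_lines):
        if len(row) < 2 or row[0] == "taxon":
            continue
        mapping[row[0]] = row[1]
    return mapping


def completeness(seq, genome_length=29903):
    """Fraction of the reference length covered by unambiguous bases (A, C, G, T)."""
    called = sum(1 for base in seq.upper() if base in "ACGT")
    return called / genome_length


def extra_candidates(sequences, designated, threshold=0.95, genome_length=29903):
    """Names of not-yet-designated sequences that are more than `threshold` complete."""
    return sorted(
        name
        for name, seq in sequences.items()
        if name not in designated and completeness(seq, genome_length) > threshold
    )


if __name__ == "__main__":
    rows = [
        "Luxembourg/LNS9461736/2021,B.1.526.3",
        "Belgium/ULG-12466/2021,B.1.526.3",
    ]
    designated = designated_lineages(rows)
    # Toy 10-nt "genomes" with hypothetical names, for illustration only.
    seqs = {
        "Luxembourg/LNS9461736/2021": "ACGTACGTAC",  # already designated
        "Example/new-1/2021": "ACGTACGTAC",          # fully called
        "Example/new-2/2021": "ACGTNNNNNN",          # 40% called
    }
    print(extra_candidates(seqs, designated, genome_length=10))
```

Only `Example/new-1/2021` passes here: the first entry is already designated and the third falls below the threshold.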

For the .1 and .2 sublineages, the question may be whether they should remain sublineages or simply be merged back into the parent, if they are not being robustly distinguished in the phylogeny.
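In terms of the designation list, merging a sublineage back into its parent amounts to relabelling its designated sequences. A minimal sketch over an in-memory taxon-to-lineage mapping; `merge_into_parent` is a hypothetical helper, not the code used by the Pango team, and it only handles exact lineage matches (any descendants of the merged sublineages would need separate handling).

```python
def merge_into_parent(assignments, sublineages, parent):
    """Relabel taxa assigned to any of `sublineages` as `parent`; others are unchanged."""
    return {
        taxon: parent if lineage in sublineages else lineage
        for taxon, lineage in assignments.items()
    }


example = {
    "seq1": "B.1.526.1",
    "seq2": "B.1.526.2",
    "seq3": "B.1.526.3",
    "seq4": "B.1.526",
}
print(merge_into_parent(example, {"B.1.526.1", "B.1.526.2"}, "B.1.526"))
```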

aineniamh commented 3 years ago

Merged the sublineages into B.1.526 in commit ec2335a.

aineniamh commented 3 years ago

Release tagged: https://github.com/cov-lineages/pango-designation/releases/tag/v1.2.13