cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

B.1.526: significantly broader than Nextstrain 21F -- and WHO Iota? #154

Closed AngieHinrichs closed 3 years ago

AngieHinrichs commented 3 years ago

@russcd found a large branch of the UCSC tree with >4000 sequences that pangoLEARN consistently calls as B.1.526, but Nextclade does not call as 21F (Iota) -- instead, Nextclade calls 20C (B.1 + G25563T/ORF3a:Q57H + C1059T). We know that Pango lineages, Nextstrain clades and WHO VoC/VoI are not meant to be synonymous, but seeing so much disagreement at this scale was a little concerning.
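
The kind of disagreement described above can be quantified by cross-tabulating the two sets of calls. The following is a minimal sketch, not the method actually used for the UCSC tree: the record layout `(name, pango_lineage, nextstrain_clade)`, the function names, and the sample records are all hypothetical placeholders.

```python
from collections import Counter


def cross_tabulate(calls):
    """Count how often each (Pango lineage, Nextstrain clade) pair occurs."""
    return Counter((lineage, clade) for _name, lineage, clade in calls)


def discordant(calls, lineage, expected_clade):
    """Names of sequences called `lineage` whose clade is not `expected_clade`."""
    return [
        name
        for name, called_lineage, clade in calls
        if called_lineage == lineage and clade != expected_clade
    ]


# Made-up placeholder records, NOT real sequences or real calls.
calls = [
    ("sample_1", "B.1.526", "21F (Iota)"),
    ("sample_2", "B.1.526", "20C"),
    ("sample_3", "B.1.526", "20C"),
    ("sample_4", "B.1", "20C"),
]

table = cross_tabulate(calls)
mismatches = discordant(calls, "B.1.526", "21F (Iota)")
```

With real data, the records would come from joining pangoLEARN and Nextclade output on sequence name; a large count in a cell such as `("B.1.526", "20C")` is the signal discussed here.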

Long story short, this has to do with the merging of B.1.526.1 (not a descendant of B.1.526) into B.1.526 in #45. I'm not sure exactly which mutations WHO means when it refers to Iota on https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/, but there is a significant difference between Nextstrain 21F and the current Pango B.1.526 (the common ancestor of the previous B.1.526 and B.1.526.1, which were cousins, not ancestor and descendant). In the UCSC tree, the path of mutations to the current B.1.526 is

C14408T > C241T > A23403G > C3037T > G25563T > C1059T > A10323G > C28869T > G10323A > C23664T > G28975A > C25517T > A20262G > T9867C

-- that branch has 54k leaves/sequences. The path to Nextstrain 21F (i.e. the node that includes all of the Iota mutations from clades.tsv except C21575T/S:L5F which is in the Problematic Sites set & masked in the UCSC tree) continues with several more mutations:

A22320G > C21846T > A16500C

-- that branch has ~40k leaves/sequences, 22.8k of which have G23012A/S:E484K (most of the original B.1.526, but not the former B.1.526.2; there seems to be a back-mutation A23012G/S:K484E on the path to that, but then it adds S:S477N).
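
To make the comparison of the two paths concrete, here is a small sketch that parses the mutation paths quoted above, collapses each to its net change per position, and reports which positions distinguish the 21F node from the current B.1.526 node. The path strings are copied from this comment; the function names and the "net change" simplification are assumptions made for illustration, not part of UShER or Nextclade.

```python
import re

# A nucleotide substitution such as "C14408T": reference base, position, new base.
MUTATION = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_path(path):
    """Split a 'C14408T > C241T' style path into (ref, position, alt) tuples."""
    steps = []
    for token in path.split(">"):
        match = MUTATION.match(token.strip())
        if match is None:
            raise ValueError(f"not a nucleotide substitution: {token!r}")
        ref, pos, alt = match.groups()
        steps.append((ref, int(pos), alt))
    return steps


def net_changes(steps):
    """Map position -> (first ref, last alt), dropping positions that revert."""
    first_ref, last_alt = {}, {}
    for ref, pos, alt in steps:
        first_ref.setdefault(pos, ref)
        last_alt[pos] = alt
    return {
        pos: (first_ref[pos], last_alt[pos])
        for pos in last_alt
        if first_ref[pos] != last_alt[pos]
    }


B_1_526_PATH = (
    "C14408T > C241T > A23403G > C3037T > G25563T > C1059T > A10323G > "
    "C28869T > G10323A > C23664T > G28975A > C25517T > A20262G > T9867C"
)
EXTRA_21F_PATH = "A22320G > C21846T > A16500C"

pango_steps = parse_path(B_1_526_PATH)
clade_21f_steps = pango_steps + parse_path(EXTRA_21F_PATH)

# Positions touched on the B.1.526 path that end up back at the reference base.
reverted = sorted({pos for _, pos, _ in pango_steps} - set(net_changes(pango_steps)))
# Positions changed at the 21F node but not at the B.1.526 node.
only_21f = sorted(set(net_changes(clade_21f_steps)) - set(net_changes(pango_steps)))
```

Run on the paths above, `reverted` picks out position 10323 (A10323G followed by G10323A), and `only_21f` holds the three extra positions 16500, 21846 and 22320.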

Even though B.1.526 may not be meant to be synonymous with WHO Iota, https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/ lists B.1.526 as the Pango lineage for Iota, and so users of Pango lineages will most likely assume that a sequence designated or assigned B.1.526 belongs to Iota.

What do you think about possibly adding a sublineage B.1.526.4 that would correspond more closely to Nextstrain 21F and presumably WHO Iota, @aineniamh @andersonbrito? Then the WHO could refer to the sublineage instead of the whole B.1.526, and assigned lineages would be easier to interpret.

Here is a Nextstrain view of where the current B.1.526 representatives fall in the UCSC tree: https://nextstrain.org/fetch/hgwdev-angie.gi.ucsc.edu/~angie/B.1.526.reps.2021-07-20.json?branchLabel=Spike%20mutations

And here is the same view, but marked up with some notes and converted to a grainy PNG:

[Image: B.1.526 IotaOrNot, page 1]

andersonbrito commented 3 years ago

Hi @AngieHinrichs,

With the latest genomes that were released, new monophyletic clades expanded, and B.1.526 is quite messy, indeed (now polyphyletic, broken into many groups).

This seems to be the current situation (see Figures below):

That would roughly match the proposal you shared last week, @AngieHinrichs. But let me know if you see the same patterns below using UShER. The few genomes that are still being named B.1.526.1 and B.1.526.2 (shown here) need to be fixed, as some are simply B.1.526 (21F, Iota), while others are in that bottom clade.

The trees below can be found here: https://nextstrain.org/community/andersonbrito/pango/iota?branchLabel=aa&c=gt-S_484,701,477,452

[Images: B.1.526 panels A and B; B.1.526 panels C and D]

AngieHinrichs commented 3 years ago

Thanks so much @andersonbrito for the detailed analysis and images!

Panel D (a possible solution): we could keep the large clade representing 21F as B.1.526 (which in panel D includes B.1.526 + B.1.6xw [blue] + B.1.6xx [light orange] + B.1.6xy [light green]). This would match the current WHO/Nextclade classification. Finally, the bottom clade, here tagged as B.1.6xz [gold yellow], would need to be designated under a new lineage name.

Sounds good to me. So we would leave most of the B.1.526 representative sequences in lineages.csv as they are, but we would change the designation of sequences formerly designated as B.1.526.1 and currently designated as B.1.526 (in pango-designation-154.proposed.B.1.6xz.tsv.txt) to the next available B.1.X lineage.

I further propose removing sequence USA/NY-MSHSPSP-PV17752/2020 (EPI_ISL_801866) from B.1.526. It is closest to the old B.1.526.1 (proposed new lineage), but it does not have the Spike mutations at 80, 157 and 452 that the rest of the sequences have, so I think it should be just B.1.

I also propose downgrading these samples formerly designated B.1.526.3 and currently designated B.1.526 to B.1 because they are really not very similar to any of the other B.1.526-associated clades:

Luxembourg/LNS9461736/2021
Luxembourg/LNS4948053/2021
Luxembourg/LNS5785410/2021
Belgium/CHUNamur13191626/2021
Luxembourg/LNS6053410/2021
Belgium/ULG-12466/2021
Luxembourg/LNS4693077/2021
Belgium/CHUNamur13191713/2021
Belgium/CHUNamur13191658/2021
Luxembourg/LNS1421682/2021

If there were ongoing transmission, I guess they could be proposed as another lineage, but as far as I can tell there have been no more recent sequences similar to those.
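
The redesignations proposed above amount to a keyed update of the designation table. The sketch below assumes that lineages.csv has `taxon` and `lineage` columns and that taxon names match the sequence names as written in this comment; both are assumptions to verify against the actual file. The `redesignate` helper and the `Hypothetical/Example-1/2021` row are invented for illustration.

```python
import csv
import io


def redesignate(rows, changes):
    """Return (updated rows, number changed) after applying taxon -> lineage changes."""
    updated = []
    n_changed = 0
    for row in rows:
        new_lineage = changes.get(row["taxon"])
        if new_lineage is not None and new_lineage != row["lineage"]:
            row = {**row, "lineage": new_lineage}  # copy, leave the input untouched
            n_changed += 1
        updated.append(row)
    return updated, n_changed


# Small in-memory stand-in for lineages.csv (assumed column names).
# The third taxon is a made-up placeholder that should stay in B.1.526.
SAMPLE = """taxon,lineage
Luxembourg/LNS9461736/2021,B.1.526
Belgium/ULG-12466/2021,B.1.526
Hypothetical/Example-1/2021,B.1.526
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
changes = {
    "Luxembourg/LNS9461736/2021": "B.1",
    "Belgium/ULG-12466/2021": "B.1",
}
updated, n_changed = redesignate(rows, changes)
```

Returning the count of changed rows gives a quick sanity check that every taxon in the change list was actually found in the table.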

chrisruis commented 3 years ago

Now included in v1.2.63, with new lineage B.1.637 (36 sequence designations updated from B.1.526) and 11 sequences redesignated from B.1.526 to B.1.