cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

Sub-lineages of B.1.617.2 (Delta) in circulation in Brazil #300

Closed andersonbrito closed 3 years ago

andersonbrito commented 3 years ago

By Anderson Brito (Instituto Todos pela Saúde, ITpS)

Description: Sub-lineages of B.1.617.2 (Delta) in circulation in Brazil. These sub-lineages belong to a large clade, currently assigned to many unspecific, newly proposed designations (pango-designation v1.2.91). The suggestions below relate to monophyletic clades with more than 50 taxa.

Sub-lineage of: B.1.617.2

Proposed lineage name: AY.112 Earliest sequence: 2021-09-25 Most recent sequence: 2021-10-10 Countries circulating: Brazil Clade-defining SNP: ORF1a:S2285T

Proposed lineage name: AY.113 Earliest sequence: 2021-09-12 Most recent sequence: 2021-10-07 Countries circulating: Brazil Clade-defining SNP: C5512T

Proposed lineage name: AY.114 Earliest sequence: 2021-09-12 Most recent sequence: 2021-10-06 Countries circulating: Brazil Clade-defining SNP: C27645T

Proposed lineage name: AY.115 Earliest sequence: 2021-09-12 Most recent sequence: 2021-10-06 Countries circulating: Brazil Clade-defining SNP: S:N950D

Proposed lineage name: AY.116 Earliest sequence: 2021-09-12 Most recent sequence: 2021-10-14 Countries circulating: Brazil Clade-defining SNP: ORF3A:L106I

Proposed lineage name: AY.117 Earliest sequence: 2021-09-13 Most recent sequence: 2021-10-04 Countries circulating: Brazil Clade-defining SNP: ORF1a:S3344A

Proposed lineage name: AY.118 Earliest sequence: 2021-09-12 Most recent sequence: 2021-10-07 Countries circulating: Brazil Clade-defining SNP: ORF8:P30Q

Proposed lineage name: AY.119 Earliest sequence: 2021-09-13 Most recent sequence: 2021-10-01 Countries circulating: Brazil Clade-defining SNP: S:N950D

Proposed lineage name: AY.120 Earliest sequence: 2021-09-12 Most recent sequence: 2021-10-15 Countries circulating: Brazil Clade-defining SNP: ORF8:P30Q

Proposed lineage name: AY.121 Earliest sequence: 2021-09-12 Most recent sequence: 2021-09-29 Countries circulating: Brazil Clade-defining SNP: T27645C

Proposed lineage name: AY.122 Earliest sequence: 2021-09-12 Most recent sequence: 2021-10-13 Countries circulating: Brazil Clade-defining SNP: S:G142D

Proposed lineage name: AY.123 Earliest sequence: 2021-09-12 Most recent sequence: 2021-10-07 Countries circulating: Brazil Clade-defining SNP: C1267T

Proposed lineage name: AY.124 Earliest sequence: 2021-09-12 Most recent sequence: 2021-09-29 Countries circulating: Brazil Clade-defining SNP: ORF1a:G1531S

Proposed lineage name: AY.125 Earliest sequence: 2021-09-12 Most recent sequence: 2021-10-06 Countries circulating: Brazil Clade-defining SNP: G27561C

Genomes In this TSV file there are lists of genomes from each of the proposed sub-lineages above.

Evidence Drop the file mentioned above on this nextstrain build to see the proposed lineages (under Color by = 'new_lineages'). Switch the Color by to 'Lineage' to see how those clades are currently classified.

corneliusroemer commented 3 years ago

@andersonbrito Do you know whether these lineages are already contained in any of the 60 new lineages that have not yet been released within pangoLEARN? I haven't figured out a good way to check, have you?

It's really tricky right now to deal with new lineages if we don't know whether they or closely related ones have already been designated by @chrisruis.

We need a faster way to get those designations out on trees, the 2+ weeks it takes right now are too long.
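One rough way to run this kind of check is to look up each genome from a proposal in the designation list and tally which designated lineage it already falls under. The sketch below assumes the designation list has `taxon` and `lineage` columns and that the proposal TSV has `strain` and `new_lineage` columns; the TSV column names and the strain names in the toy data are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict


def designated_overlap(proposed_rows, designated_rows):
    """Tally, per proposed lineage, the designated lineage of each of its genomes."""
    # Map every designated genome name to its lineage.
    designated = {row["taxon"]: row["lineage"] for row in designated_rows}
    tally = defaultdict(Counter)
    for row in proposed_rows:
        # Genomes absent from the designation list are counted as "undesignated".
        current = designated.get(row["strain"], "undesignated")
        tally[row["new_lineage"]][current] += 1
    return {name: dict(counts) for name, counts in tally.items()}


# Toy data: hypothetical strain names and column layout, for illustration only.
proposed_tsv = (
    "strain\tnew_lineage\n"
    "Brazil/A/2021\tAY.113\n"
    "Brazil/B/2021\tAY.113\n"
    "Brazil/C/2021\tAY.112\n"
)
designated_csv = (
    "taxon,lineage\n"
    "Brazil/A/2021,AY.99.2\n"
    "Brazil/B/2021,AY.99.2\n"
)

proposed = list(csv.DictReader(io.StringIO(proposed_tsv), delimiter="\t"))
designated = list(csv.DictReader(io.StringIO(designated_csv)))
result = designated_overlap(proposed, designated)
print(result)
```

A proposal whose genomes nearly all map to one designated lineage is probably already covered by it. This only works for genomes that appear by name in the designation list, so it says nothing about lineages that are designated but represented by other sequences.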

andersonbrito commented 3 years ago

Hi @corneliusroemer ,

The genomes used in the nextstrain build below had their lineages assigned using the latest designations and the latest version of pangoLEARN.

https://nextstrain.org/community/andersonbrito/ncovbr/vigilancia

The current pangoLEARN version dates from 2021-10-18, but it was released on 2021-10-29. In this case, I'm not sure if the list of lineages in pangoLEARN matches the latest designations.

@chrisruis, do the latest versions available in the links above include all the recently designated lineages? Based on what I see, that is likely not the case, as the decision tree model in use in 'pangoLEARN data release 2021-10-18' is linked to 'pango-designation v1.2.88' (not v1.2.91, which is the one I currently see when I run pangolin -dv).

corneliusroemer commented 3 years ago

It includes lineages designated by 2021-10-18 so there's still loads missing. That's exactly the problem. We need some sort of continuous integration/builds that don't take 2 weeks. We should be able to see new designated lineages within a day.

FedeGueli commented 3 years ago

Sorry, but I missed AY.103 to AY.111 too. Hopefully we will soon have a much clearer picture.

chrisruis commented 3 years ago

From what I can see, the proposed AY.113_new through AY.125_new are in AY.99.2. AY.99.2 is a large Brazil lineage which has been added to the designation list but I don't think is in pangoLEARN yet. I don't think proposed AY.112_new has been designated yet

andersonbrito commented 3 years ago

Thank you for letting us know, @chrisruis. It's good to know that you can identify these instances of re-submission of the same lineages. If most of them are already designated, and you could spot new ones among those I pointed out, we are all set.

FedeGueli commented 3 years ago

@chrisruis will AY.99.2 start with ORF1a:T4087I or with T27645C? In either case, the proposed AY.112 would be a sub-lineage of AY.99.2 too.

chrisruis commented 3 years ago

I've still got some of the Delta tree to get through but once I've done that, I'll run through all of the open issues and match them up with the new AYs and then add any in that haven't already been designated.

We've just posted a summary of the AYs on pango.network that includes their geographical locations and associated mutations and will update this as new AYs get added in: https://www.pango.network/summary-of-designated-ay-lineages/ AY.99.2 is currently down to start on a branch with C4927T (synonymous) and C12525T (Orf1ab:T4087I)
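The pairing of nucleotide changes with amino-acid positions quoted in this thread can be checked with simple arithmetic. The sketch below assumes ORF1ab starts at position 266 of the Wuhan-Hu-1 reference (NC_045512.2) and only applies upstream of the ribosomal frameshift near position 13468; the function name is made up for illustration.

```python
ORF1AB_START = 266  # 1-based first nucleotide of ORF1ab in NC_045512.2 (assumed reference)


def orf1ab_codon(nt_pos):
    """Return (codon number, position within codon 1..3) for a 1-based genome position.

    Only meaningful for positions in ORF1ab upstream of the ribosomal frameshift.
    """
    offset = nt_pos - ORF1AB_START
    return offset // 3 + 1, offset % 3 + 1


print(orf1ab_codon(12525))  # C12525T, reported as Orf1ab:T4087I
print(orf1ab_codon(4927))   # C4927T, reported as synonymous
print(orf1ab_codon(7851))   # C7851T, reported as Orf1ab:A2529V
```

C12525T and C7851T fall in codons 4087 and 2529, matching the amino-acid positions given in the thread, while C4927T sits at the third position of codon 1554, which is consistent with a synonymous change.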

FedeGueli commented 3 years ago

Thank you very much for your work, @chrisruis; it will be of great help in better understanding the dynamics. I will check the link later and, if I find some time, I'll try to help with matching the new AYs to the open issues.

FedeGueli commented 3 years ago

@corneliusroemer @chrisruis I manually checked the mutation list (published here: https://www.pango.network/summary-of-designated-ay-lineages/) against the open issues. I found matches for AY.58 and AY.63 (probably referring to the same Norwegian lineage), AY.66, AY.69, AY.91, AY.99, AY.99.1, AY.99.2, AY.95, AY.104 (the Maldives/Sri Lanka AY.32 case) and AY.106.

This one is already designated as AY.4:

AY.89 UK C7851T (Orf1ab:A2529V)

CC @andersonbrito: it is probably better to reformulate each of your proposals as sub-lineages of AY.99/AY.99.1/AY.99.2 (if any).

andersonbrito commented 3 years ago

Although unlikely, could any of those clade-defining mutations be recurring substitutions? I was informed by the pango team that S:142G is frequently observed as a 'recurring' mutation, but that is an artefact of assembly.

andersonbrito commented 3 years ago

Just a correction (already fixed above): the presence of S:142G (not 142D) is a likely artefact in Delta. Based on this preprint:

"G142D is fixed in Delta, with essentially all apparent back mutations being artifacts"

If I understood this genome assembly issue correctly, the AY.122 sub-lineage proposed above, which contains S:142D, is a true clade defining mutation. If that was not yet designated, it should be worth it taking a look.

AngieHinrichs commented 3 years ago

This one is already designated as AY.4

AY.89 UK C7851T (Orf1ab:A2529V)

C7851T is the final mutation on the path to AY.4, but it's not the only mutation shared by AY.4 sequences. The table in https://www.pango.network/summary-of-designated-ay-lineages/ lists only the final mutation at the start of each lineage (sometimes more than one mutation when it's not clear in which order they were acquired).

AY.89 is very similar to AY.4, but includes 3 extra mutations: C21302T, C21304A and G21305A. Those sites are marked 'caution' in the Problematic Sites repo. Aside from those 3 mutations, the paths in the UCSC/UShER tree to AY.4 and AY.89 are remarkably similar. Here are the last several mutations on each in the UCSC/UShER tree:

AY.4    A11332G > A11201G > C6402T > C21846T > C7851T

AY.89   A11332G > A11201G > C6402T > C21304A > G21305A > C21302T > C21846T > C7851T

If C21302T, C21304A and G21305A are errors/artefacts, then probably AY.89 sequences are really AY.4 since all of their other mutations are the same (aside from mutations acquired after C7851T within each set of sequences). But I don't know how to determine whether C21302T, C21304A and G21305A are errors in the AY.89 sequences.
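The comparison of the two paths can be reproduced directly from the mutation strings quoted above. This is a minimal sketch using only those two paths; `parse_path` is a helper name introduced here for illustration.

```python
def parse_path(path):
    """Split a 'A > B > C' mutation path into an ordered list of mutations."""
    return [mutation.strip() for mutation in path.split(">")]


ay4 = parse_path("A11332G > A11201G > C6402T > C21846T > C7851T")
ay89 = parse_path(
    "A11332G > A11201G > C6402T > C21304A > G21305A > C21302T > C21846T > C7851T"
)

# Mutations on the AY.89 path that are absent from the AY.4 path, and vice versa.
extra_in_ay89 = [m for m in ay89 if m not in ay4]
missing_from_ay89 = [m for m in ay4 if m not in ay89]

# With the extra mutations removed, is the AY.89 path identical to the AY.4 path?
same_without_extras = [m for m in ay89 if m not in extra_in_ay89] == ay4

print(extra_in_ay89)
print(missing_from_ay89)
print(same_without_extras)
```

The only differences are the three mutations at 21302, 21304 and 21305, so if those are artefacts the two paths collapse into one. The code cannot say whether they are artefacts; it only confirms that nothing else separates the listed paths.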

FedeGueli commented 3 years ago

Thank you @AngieHinrichs. In a few weeks we will see how it looks: if Y145H+A222V (or other mutations from the AY.4 sub-lineages) pop up in AY.89, it would very likely be an artefact.

theosanderson commented 3 years ago

Just a correction (already fixed above): the presence of S:142G (not 142D) is a likely artefact in Delta. Based on this preprint:

"G142D is fixed in Delta, with essentially all apparent back mutations being artifacts"

If I understood this genome assembly issue correctly, the AY.122 sub-lineage proposed above, which contains S:142D, is a true clade defining mutation. If that was not yet designated, it should be worth it taking a look.

Yes, it is very unlikely that the 142D mutation occurs at the base of this clade, rather than the base of Delta as a whole. (Of course it's still possible there could be a real clade, defined by some upstream mutations.)

theosanderson commented 3 years ago

@corneliusroemer I got something together that displays designations on the UCSC public tree. It loses a lot of the designations unfortunately -- partly because of the limitations of public data and possibly also some issue with EpiToPublic mapping -- but may sometimes be useful. https://pangotax.theo.io/ The menu also includes pango_lineage_usher, which is the monophyletic Usher-tree-based lineage, rather than a version from pangolin --usher (these results, and the tree, are all from @AngieHinrichs to be clear - thanks Angie!). It is easiest to use the Search feature rather than just the colours.

@chrisruis looking at this I happened to spot that I think England/MILK-17B3688/2021 is possibly misdesignated as AY.4 rather than AY.4.2 (I spot checked that it has 145H and 222V).

It should be updated several times a day, but is of course quite limited by the caveats above.

AngieHinrichs commented 3 years ago

https://pangotax.vercel.app/

That's fantastic Theo, thanks! I've been wanting to add UShER-assigned lineage as a coloring option to our protobufs but haven't got around to it. Waiting for you to do it works for so many things! :) Let me know if there's anything I can do to ease the mapping issues. By EpiToPublic do you mean https://github.com/CDCgov/SARS-CoV-2_Sequencing/blob/master/files/epiToPublic.tsv.gz ?

theosanderson commented 3 years ago

I do mean that file but I didn't mean to suggest there was specifically something wrong with it (although I did seem to, sorry about that!) -- just a more general point that I haven't ruled out that I'm in some way failing to match up all of the public data with my lineages.csv taxon name -> EPI_id -> GenBank ID -> public_metadata strain name chain.
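To see where names get lost in a chain like this, one option is to pass each name through the mappings in order and record the step at which a lookup fails. The sketch below is a generic illustration, not the pangotax code; all identifiers in the toy data are hypothetical.

```python
from collections import Counter


def follow_chain(names, *mappings):
    """Pass each name through the mappings in order.

    Returns (resolved, dropped): resolved maps a name to its final value,
    dropped maps a name to the index of the first mapping that lacked its key.
    """
    resolved, dropped = {}, {}
    for name in names:
        key = name
        for step, mapping in enumerate(mappings):
            if key not in mapping:
                dropped[name] = step
                break
            key = mapping[key]
        else:
            # Every lookup succeeded, so key now holds the final value.
            resolved[name] = key
    return resolved, dropped


# Toy data with hypothetical identifiers, standing in for the real lookup tables.
taxon_to_epi = {"England/X1/2021": "EPI_1", "Brazil/Y2/2021": "EPI_2"}
epi_to_genbank = {"EPI_1": "GB0001"}
genbank_to_strain = {"GB0001": "England/X1/2021|GB0001"}

names = ["England/X1/2021", "Brazil/Y2/2021", "OldName/Z3/2021"]
resolved, dropped = follow_chain(names, taxon_to_epi, epi_to_genbank, genbank_to_strain)
losses_per_step = Counter(dropped.values())
print(resolved)
print(dropped)
print(losses_per_step)
```

Counting failures per step shows whether most of the loss comes from outdated taxon names, from genomes missing in the ID mapping, or from data that simply is not public.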

AngieHinrichs commented 3 years ago

Ah, some of the names in lineages.csv are outdated as @corneliusroemer pointed out in #257 which complicates the lineages.csv taxon name -> EPI_id part of that.

chrisruis commented 3 years ago

Thanks @theosanderson - pangotax looks super useful

chrisruis commented 3 years ago

It looks like the proposed AY.112_new is also in AY.99.2. So I think all of the proposed AY.112-AY.125 are in AY.99.2. I'd suggest that for now we designate a single AY.99.2 lineage, but we can split this up later if there are subclades that exhibit increased growth, etc.