cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

Proposal to broaden BA.5.1.15 to not include the artefacty mutation C10183T #1039

Closed: corneliusroemer closed this issue 1 year ago

corneliusroemer commented 1 year ago

Continuing my exploration of potentially artefactual lineages (started in #1029 and continued in #1038), I noticed that BA.5.1.15 was based on sequences from just two labs in two different countries (Brazil/São Paulo and Peru).

It's unlikely that a real lineage would be sequenced in two countries and yet not be picked up by other labs in Brazil, including labs that are also in São Paulo.
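The lab-diversity reasoning above can be sketched as a simple check: group sequences by assigned lineage and flag any lineage whose sequences come from fewer than a minimum number of distinct submitting labs. This is a minimal, hypothetical Python sketch, not an existing pango-designation or UShER tool; the `Sequence` record, the lab labels, and the `min_labs=3` threshold are illustrative assumptions.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class Sequence(NamedTuple):
    """Hypothetical minimal metadata record for one sequence."""
    lineage: str
    country: str
    lab: str


def flag_low_lab_diversity(
    seqs: Iterable[Sequence], min_labs: int = 3
) -> dict[str, set[str]]:
    """Return lineages sequenced by fewer than `min_labs` distinct labs.

    `min_labs` is an assumed threshold for illustration only,
    not an established designation rule.
    """
    labs_by_lineage: dict[str, set[str]] = defaultdict(set)
    for s in seqs:
        labs_by_lineage[s.lineage].add(s.lab)
    # Keep only lineages with too few independent labs behind them.
    return {
        lineage: labs
        for lineage, labs in labs_by_lineage.items()
        if len(labs) < min_labs
    }


if __name__ == "__main__":
    # Hypothetical example data; lab names are placeholders.
    example = [
        Sequence("BA.5.1.15", "Brazil", "lab_a"),
        Sequence("BA.5.1.15", "Peru", "lab_b"),
        Sequence("X.1", "Brazil", "lab_a"),
        Sequence("X.1", "Brazil", "lab_c"),
        Sequence("X.1", "Canada", "lab_d"),
    ]
    print(sorted(flag_low_lab_diversity(example)))  # ['BA.5.1.15']
```

A flagged lineage is only a candidate for closer inspection: in this thread the lab pattern was weighed together with mutation-level evidence before deciding what to change.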

Of the two lineage-defining mutations, C10183T and C10790T, only the latter (C10790T) seems real, and it is potentially what causes the former (C10183T) to appear as an artefact. That would explain why this combination pops up in three different locations: Brazil, Peru and also Canada, where C10183T has been seen before, which suggests that this mutation arises from artefacts.

Maybe @bwlang has some thoughts?

New criterion to identify sketchy lineages:

Action to take: the lineage can stay, but it should not require C10183T.

AngieHinrichs commented 1 year ago

Good spot @corneliusroemer. On the UShER tree, BA.5.1.15 is currently annotated on a branch BA.5.1 > C10183T > C10790T with 277 sequences, but there is a larger branch BA.5.1 > C10790T with 721 sequences, many of them from Brazil/*-FIOCRUZ and Brazil/*-NVBS. I guess that's what you're referring to by "not in other labs in Brazil that are also in São Paulo".

corneliusroemer commented 1 year ago

Exactly, IB is Butantan and the ones you mentioned are other labs whose presence is reassuring.

Their absence from the branch with C10183T is indicative of an artefact.

At least that's what I've come to believe. It's all very hand-wavy, though, so I very much appreciate your review!

(Sent by email in reply to Angie Hinrichs's comment of Mon, Sep 5, 2022, 23:32.)

AngieHinrichs commented 1 year ago

All designated sequences are still from the branch with C10183T (BA.5.1 > C10183T > C10790T). I have a label BA.5.1.15_no10183 in the UShER tree on BA.5.1 > C10790T (so that branch has been included in lineageTree.pb). I will add some sequences from that branch to lineages.csv.