cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

Proposal to broaden BQ.1.28 to parental lineage without ins22303TCAAGATGGATG, C25096T and then designate its sublineage with these two mutations - Nextclade already assigned sequences without these two mutations as BQ.1.28 #1558

Closed AnonymousUserUse closed 2 weeks ago

AnonymousUserUse commented 1 year ago

BQ.1.28 was proposed in https://github.com/cov-lineages/pango-designation/issues/1415 and was designated as BQ.1 + C44T, A1777G, G6734A, C20429T, C25096T, G25595T, T29492C, ins22303TCAAGATGGATG. The defining mutations are C25096T and ins22303TCAAGATGGATG. Since Nextclade masks insertions, seven substitutions in total (C44T, A1777G, G6734A, C20429T, C25096T, G25595T, T29492C) are used to distinguish BQ.1.28 from BQ.1, and so any four of them are sufficient for Nextclade to assign a BQ.1 sequence as BQ.1.28. As a result, a large number of sequences without the defining mutations C25096T and ins22303TCAAGATGGATG are assigned as BQ.1.28 by Nextclade. In fact, only 317 of the 3839 sequences assigned as BQ.1.28 by Nextclade carry the defining mutation C25096T.

BQ.1.28 as assigned by Nextclade: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?nextcladePangoLineage=BQ.1.28&

Real BQ.1.28 with C25096T: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?nucMutations=C25096T&nextcladePangoLineage=BQ.1.28&
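The "four are sufficient" argument can be illustrated with a toy closest-match model. This is only a sketch: it assumes a sequence is assigned to whichever of BQ.1 or BQ.1.28 it differs from at fewer positions, counting just the substitutions listed above (the insertion is ignored, as it is masked). It is not Nextclade's actual placement algorithm, and the function name is hypothetical.

```python
# Toy model only: NOT Nextclade's real placement algorithm.
# Substitutions that separate BQ.1.28 from BQ.1 once the insertion is masked
# (taken from the designation quoted above).
BQ_1_28_EXTRA = frozenset(
    {"C44T", "A1777G", "G6734A", "C20429T", "C25096T", "G25595T", "T29492C"}
)


def closest_lineage(mutations_vs_bq1):
    """Pick BQ.1 or BQ.1.28, whichever differs from the sequence at fewer sites.

    `mutations_vs_bq1` holds the sequence's substitutions relative to BQ.1.
    """
    seq = frozenset(mutations_vs_bq1)
    dist_to_parent = len(seq)                  # BQ.1 carries none of the extras
    dist_to_child = len(seq ^ BQ_1_28_EXTRA)   # symmetric difference
    return "BQ.1.28" if dist_to_child < dist_to_parent else "BQ.1"


# A sequence from the parental branch: everything except the defining C25096T.
parental = BQ_1_28_EXTRA - {"C25096T"}
print(closest_lineage(parental))                                   # BQ.1.28
print(closest_lineage({"C44T", "A1777G", "G6734A", "C20429T"}))    # BQ.1.28
print(closest_lineage({"C44T", "A1777G", "G6734A"}))               # BQ.1
```

Under this simplified model, a sequence lacking C25096T is still pulled into BQ.1.28 as soon as it carries four of the other substitutions.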

Given the extremely low positive predictive value of Nextclade's BQ.1.28 assignments, I think it would be better to broaden BQ.1.28 to the parental lineage without ins22303TCAAGATGGATG and C25096T, i.e. BQ.1 + C44T, A1777G, G6734A, C20429T, G25595T, T29492C. It might be reasonable to designate this parental lineage, with more than 3000 sequences, as the new BQ.1.28 to solve the problem despite its low epidemiological significance. Even so, it did reach 10% prevalence in California, a US state with a population of around 40 million.

Then BQ.1 + C44T, A1777G, G6734A, C20429T, G25595T, T29492C + ins22303TCAAGATGGATG, C25096T can be designated as a sublineage of BQ.1.28, i.e. BQ.1.28.1. Alternatively, since this sublineage no longer shows any growth advantage, it could be left undesignated for now. (Another option is to withdraw BQ.1.28 outright for lack of epidemiological events. A problematic designation would be worse than no designation.)

Similar problems were reported in https://github.com/nextstrain/nextclade/issues/966 and https://github.com/nextstrain/nextclade/issues/1045. It would actually be better if Nextclade could require a given defining mutation before assigning a lineage (so that no adjustment on the designation side would be needed to solve such problems), but in the short term, broadening BQ.1.28 would make sense.
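The "defining mutation as a must" idea could look roughly like the post-processing step below. This is a hypothetical sketch, not an existing Nextclade or Pangolin feature: the `REQUIRED` and `PARENT` tables and the `enforce_required` function are made up for illustration, and only the BQ.1.28 entry comes from this issue.

```python
# Hypothetical post-filter; neither Nextclade nor Pangolin is known to expose this.
# Lineage -> substitutions that must be present for the call to stand.
REQUIRED = {"BQ.1.28": {"C25096T"}}
# Lineage -> lineage to fall back to when a required mutation is missing.
PARENT = {"BQ.1.28": "BQ.1"}


def enforce_required(lineage, mutations):
    """Keep `lineage` only if all of its required mutations are present."""
    required = REQUIRED.get(lineage, set())
    if required <= set(mutations):
        return lineage
    # Fall back to the parent; keep the original call if no parent is listed.
    return PARENT.get(lineage, lineage)


print(enforce_required("BQ.1.28", {"C44T", "A1777G", "G6734A", "C20429T"}))  # BQ.1
print(enforce_required("BQ.1.28", {"C44T", "A1777G", "C25096T"}))            # BQ.1.28
```

Lineages with no entry in `REQUIRED` pass through unchanged, so such a filter would only affect the calls it has been explicitly configured for.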

FedeGueli commented 1 year ago

Btw over 320 seqs now of BQ.1.28 (the real one)

AnonymousUserUse commented 1 year ago

Pangolin also misassigns a large number of BQ.1 sequences to BQ.1.28. Only 752 of the 5326 sequences assigned by Pangolin as BQ.1.28 contain the mutation C25096T, as of 2023/2/20 via CoV-Spectrum.
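For comparison, the positive predictive value of each tool's BQ.1.28 call can be worked out from the counts quoted in this thread. The sketch assumes, as the thread does, that carrying C25096T is a reasonable proxy for being "real" BQ.1.28; the two counts were taken at different times, so the comparison is only approximate.

```python
def positive_predictive_value(true_positives, total_assigned):
    """Fraction of sequences assigned to a lineage that really belong to it."""
    return true_positives / total_assigned


# Counts quoted in this issue (sequences with C25096T / sequences assigned BQ.1.28).
nextclade_ppv = positive_predictive_value(317, 3839)
pangolin_ppv = positive_predictive_value(752, 5326)

print(f"Nextclade PPV: {nextclade_ppv:.1%}")
print(f"Pangolin PPV:  {pangolin_ppv:.1%}")
```

Both values are well below 20%, i.e. most sequences labelled BQ.1.28 by either tool lack the defining substitution.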

AngieHinrichs commented 1 year ago

The trouble is that BQ.1.28 would become the parent of BQ.1.29 - awkward. This sounds more like a nextclade issue than a pango-designation issue to me. @corneliusroemer is there a way to make nextclade a bit more selective about what it will call BQ.1.28? (Close-but-not-quite situations like this are why I started adding extra non-Pango-lineage labels in the big tree and then converting them to vanilla Pango lineages in the minimized tree for pangolin. Perhaps you could add a node to represent the branch up to ORF1a:V2157I (G6734A) before BQ.1.28 and BQ.1.29, but let it be assigned BQ.1?)

AnonymousUserUse commented 1 year ago

The trouble is that BQ.1.28 would become the parent of BQ.1.29 - awkward.

I think that with the designation of BQ.1.29, Nextclade and Pangolin/pangoLEARN will no longer assign sequences that belong to the parental lineage of both BQ.1.28 and BQ.1.29 to BQ.1.28, and will assign them to BQ.1 instead. This kind of solution has worked in the past, e.g. the wrong assignment of BF.15 was resolved after BF.20 had been designated, see https://github.com/nextstrain/nextclade/issues/966. So I may close this issue once the correct assignment is confirmed.

Nevertheless, it would be great if Nextclade could add a node for the assignment of certain branches in the future.

DailyCovidCases commented 3 months ago

Please close this issue

DailyCovidCases commented 3 weeks ago

Please close this issue