cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

Large "BA.4.1" messy branch in usher #1761

Closed aviczhl2 closed 1 year ago

aviczhl2 commented 1 year ago

It seems that there's a large "BA.4.1" branch on usher that actually contains almost every Omicron lineage, especially some of the most recent XBB* lineages from undersampled areas like India.

This needs a fix: many sequences on this large "BA.4.1" branch also have incorrect Pango assignments, so there is no way to find their correct lineage.

I'm proposing this because there's a visible upward trend of sequences from India on this "BA.4.1" branch. The branch contains ~900 sequences, less than 0.1% of the total Omicron sequences from BA.4 onward.

Among samples from India collected from 2023-01-01 to 2023-01-31, 3/334 (0.9%) belong to this branch. Among sequences collected from 2023-02-01 to now, however, 7/218 (3.2%) are on this branch.
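As a rough check on the numbers above, the two proportions can be recomputed and compared with a one-sided Fisher exact test. This is only a sketch: the choice of test and the function name `fisher_one_sided` are assumptions made for illustration and were not part of the original report; the counts (3/334 and 7/218) are the ones quoted above.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the chance of
    seeing at least `a` on-branch sequences in the first row if the
    on-branch sequences were spread at random over both rows.
    """
    row1 = a + b              # sequences in the later period
    col1 = a + c              # on-branch sequences in both periods
    total = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, row1)

# Counts quoted in the comment above (India only).
jan_on, jan_total = 3, 334    # collected 2023-01-01 to 2023-01-31
feb_on, feb_total = 7, 218    # collected 2023-02-01 onward

jan_pct = 100 * jan_on / jan_total
feb_pct = 100 * feb_on / feb_total
p_value = fisher_one_sided(feb_on, feb_total - feb_on,
                           jan_on, jan_total - jan_on)

print(f"January:  {jan_pct:.1f}% on branch")
print(f"February: {feb_pct:.1f}% on branch")
print(f"one-sided Fisher p-value: {p_value:.3f}")
```

With counts this small the test is only indicative; it mainly shows how much of the apparent rise could be sampling noise.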

@AngieHinrichs

usher

[Two screenshots attached, dated 2023-03-15]

FedeGueli commented 1 year ago

The Omicron mini-me trees are a known issue; in the past it was BA.5.1 that hosted one of these mini trees. I don't know how much time it is worth spending to resolve this. It is like the trash bin of sequences, even if it is true that, rarely, a real recombinant is hidden there.

aviczhl2 commented 1 year ago

7 out of 218 sequences (3.2%) from India sampled from 2023-02-01 to now belong to this single branch. Given that this branch has ~900 total members out of more than a million Omicron sequences after BA.4, this 3.2% proportion is extraordinarily high.

I'm simply wondering where their real positions are...

Meanwhile, only 3 out of 334 (0.9%) sequences from India collected in January 2023 belong to this branch.
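To make "extraordinarily high" concrete, the India proportions can be divided by the branch's overall share of sequences. This is a sketch under one stated assumption: "million+" is taken as exactly 1,000,000, so the baseline is an upper bound and the fold enrichment a lower bound. The variable names are illustrative.

```python
# Figures quoted in the thread; TOTAL_OMICRON is an assumed lower bound
# for "million+", which makes the baseline share an upper bound.
BRANCH_SIZE = 900
TOTAL_OMICRON = 1_000_000

baseline_share = BRANCH_SIZE / TOTAL_OMICRON   # at most 0.09%

india_jan_share = 3 / 334                      # January 2023
india_feb_share = 7 / 218                      # 2023-02-01 onward

fold_jan = india_jan_share / baseline_share
fold_feb = india_feb_share / baseline_share

print(f"baseline share: {100 * baseline_share:.2f}%")
print(f"January enrichment:  at least {fold_jan:.0f}x")
print(f"February enrichment: at least {fold_feb:.0f}x")
```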

AngieHinrichs commented 1 year ago

Yes, that is an ugly one, thanks @aviczhl2 for pointing that out. It has become an attractor for common Omicron amplicon-dropout false reversions. I will remove some or all of those sequences so that it doesn't continue to attract sequences with matching combinations of false reversions.

In fact, many of those sequences in gray further out on the branch should have been excluded from the daily build due to too many reversions found by Nextclade, but my Nextclade-parsing script needed a tweak to recognize 23A as Omicron (happy new year!). So I will be removing ~15k mostly XBB.1.5-ish sequences with >5 reversions that I assume are distributed among various dropout "trash bins" across Omicron.
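The kind of filter described above might look roughly like the following. This is a hypothetical sketch, not the actual build script: it assumes Nextclade TSV output with the columns `seqName`, `clade`, and `privateNucMutations.reversionSubstitutions`, and an explicit list of Omicron clade names that has to be extended whenever a new clade such as 23A appears. The sample rows and mutation strings are made up for illustration.

```python
import csv
import io

# Assumed list of Nextstrain clade names that are Omicron. A list like
# this goes stale when a new clade (e.g. 23A) is introduced.
OMICRON_CLADES = {"21K", "21L", "21M", "22A", "22B", "22C",
                  "22D", "22E", "22F", "23A"}
MAX_REVERSIONS = 5  # threshold mentioned in the comment above

REVERSION_COLUMN = "privateNucMutations.reversionSubstitutions"

def is_omicron(clade):
    # Accept both "23A" and longer labels such as "23A (Omicron)".
    return clade.split(" ")[0] in OMICRON_CLADES

def count_reversions(field):
    # The column is a comma-separated list; an empty field means zero.
    return len([m for m in field.split(",") if m.strip()])

def sequences_to_exclude(tsv_text):
    """Return names of Omicron sequences with more than MAX_REVERSIONS."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    excluded = []
    for row in reader:
        n_rev = count_reversions(row.get(REVERSION_COLUMN) or "")
        if is_omicron(row["clade"]) and n_rev > MAX_REVERSIONS:
            excluded.append(row["seqName"])
    return excluded

# Made-up example rows (mutation strings are placeholders).
six_reversions = "T1A,T2A,T3A,T4A,T5A,T6A"
sample_rows = [
    ("seqName", "clade", REVERSION_COLUMN),
    ("seqA", "23A", six_reversions),   # Omicron, 6 reversions: excluded
    ("seqB", "23A", "T1A"),            # Omicron, 1 reversion: kept
    ("seqC", "21J", six_reversions),   # not Omicron: kept by this filter
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in sample_rows)

print(sequences_to_exclude(SAMPLE_TSV))
```

If "23A" were missing from `OMICRON_CLADES`, `seqA` would slip through, which is the failure mode described in the comment.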

aviczhl2 commented 1 year ago

It seems that this bug is fixed. Thank you, @AngieHinrichs!