cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

BQ.1 + A17039G sublineage circulating in China with Orf1a:A2033V and Orf1a:E754K - 153 seqs 26-02-23 #1559

Closed FedeGueli closed 1 year ago

FedeGueli commented 1 year ago

There may be a lineage circulating in China: it has been sampled from travellers arriving from China in both Italy and France.

Tree (screenshot, 2023-01-17 at 09:26:01): https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_251da_65aec0.json?c=gt-ORF1ab_2033,754&label=id:node_6395123

Defining mutations: G22599C > A17039G > C22599G (S:R346R, reversion) > C28603T > T27546C > Orf1a:A2033V (C6363T) > Orf1a:E754K (G2525A)

GISAID query: NSP12_Y273H, NSP2_E574K, NSP3_A1215V finds 116 sequences
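The Orf1a names in the defining mutations and the NSP names in the GISAID query describe the same changes in two coordinate systems. A minimal sketch of the conversion follows, assuming the Wuhan-Hu-1 reference (NC_045512.2) with ORF1a starting at nucleotide 266 and the usual nsp1 to nsp3 boundaries; these coordinates and the function names are illustrative assumptions, not something stated in this issue.

```python
# Assumed reference coordinates (Wuhan-Hu-1, NC_045512.2); not taken from this thread.
ORF1A_START = 266  # 1-based position of the first nucleotide of ORF1a

# First ORF1a amino-acid position of each mature peptide (nsp1 to nsp3 only).
NSP_STARTS = [("NSP1", 1), ("NSP2", 181), ("NSP3", 819)]
NSP3_END = 2763  # last ORF1a amino-acid position covered by this sketch


def orf1a_codon(nt_pos):
    """Return the 1-based ORF1a codon containing a 1-based genome position."""
    if nt_pos < ORF1A_START:
        raise ValueError("position is upstream of ORF1a")
    return (nt_pos - ORF1A_START) // 3 + 1


def orf1a_to_nsp(aa_pos):
    """Convert an ORF1a amino-acid position to (nsp name, position within that nsp)."""
    if not 1 <= aa_pos <= NSP3_END:
        raise ValueError("position is outside nsp1 to nsp3")
    # Pick the last peptide whose start is at or before the requested position.
    name, start = max(
        (entry for entry in NSP_STARTS if entry[1] <= aa_pos),
        key=lambda entry: entry[1],
    )
    return name, aa_pos - start + 1


if __name__ == "__main__":
    for nt in (6363, 2525):
        codon = orf1a_codon(nt)
        print(nt, "-> Orf1a codon", codon, "->", orf1a_to_nsp(codon))
```

Under these assumptions, C6363T and G2525A land in Orf1a codons 2033 and 754, which correspond to NSP3 position 1215 and NSP2 position 574, in line with the query above.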

Sequences: EPI_ISL_15083815, EPI_ISL_15402876, EPI_ISL_15456767, EPI_ISL_15505840, EPI_ISL_15520750, EPI_ISL_15531965, EPI_ISL_15537954, EPI_ISL_15541285, EPI_ISL_15554531, EPI_ISL_15579955, EPI_ISL_15580157, EPI_ISL_15585055, EPI_ISL_15613295, EPI_ISL_15616106, EPI_ISL_15669273, EPI_ISL_15670870, EPI_ISL_15677274, EPI_ISL_15681812, EPI_ISL_15682663, EPI_ISL_15697113, EPI_ISL_15741228, EPI_ISL_15743188

EPI_ISL_16343208: sampled in Italy from a traveller from China (there are likely 4 of them).
EPI_ISL_16534815: sampled in France from a traveller from China.

Edited thanks to @AngieHinrichs's kind explanation in the comments below: it could be called BQ.1.

focosi-cyber commented 1 year ago

Is this BQ.1 or BQ.1.1? In other words, is 346R a reversion or wild-type? Given that it is from China, it comes with a lot of implications and speculation.

FedeGueli commented 1 year ago

2 more samples from Israel.

FedeGueli commented 1 year ago

> Is this BQ.1 or BQ.1.1? In other words, is 346R a reversion or wild-type? Given that it is from China, it comes with a lot of implications and speculation.

@AngieHinrichs could you take a look at this lineage, please? Is it a BQ.1.1 in which 346 reverted to R, or a BQ.1 that acquired A17039G?

AngieHinrichs commented 1 year ago

> Is this BQ.1 or BQ.1.1? In other words, is 346R a reversion or wild-type?

I believe it is BQ.1 and the apparent reversion at 22599 (S:346) is a tree-building error. It is tricky to reconcile the maximum parsimony approach with recurring mutations that confer a growth advantage like G22599C (S:R346T). Sorry that it makes it harder to identify the real parental lineage in cases like this.

It seems most likely that G22599C happened multiple times, for example on BQ.1 > A17039G (BQ.1.1) and also directly on BQ.1 (BQ.1.19). But it's equally parsimonious to suppose that G22599C happened only once, and then A17039G afterwards, and then to explain the existence of some sequences with A17039G but not G22599C by making a reversion on 22599. 🙁

I recently added the label BQ.1_17039 to the branch with the reversion on 22599 so that sequences on that branch would not be falsely assigned to BQ.1.1 in the minimized-for-pangolin tree. Whenever you see a label with a real lineage followed by '_' and something cryptic, its purpose is to prevent bad assignments in pangolin. In the minimized tree for pangolin, the '_' and anything that follows it is removed from the label -- so there are multiple labels for BQ.1 and some other lineages.

In the daily build, I need to maintain unique labels so that each label is carried forward from one day to the next, so we get the ugly labels like BQ.1_17039 when I disagree with the order of mutations in the tree but can't convince matOptimize to change them by removing a few stray sequences (there are just too many sequences with G22599C!).
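The label convention described here can be illustrated with a small sketch. This is not the code used in the actual build; the function name `minimized_label` and the second example label are made up, and the only rule taken from the comment is that the '_' and everything after it is dropped for pangolin.

```python
def minimized_label(daily_label):
    """Drop the first '_' and everything after it, as described for the pangolin tree."""
    return daily_label.split("_", 1)[0]


if __name__ == "__main__":
    # Unique labels in the daily build (the second one is a made-up example).
    daily_labels = ["BQ.1", "BQ.1_17039", "BQ.1.1"]
    for label in daily_labels:
        print(label, "->", minimized_label(label))
```

Both `BQ.1` and `BQ.1_17039` collapse to `BQ.1`, which is why the minimized tree ends up with several branches carrying the same lineage label.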

This is happening more and more frequently with the trend toward convergent evolution with advantageous Spike mutations recurring and spreading. We should look into enhancing matUtils (or matOptimize?) to recognize this pattern (mutations A > B > C > revB) and fix it (move A > B > C > revB to A > C, and move A > B > C to A > C > B).
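To make the proposed fix concrete, here is a rough sketch that works on plain lists of mutation strings rather than on a real mutation-annotated tree. It is not matUtils or matOptimize code: the function names are hypothetical, and a real implementation would have to edit tree nodes and move the samples under them.

```python
import re

MUTATION = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse(mutation):
    """Split a string such as 'G22599C' into (ref, position, alt)."""
    match = MUTATION.match(mutation)
    if match is None:
        raise ValueError("not a nucleotide substitution: " + mutation)
    return match.group(1), int(match.group(2)), match.group(3)


def cancel_reversions(path):
    """Turn B > C > revB into C by dropping B together with its exact reversion."""
    kept = []
    for mutation in path:
        ref, pos, alt = parse(mutation)
        # Index of the most recent kept mutation at the same position, if any.
        previous = next(
            (i for i in range(len(kept) - 1, -1, -1) if parse(kept[i])[1] == pos),
            None,
        )
        if previous is not None and parse(kept[previous]) == (alt, pos, ref):
            del kept[previous]  # the pair cancels out
        else:
            kept.append(mutation)
    return kept


def move_to_end(path, mutation):
    """Turn B > C into C > B by moving one mutation to the end of the path."""
    if mutation not in path:
        raise ValueError(mutation + " is not on this path")
    return [m for m in path if m != mutation] + [mutation]


if __name__ == "__main__":
    # Paths below BQ.1 (the 'A' in the notation above).
    reverted = ["G22599C", "A17039G", "C22599G"]
    sibling = ["G22599C", "A17039G"]
    print(cancel_reversions(reverted))      # branch without S:R346T
    print(move_to_end(sibling, "G22599C"))  # BQ.1.1-like branch
```

With the mutations from this issue, the reverted branch reduces to A17039G alone and the sibling becomes A17039G > G22599C, so both branches share A17039G as their first step.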

FedeGueli commented 1 year ago

Thank you @AngieHinrichs, I've now corrected the title and the main post!

aviczhl2 commented 1 year ago

Most of the sequences come from local transmission in Europe, starting in October. The others are local sequences from the USA or Mexico, starting in late November.

None of them are from travellers from China to Japan, South Korea or Singapore, and none are from Chinese provincial CDCs.

My guess is that this lineage is more likely linked to Europe-related flights or European airports.

FedeGueli commented 1 year ago

> Most of the sequences come from local transmission in Europe, starting in October. The others are local sequences from the USA or Mexico, starting in late November.
>
> None of them are from travellers from China to Japan, South Korea or Singapore, and none are from Chinese provincial CDCs.
>
> My guess is that this lineage is more likely linked to Europe-related flights or European airports.

Yes, that was my thought too. Let's see if some new samples show up soon.

FedeGueli commented 1 year ago

It doesn't seem to be circulating in China as of today, so I'm closing this.