cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

2nd-Generation BA.2 Saltation Lineage, >30 spike mutations (3 seq, 2 countries, Aug 14) #2183

Closed ryhisner closed 1 year ago

ryhisner commented 1 year ago

Description
Sub-lineage of: BA.2
Earliest sequence: 2023-7-24, Denmark
Most recent sequence: 2023-7-31; Denmark & Israel
Countries circulating: Denmark (2), Israel
Number of Sequences: 3
GISAID AA Query: Spike_E484K, V445H
GISAID Nucleotide Query: T22032C, C22033A, A22034G
CovSpectrum Query: T22032C & C22033A & A22034G
Substitutions/Deletions/Insertions on top of BA.2:
Spike: ins16_MPLF (ins21608_TCATGCCGCTGT), R21T, S50L, ∆69-70, V127F, ∆Y144, F157S, R158G, ∆N211, L212I, L216F, H245N, A264D, I332V, D339H, K356T, R403K, V445H, G446S, N450D, L452W (2-nuc), N460K, N481K, ∆V483, A484K (2-nuc), F486P, E554K (Denmark seq only), A570V, P621S, I670V (Israel seq only), H681R, S939F, P1143L
N: Q229K
M: D3H, T30A, A104V
ORF1a: A211D, V1056L, N2526S, A2710T, V3593F, T4175I
Nucleotide: C897A, G3431T, A7842G, C8293T, G8393A, G11042T, A12160G, C12789T, T13339C, T15756A, A18492G, ins21608TCATGCCGCTGT, C21711T, G21941T, T22032C, C22208T, A22034G, C22295A, C22353A, A22556G, G22770A, G22895C, T22896A, G22898A, A22910G, C22916T, ∆23009-23011, G23012A, C23013A, T23018C, T23019C, C23271T, C23423T, A23604G, C24378T, C24990T, C25207T, A26529C, A26610G, C26681T, C26833T, C28958A

USHER Tree (for what it's worth) https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons/main/2nd-Gen_BA.2.json?c=gt-nuc_897&label=id:node_5341437

image

Evidence
One day after the first sequence in this lineage was uploaded from Israel (Sunday 13-Aug), two sequences were uploaded from Denmark (Monday 14-Aug), and one of them has a collection date a week earlier than the Israel sequence. This one has already gone international and is likely circulating in a country with little genetic surveillance. The only question at this point is whether this will be a situation like BS.1.1 or BA.2.83, where a hugely divergent, 2nd-generation lineage spreads but never has a large impact, or whether this will be closer to a BA.1-type situation.

Genomes

EPI_ISL_18096761, EPI_ISL_18097315, EPI_ISL_18097345

FedeGueli commented 1 year ago

Alternative non spike nuc query: A7842G, C8293T, G8393A

C897A, G3431T, A7842G, G8393A is another query by @HynnSpylor

shay671 commented 1 year ago

Folks, regarding the Israeli sample: it's a patient who had contact with 2 people living with her who were infected before her (a week or so earlier). None of the 3 has an immunocompromised background. They are not chronic patients.

FedeGueli commented 1 year ago

Folks , regarding the Israeli sample : Its a patient. had contact with 2 people living with which where infected before her (a week or so). All 3 has no immunocompromised background. They are nor chronic patients.

Any link with abroad?

xz-keg commented 1 year ago

alternative discussion https://github.com/sars-cov-2-variants/lineage-proposals/issues/606

xz-keg commented 1 year ago

Folks , regarding the Israeli sample : Its a patient. had contact with 2 people living with which where infected before her (a week or so). All 3 has no immunocompromised background. They are nor chronic patients.

Do they have any infection (or symptom) history? How many times (by their own estimate) were they infected with Omicron in 2022 and 2023?

FedeGueli commented 1 year ago

alternative discussion sars-cov-2-variants/lineage-proposals#606

I'll keep that one open for discussion, ideas, hypotheses and thoughts, and this one for findings. cc @corneliusroemer, ok?

silcn commented 1 year ago

This is missing C9866T = ORF1a:L3201F, which was present in almost all BA.2 outside southern Africa due to a founder effect. Suggests a southern African origin for this variant, potentially even the Omicron source.

There are a few shared mutations with BA.1 as I think has been alluded to on Twitter. The ones in Spike probably arose independently even if this did come from the Omicron source, but G8393A = ORF1a:A2710T is fairly rare outside BA.1 and might be suggestive of recombination.

(note: given that it's unlikely a recombinative origin can ever be proved, and the evidence for this coming from the Omicron source is much weaker than for BA.4 and BA.5, I would suggest this gets the next available BA.2.x designation rather than BA.6)

FedeGueli commented 1 year ago

This is missing C9866T = ORF1a:L3201F, which was present in almost all BA.2 outside southern Africa due to a founder effect. Suggests a southern African origin for this variant, potentially even the Omicron source.

There are a few shared mutations with BA.1 as I think has been alluded to on Twitter. The ones in Spike probably arose independently even if this did come from the Omicron source, but G8393A = ORF1a:A2710T is fairly rare outside BA.1 and might be suggestive of recombination.

Yeah, the 9866C branch of BA.2 was common only in SA. @corneliusroemer proposed a bunch of its sublineages, and at least one got designated, as I recall. If I remember correctly, they were successfully exported only to Germany (and Germany was where XAK emerged, another complex BA.1/BA.2 recombinant, and also BA.5.9, the S:R346I branch of BA.5 stemming from the polytomy, so likely South African too).

I am going to check RKI seqs on Open CovSpectrum

FedeGueli commented 1 year ago

G8393A, G11042T, A12160G, C12789T, T13339C, T15756A, A18492G, ins21608TCATGCCGCTGT, C21711T, G21941T, T22032C, C22208T, A22034G, C22295A, C22353A, A22556G, G22770A, G22895C, T22896A, G22898A, A22910G, C22916T, ∆23009-23011, G23012A, C23013A, T23018C, T23019C, C23271T, C23423T, A23604G, C24378T, C24990T, C25207T, A26529C, A26610G, C26681T, C26833T, C28958A

A12160G is just a reversion of BA.4/5's G12160A, so I think it is not real but an artefact of misrooting by UShER.

silcn commented 1 year ago

This is missing C9866T = ORF1a:L3201F, which was present in almost all BA.2 outside southern Africa due to a founder effect. Suggests a southern African origin for this variant, potentially even the Omicron source.

BA.2-without-9866T had a branch with C26681T which reached 5-10% of BA.2-without-9866T in South Africa in early 2022. This could be a descendant of that branch, in which case it likely wouldn't be from the Omicron source.

silcn commented 1 year ago

Ah, this could be it, there's a small branch with S:939F within the C26681T branch.

939F

It's not a direct descendant of any of these sequences but I reckon the correct placement has it descending from the base of this branch.

xz-keg commented 1 year ago

It seems the UShER link from Ryan doesn't work. Here is the link: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_398d9_a47ef0.json?c=gt-nuc_25416&label=id:node_10733306

Currently UShER only keeps these files for 2 days. Can we maintain a permanent link to such UShER files? How?

For example, how do I create a fetch link if I store this .json on my personal website?

@AngieHinrichs

NkRMnZr commented 1 year ago

It seems the USher by Ryan doesnt work here the link: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_398d9_a47ef0.json?c=gt-nuc_25416&label=id:node_10733306

Currently usher only supports a maintainence time of 2 days. Can we maintain a permanant link of such usher files? How?

For example, how to create a fetch link if I store this .json on my personal website?

@AngieHinrichs

https://nextstrain.org/fetch/raw.githubusercontent.com/NkRMnZr/hSC2-Tracking-Log/main/JSON/EPI_ISL_18096761%20BA.2_Level_8_Saltation_Cluster.json?label=id:node_10733314

You need to download the JSON file from the foot of the UShER page, upload it to your own repo, and paste the raw file link after https://nextstrain.org/fetch/
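The last step can be sketched in a few lines of Python. This is only an illustration, not an official Nextstrain tool: the function name `nextstrain_fetch_url` is made up here, and it assumes the raw link is already URL-encoded (e.g. spaces as %20). It drops the leading https:// from the raw link, since the working fetch links in this thread start at the host name.

```python
from urllib.parse import urlsplit

NEXTSTRAIN_FETCH = "https://nextstrain.org/fetch/"


def nextstrain_fetch_url(raw_json_url: str) -> str:
    """Build a nextstrain.org/fetch link from the raw URL of a hosted Auspice JSON.

    Hypothetical helper: the scheme (https://) is removed so that the
    host name follows directly after /fetch/.
    """
    parts = urlsplit(raw_json_url)
    # With a scheme, host and path are split; without one, everything is in .path.
    link = NEXTSTRAIN_FETCH + parts.netloc + parts.path
    if parts.query:
        # Keep any view options such as ?label=... that were already attached.
        link += "?" + parts.query
    return link


if __name__ == "__main__":
    print(nextstrain_fetch_url(
        "https://raw.githubusercontent.com/ryhisner/jsons/main/2nd-Gen_BA.2.json"
    ))
```

Passing a link that already lacks the scheme gives the same result, so the helper is safe to apply twice.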

AKruschke commented 1 year ago

Any thoughts on G446S and F486P?

Might it be a recombinant of BA.2.75 + XBB.1.5 + BA.2?

victorlin commented 1 year ago

It seems the USher by Ryan doesnt work here the link: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_398d9_a47ef0.json?c=gt-nuc_25416&label=id:node_10733306

Currently usher only supports a maintainence time of 2 days. Can we maintain a permanant link of such usher files? How?

@aviczhl2 The link in the issue description has an extra https:// which should be dropped. This is the intended link: https://nextstrain.org/fetch/raw.githubusercontent.com/ryhisner/jsons/main/2nd-Gen_BA.2.json

silcn commented 1 year ago

Might it be a recombinante of BA.2.75 + XBB.1.5 + BA.2 ?

More likely just a lot of convergent evolution, many of the RBD mutations are things we've seen before, just not all together. The mutations at 481-484 are the really new part.

ryhisner commented 1 year ago

It seems the USher by Ryan doesnt work here the link: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_398d9_a47ef0.json?c=gt-nuc_25416&label=id:node_10733306

I was just in a hurry, so I screwed up the link. It should work now. I learned how to make permanent Usher tree links from @silcn, who explained it a couple years ago on here.

krosa1910 commented 1 year ago

Note: S:H245N also appears in BA.2.3.20* (all of them). Is it very common across saltation lineages?

oobb45729 commented 1 year ago

This is missing C9866T = ORF1a:L3201F, which was present in almost all BA.2 outside southern Africa due to a founder effect. Suggests a southern African origin for this variant, potentially even the Omicron source.

There are a few shared mutations with BA.1 as I think has been alluded to on Twitter. The ones in Spike probably arose independently even if this did come from the Omicron source, but G8393A = ORF1a:A2710T is fairly rare outside BA.1 and might be suggestive of recombination.

(note: given that it's unlikely a recombinative origin can ever be proved, and the evidence for this coming from the Omicron source is much weaker than for BA.4 and BA.5, I would suggest this gets the next available BA.2.x designation rather than BA.6)

Could it be a reversion? ORF1a:L3201 might be an important residue. ORF1a:L3201P is in both Iota and Lambda.

jasondorjeshort commented 1 year ago

Note: S:H245N also appears in BA.2.3.20* (all of them), is it very common in all saltation?

483- and 245N were in #1692 (a few dozen sequences from Ukraine in February). 245N has also been seen in at least BQ.1.1.48 as a point mutation. Actually most of the RBD and NTD mutations (or different ones at the same positions, ~35 in total compared to ~12 outside the S1) have been in some previous interesting variant - it's extremely improbable.

I've never commented on either of these github projects before (been following since the BA.1 issue), but I do believe this thread has been shared on social media and could (depending on resharing and spread over the upcoming days) get some attention from the public. Best to be ready for that.

oobb45729 commented 1 year ago

I want to highlight the mutation S:S50L here. As of today, a search for S:S50L gives fewer than 1000 results on covSPECTRUM, which is not much for a C-to-U mutation: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?variantQuery=S%3AS50L& However, a large percentage of them are chronic singlets! There are some mutations that seem to occur more often in chronic sequences than in non-chronic sequences, and S:S50L is one of the most extreme cases. Until now, we have very rarely seen clusters with S:S50L detected in multiple people.

I wondered whether there are mutations that tend to co-occur with S:S50L. I found a potential candidate: S:P621X.

I kind of expected P621S to show up, since P621S is also often associated with chronic sequences, but P621L/R/T are much rarer mutations, with fewer than 600 results combined on covSPECTRUM as of today. P621T has only 48 results, and 8 of them also carry S50L (an AY.103 sublineage, one of the rare cases in which an S50L lineage spread to multiple people).

I think there might be something special about the S50L+P621X combination.

oobb45729 commented 1 year ago

S:P1143L (and P1143S) are also often associated with chronic sequences.

corneliusroemer commented 1 year ago

Thanks for opening this issue @ryhisner, and for the productive discussion, everyone. In particular, thanks for getting some extra epidemiological info, @shay671!

It would be great if we could keep discussion here in this issue limited to new sequences and phylogenetics (putative parent lineages, discussion of recombinant nature or not) with exception for extra epidemiological info like Shay shared about patient history.

Discussion of the putative function of individual mutations is off topic here and better placed in https://github.com/sars-cov-2-variants/lineage-proposals/issues/606 or in other issues in that repo.

I'll try to moderate this issue a little as it may get significant attention/readership. If I hide comments as off-topic, this is just to keep the most salient information most easily locatable. For broader discussion please use https://github.com/sars-cov-2-variants/lineage-proposals/issues/606 or open another issue there - for example if you have questions that aren't directly related to this lineage (e.g. how to save Usher trees for longer than 2 days).

corneliusroemer commented 1 year ago

I'll add a bit more context as I investigate. Most of this has already been shared by many valuable contributions above. I will try to synthesize:

Here is the distribution of nucleotide mutations compared to BA.2:

image

This lineage shares 4 of the 5 mutations that BA.4 and BA.5 have in common: T22917G, T23018G, G23040A and deletion 21765-21770 in nucleotides, which equal L452R, F486V, R493Q and 69/70del as Spike amino acids. This lineage, however, lacks the synonymous nucleotide mutation G12160A. As the mutations shared with BA.4/5 are all known to be highly convergent, but the synonymous mutation G12160A is not shared, this pattern is not strong evidence pointing towards BA.4/5 being a more likely parent than BA.2. To all intents and purposes, this appears to be a BA.2 descendant (though recombination can never be ruled out in such diverged lineages).

Here's the tree placement by Nextclade. Note that this tree is BA.2/4/5-and-descendants-only and rooted on BA.2.

image

These are the mutations of one of the Danish sequences as broken down into reversions to references, labelled (known convergent or lineage typical) and unlabelled mutations:

image

The Israeli sequence has half a dozen nucleotide differences compared to the Danish sequences. Off the top of my head, one of these nucleotide differences (S408R) appears to be a typical sequencing artefact known to occasionally occur in Israeli sequences. At least one mutation in the Israeli sequence is in a region unsequenced in the Danish sequences (21622). So the actual tip-to-tip divergence is likely smaller than 8, but there may be real divergence, which could point towards extended undetected spread rather than explosive growth.

These are the mutations that the Danish sequences have that the Israeli sequence doesn't have: C12815T, G23222A

These are the mutations that the Israeli sequence has that the Danish sequences don't have: C2137T, G21624C, T29793A, C21622T, C22786A, A23570G, C28153T, G29000A, deletion 23009-23011, insertion 21608:TCATGCCGCTGT

jbloom commented 1 year ago

@corneliusroemer, why do you think spike:S408R is a sequencing error in the Israeli sequence? At the amino-acid level, spike:S408R is roughly neutral in deep mutational scanning of Omicron, so this seems like a reasonable mutation to be real. Or is it because it is a reversion towards Wuhan-Hu-1 and you think it comes from missing coverage being called as reference?

FedeGueli commented 1 year ago

@corneliusroemer, why do you think spike:S408R is a sequencing error in the Israeli sequence? At the amino-acid level, spike:S408R is roughly neutral in deep mutational scanning of Omicron, so this seems like a reasonable mutation to be real. Or is it because it is reversion towards Wuhan-Hu-1 and you think it comes from some sort of calling missing coverage to reference?

It is widespread in seqs from Israel

ryhisner commented 1 year ago

Yeah, about 2/3 of the sequences uploaded from Israel on the same day as this sequence—from all different lineages—didn't register S:R408S. It's common for this to happen in other labs around the world as well. Same issue with S:K417N. Most sequences from South America don't register it.

jbloom commented 1 year ago

Yeah, about 2/3 of the sequences uploaded from Israel on the same day as this sequence—from all different lineages—didn't register S:S408R. It's common for this to happen in other labs around the world as well. Same issue with S:K417N. Most sequences from South America don't register it.

Is there a typo in your post, @ryhisner? Do you mean don't register S:R408S? And so look like they have S:S408R relative to BA.2?

corneliusroemer commented 1 year ago

@jbloom There are a few typical reversions to reference that at least very often are artefacts based on the fact they occur in particular labs all the time all over the SC2 tree.

I haven't seen an investigation of potential recombinant parents in this issue so I'll add one here as a baseline:

Based on mutations with respect to BA.2 as annotated by Nextclade, there is no smoking gun for obvious recombinant parents. 23F (EG.5.1) shares two mutations in ORF1b, but that could just be coincidence: C12789T, A18492G, especially as other XBB mutations are missing in the region 12-19k.

To cast a broad net, I did a quick covSpectrum query with all the private mutations of this lineage as extracted from Nextclade, thresholding at various numbers. The results:

It's possible to miss some potential parent through these queries, but at least if such a parent existed, the event was either a long time ago or the parent contributed only a small part to this lineage.

More extensive searches are welcome. In particular, it might make sense to restrict searches to various subsets of the genome to be more specific - the current query matches current XBB lineages best due to the presence of 486P and other convergent Spike RBD mutations.

FedeGueli commented 1 year ago

23F (EG.5.1) shares two mutations in ORF1b, but that could just be coincidence: C12789T, A18492G, especially as other XBB mutations are missing in the region 12-19k.

Focusing on the Danish sequences, which have C12815T too: the combo C12815T, C12789T has been seen in 97 recent samples from various lineages: https://cov-spectrum.org/explore/World/AllSamples/Past6M/variants?nucMutations=C12815T%2CC12789T&nextcladePangoLineage1=gk.1& mostly XBB.1.9.1, and around half of them from Asia. But none of them also has A18492G.
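For anyone who wants to repeat this kind of combo search with other mutations, here is a minimal sketch of how such a link can be assembled. The path layout and the `nucMutations` parameter are simply copied from the link above; the function name is made up for illustration, and whether covSPECTRUM still accepts this URL scheme is an assumption that has not been checked.

```python
from urllib.parse import urlencode


def covspectrum_nuc_query_url(mutations, period="Past6M"):
    """Build a covSPECTRUM explore link for samples carrying all given nucleotide mutations.

    Hypothetical helper: URL layout and parameter name are taken from the
    link shared in this thread, not from any documented API.
    """
    base = f"https://cov-spectrum.org/explore/World/AllSamples/{period}/variants"
    # Mutations are joined with commas; urlencode escapes the commas as %2C.
    return base + "?" + urlencode({"nucMutations": ",".join(mutations)})


if __name__ == "__main__":
    print(covspectrum_nuc_query_url(["C12815T", "C12789T"]))
```

Adding A18492G to the list would give the three-mutation query that, per the comment above, returned no samples.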

ryhisner commented 1 year ago

Yeah, about 2/3 of the sequences uploaded from Israel on the same day as this sequence—from all different lineages—didn't register S:S408R. It's common for this to happen in other labs around the world as well. Same issue with S:K417N. Most sequences from South America don't register it.

Is there a typo in your post, @ryhisner? Do you mean don't register S:R408S? And so look like they have S:S408R relative to BA.2?

Yes, I was in a hurry this morning and misspoke. I'll edit to make clear.

silcn commented 1 year ago

Could it be a reversion? ORF1a:L3201 might be an important residue. ORF1a:L3201P is in both Iota and Lambda.

Potentially, yes it could, though I doubt we'll have much idea until we eventually get a better handle on where this variant arose. Assuming a reversion there doesn't seem to yield any clear potential ancestors either.

FedeGueli commented 1 year ago

One more thing: S:N460K here is gained via T22942A, as in BQ / CL.1 / XAW, and unlike all the BA.2.75/BA.2.3.20/XBB backbones.

shay671 commented 1 year ago

@corneliusroemer, why do you think spike:S408R is a sequencing error in the Israeli sequence? At the amino-acid level, spike:S408R is roughly neutral in deep mutational scanning of Omicron, so this seems like a reasonable mutation to be real. Or is it because it is reversion towards Wuhan-Hu-1 and you think it comes from some sort of calling missing coverage to reference?

It's a known problem that I see a lot in our samples. If anyone knows the source, I'd love to pass your insights on to the labs. BTW, we use ARTIC V4 in Israel and NextSeq/NovaSeq, if that helps.

theosanderson commented 1 year ago

S:S408R = nt:C22786A which is a reversion to reference. 22786 lies within primer 75_RIGHT of ARTIC V4 (and the primer will contain reference sequence). At first glance that would suggest that primer sequences for some reason aren't being trimmed correctly in this case.

Primer effects could also potentially explain the absence of S:E554K = nt:G23222A in the Israel sequence as that is within V4 78_LEFT.
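The nucleotide-to-amino-acid correspondences used here can be checked with simple arithmetic. The sketch below assumes the Spike coding sequence spans nucleotides 21563 to 25384 in Wuhan-Hu-1 reference coordinates; that range is not stated in this thread and is brought in as an assumption. The function name is made up for illustration. It does not look up primer positions.

```python
# Assumption: Spike CDS coordinates in the Wuhan-Hu-1 reference (1-based, inclusive).
SPIKE_CDS_START = 21563
SPIKE_CDS_END = 25384


def spike_codon(nt_pos: int) -> tuple[int, int]:
    """Map a reference nucleotide position to (Spike residue number, position within codon).

    Both returned values are 1-based. Positions are reference coordinates,
    so insertions or deletions in a sample do not shift them.
    """
    if not SPIKE_CDS_START <= nt_pos <= SPIKE_CDS_END:
        raise ValueError(f"{nt_pos} is outside the assumed Spike CDS")
    offset = nt_pos - SPIKE_CDS_START
    return offset // 3 + 1, offset % 3 + 1


if __name__ == "__main__":
    print(spike_codon(22786))  # C22786A, discussed above as S:S408R
    print(spike_codon(23222))  # G23222A, discussed above as S:E554K
```

Under that assumption, 22786 falls on the third base of codon 408 and 23222 on the first base of codon 554, in line with the two correspondences given above.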

xz-keg commented 1 year ago

the only question at this point is whether this will be a situation like BS.1.1 or BA.2.83, where a hugely divergent, 2nd-generation lineage spreads but never has a large impact or whether this will be closer to a BA.1-type situation.

Investigating the immune background of current patients may help. If the patients have an XBB* infection history and still suffer a ~100% infection rate, it will more likely be a BA.1-type situation. Otherwise it may be more like BS.1.1.

However it seems really hard to figure out the infection history as testing is lowered to almost zero.

agamedilab commented 1 year ago

the only question at this point is whether this will be a situation like BS.1.1 or BA.2.83, where a hugely divergent, 2nd-generation lineage spreads but never has a large impact or whether this will be closer to a BA.1-type situation.

Investigating the immune background of current patients may help. If the patients have XBB* infection history and still suffer a ~100% infection rate it willmore likely be like BA.1-type situation. Otherwise it may be like BS.1.1.

However it seems really hard to figure out the infection history as testing is lowered to almost zero.

If I read the metadata correctly, at least the 2 patients from Denmark had their last infection at the beginning of 2022, hence an XBB* infection history is not likely.

EPI_ISL_18097345 Additional host information: n_infections=2,last_infection_date=2022-01-10
EPI_ISL_18097315 Additional host information: n_infections=2,last_infection_date=2022-01-31

xz-keg commented 1 year ago

If I read the meta data correctly at least the 2 patients from Denmark had their last infection in the beginning of 2022 --> hence not likely a XBB* infection history.

EPI_ISL_18097345 Additional host information: n_infections=2,last_infection_date=2022-01-10 EPI_ISL_18097315 Additional host information: n_infections=2,last_infection_date=2022-01-31

Yeah, we know that they got infected during the BA.1/2 wave.

But this doesn't prove that they were never infected again since then, as Denmark, together with most countries, largely reduced testing after early 2022.

corneliusroemer commented 1 year ago

Based on available evidence to date, this lineage appears to be a direct descendant of BA.2. Shared mutations with BA.4/5 are plausibly due to convergence in Spike. No clear sign of recombination with a known circulating lineage has been uncovered despite thorough searches.

As a result, this lineage will be designated as BA.2.86.

I will add a designation commit with edits to lineages.csv and lineage_notes.txt later today.

kimleeng commented 1 year ago

For the Danish samples, a bit of epidemiological info from our institute's Twitter:

A new Omicron BA.2 subvariant has been observed. The variant was seen in one case in Israel, and two cases in Denmark. None of the Danish cases are immunocompromised, and no epidemiological link between the cases. There is no indication that the new variant causes severe illness.

https://twitter.com/ssi_dk/status/1691813752453177821?s=61&t=a2Eto6ibqsvisaYAhiSaWw

unrulyturnip commented 1 year ago

Also, WHO is now watching it: https://twitter.com/mvankerkhove/status/1692160025043595682?s=20

agamedilab commented 1 year ago

+1 USA: EPI_ISL_18110065

image

image https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_3eb69_e2d9b0.json?c=userOrOld

Sinickle commented 1 year ago

Worth noting that the one mutation the USA seq has beyond the Israel seq, as shown in the tree @agamedilab shared above, is S:E554K, which is in the Danish sequences. That is more evidence that the differences between these sequences are mostly artefactual.

Additionally, the USA sequence is indicated as baseline surveillance, and collected more recently (August 3rd), as one of only 9 sequences from Michigan collected that recently.

FedeGueli commented 1 year ago

Worth noting that the one mutation the USA seq has beyond the Israel seq shown in the tree @agamedilab shared above is S:E554K, which is in the Danish sequences, which is more evidence that the differences between these sequences is mostly artefactual.

Additionally, the USA sequence is indicated as baseline surveillance, and collected more recently (August 3rd), as one of only 9 sequences from Michigan collected that recently.

Agree completely with you. There are not two lineages but only one.

Over-There-Is commented 1 year ago

2023-08-17 (8) 2023-08-17 (7) Maybe the phylogenetic tree should be like this?

corneliusroemer commented 1 year ago

Nextclade's development version can now call this lineage: https://master.clades.nextstrain.org/

HynnSpylor commented 1 year ago

Now WHO labels it as VUM. A rapid assessment is needed

Screenshot 2023-08-18 000725

silcn commented 1 year ago

Agree completely with you. there are not two lineages but one only.

There are still a few real-looking differences between the two Danish sequences and the Israel/Michigan sequences. The fact that the Israel and Michigan sequences look to be identical suggests to me that we have two different very recent transmissions from the same chronic host, like BA.1 and BA.2 but much more similar to each other, and extremely rapid growth. Hope I'm wrong.

ryhisner commented 1 year ago

As noted above, the new sequence from Michigan has S:E554K like the sequences from Denmark, but it also has S:I670V, ORF8:T87I and N:G243S, all of which were in the sequence from Israel but absent from the two Denmark sequences.