cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

Delta (AY.4) and BA.1 recombinant in France/Denmark [~30 seqs, isolated/passaged in Vero] #444

Closed SVN-PhD closed 2 years ago

SVN-PhD commented 2 years ago

Potential Delta (AY.4) and Omicron recombinant

Description

Sub-lineage of: AY.4
Earliest sequence: 2022-01-17 (France | EPI_ISL_9959921 | EPI_ISL_9879436)
Most recent sequence: 2022-02-05 (Netherlands | EPI_ISL_9863764)
Countries circulating: Denmark / France / Netherlands
Proposed Lineage: If this is real, XD or whatever is next on the PANGO recombinant designations.

I don't have access to the raw sequencing reads for these strains, so I'm unsure if this is a real recombinant. However, these strains are highly related and are present in 3 countries in Europe. Additionally, these strains were sequenced in different laboratories with different sequencing platforms (Illumina: EPI_ISL_9879436, EPI_ISL_9879437, EPI_ISL_9857381, EPI_ISL_9449070, EPI_ISL_9166910) and (Nanopore MinION: EPI_ISL_9863764, EPI_ISL_9791275).

NextClade identifies these strains as 21J (Delta) and Pangolin v3.1.20 in UShER mode classifies these as AY.4.

NextClade Alignment of these sequences with some complete/high coverage genomes of Delta and a BA.1 (PHEC-5P0B7ZEF) and BA.2 (ALDP-35D5DD2): Screenshot from 2022-02-16 13-16-10

The breakpoint appears to be right before the S gene.

Zooming in on the S gene: Screenshot from 2022-02-16 13-16-41

Genomes

accessions_list.txt

Additional Mutations

Gene AA Changes
NSP2 E172D

This small cluster shares this mutation in addition to the characteristic Omicron spike mutations.

Phylogenetic Tree

Potential recombinant lineage in the context of other strains (embedded within AY.4) tree001

Zoomed in: tree002

Greater tree (strains highlighted by red box): tree003

c19850727 commented 2 years ago

I believe EPI_ISL_9857381 was collected on 2022/02/04 from Denmark? But otherwise, nice catch!

SVN-PhD commented 2 years ago

Ah you're correct, my mistake. I'll go ahead and edit it. Found an additional two isolates.

Strain Accession
France/IDF-HMN-22012240323 EPI_ISL_9518370
France/NOR-20445345/2022 EPI_ISL_9791300

Screenshot from 2022-02-17 09-33-27

Simon-LoriereLab commented 2 years ago

Dear Scott, very nice catch. The French NRC at Institut Pasteur confirms that the sequences EPI_ISL_9879436 and EPI_ISL_9879437 have overall high coverage, and no minor variant populations suggestive of a co-infection or contamination (both are Illumina data, not Nanopore). Those 2 genomes at least are indeed very likely recombinants. I'll see how to share the raw data!

SVN-PhD commented 2 years ago

Hi Simon-Loriere Lab,

Thank you, that's good to hear from the sequencing side. Also good eye, I've updated the post to indicate which strains were sequenced by which technologies, I definitely needed coffee that day... I think that seeing shared mutations across different platforms, in different labs, in different countries, along with the information from the French NRC, suggests they are likely recombinants!

corneliusroemer commented 2 years ago

Ran the sequences through Nextclade, here's the summary:

image image image

Breakpoints appear to be:
AY.4: beginning till between S:158 - S:211
BA.1: S:158-S:211 until 25000-25480 (beginning of ORF3a)
AY.4: 25000-25480 until end
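To make the mosaic structure above concrete, here is a minimal Python sketch that converts the spike codon bounds to nucleotide coordinates and labels a genome position by its putative parent. It assumes the spike ORF starts at nucleotide 21563 in NC_045512.2, and it treats the end of S:158 and position 25000 as the last positions confidently assigned to the left-hand parent; the function names are hypothetical and the intervals are only as precise as the estimate in this comment.

```python
S_GENE_START = 21563  # first nucleotide of the spike ORF in NC_045512.2


def spike_codon_to_nt(codon):
    """Return (start, end) nucleotide coordinates of a spike codon (1-based, inclusive)."""
    start = S_GENE_START + 3 * (codon - 1)
    return start, start + 2


# Breakpoint 1 lies somewhere between the end of S:158 and the start of S:211.
BP1 = (spike_codon_to_nt(158)[1], spike_codon_to_nt(211)[0])
# Breakpoint 2 lies somewhere between 25000 and 25480 (beginning of ORF3a).
BP2 = (25000, 25480)


def parent_at(pos):
    """Label a nucleotide position with its putative parent lineage."""
    if pos <= BP1[0]:
        return "AY.4"
    if pos < BP1[1]:
        return "unresolved"  # inside the first breakpoint interval
    if pos <= BP2[0]:
        return "BA.1"
    if pos < BP2[1]:
        return "unresolved"  # inside the second breakpoint interval
    return "AY.4"


if __name__ == "__main__":
    for p in (21641, 22100, 23403, 25855):
        print(p, parent_at(p))
```

Positions that fall inside either breakpoint interval are reported as unresolved rather than forced onto one parent, since the thread only narrows each breakpoint to a range.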

There are some rare mutations in ORF3a which allow tracking possible donors on the AY.4 side: https://cov-spectrum.org/explore/Europe/AllSamples/Past6M/variants?aaMutations=S%3A27S&nucMutations=21618G%2C25855T%2C25667T

It's a rather short stretch that's been inserted, two breakpoints. Not impossible but we'll have to see raw reads and maybe more examples popping up to confirm.

@SVN-PhD it'd be great if you could share links to the Usher trees so that we can just look at them without having to rebuild. You can just copy the link from the browser address bar, it's something like nextstrain.org/...

Here it is:
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_1058f_fd2c00.json?c=gt-nuc_25855,25667&label=nuc%20mutations:A8723G
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice2_genome_451e_fd3670.json?label=nuc%20mutations:A1321C,A8723G,C14407T,T15264C,G21641T,T25584C,G25855T,T26270C,G26530A,G26577C,A26709G,T26767C,C27259A,T27638C,C27752T,T27807C,C27874T,T28311C,A28461G,A28881T,A28882G,C28883G,G28916T,G29402T,G29540A,G29645T

c19850727 commented 2 years ago

This might be another one: EPI_ISL_10096770 collected on Feb 13th, from Denmark.

image

Simon-LoriereLab commented 2 years ago

Dear colleagues, I have shared the raw data for EPI_ISL_9879436 on GISAID, and EPI_ISL_9879437 is on its way. Both are amplicons (Artic V4.1) / Nextera XT / 2x150 cycles. We are doing more analysis on these samples, but it looks consistent with a recombinant. The putative parental AY.4 sequence appears very specific, which might help us narrow things. Additional suspected samples from France are under investigation, and a Pango designation would help a lot to track this down. Many thanks in advance for your help.

thomasppeacock commented 2 years ago

I've been taking a look at the Danish sequences and I think there might be a hint that, if they are truly recombinant, they may have arisen separately. This is based on two observations:

1) The Danish samples share a SNP not found in the French/Netherlands samples, C20032T (aka NSP15 R138C); however, this is found in some European Delta sequences. This might suggest the Danish sequences came from a similar, but independent, recombination event to the other sequences.
2) 2/3 of the Danish sequences are missing the Spike NTD insertion (aka Spike ins214EPE). However, all 3 still contain the nearby deletion (del211/L212I), implying this isn't reference backfilling. Failing to properly identify insertions is quite a common issue with some bioinformatics pipelines, but I think it does highlight the need to look at the raw sequence files of all these sequences, as has been done for the French samples.

FedeGueli commented 2 years ago

I have eventually identified the likely AY.4 donor sequences: EPI_ISL_8546521 EPI_ISL_8633701 collected around Xmas2021 in Ile de France and then sequenced by two different labs.

They share with the recombinant proposed lineage: G29540A, G29645T + orf1a:I2820V, orf1b:314F, (orf3a:26L), S:A27S, orf3a:D155Y

Covspectrum Analysis: https://cov-spectrum.org/explore/Europe/AllSamples/Past3M/variants?aaMutations=orf1a%3AI2820V%2Corf1b%3A314F%2Corf3a%3A26L%2CS%3AA27S%2Corf3a%3AD155Y&nucMutations=G29540A%2CG29645T

Usher Tree: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_1b9c4_2928d0.json?branchLabel=aa%20mutations&label=nuc%20mutations:G21641T chrome_screenshot_1645384491440

EDIT: Looking at the Usher tree more closely, I noticed one more, EPI_ISL_9368813. This one has orf3a:92L too, so it is probably the real donor. It was collected on January 5, 2022 (

ORF10 Changes (1):V30L
Nt Changes (42):G210T, C241T, A1321C, C3037T, G4181T, C6402T, C7124T, C7851T, A8723G, C8986T, G9053T, C10029T, A11201G, A11332G, C14407T, C14408T, T15264C, G15451A, C16466T, C19220T, C21618G, G21641T, C22097A, T22917G, C22995A, A23403G, C23604G, G23621A, C25469T, C25667T, G25855T, T26767C, T27638C, C27752T, C27874T, A28461G, G28881T, G28916T, G29402T, G29540A, G29645T, G29742T
ORF1ab Changes (13):E352D, A1306S, P2046L, P2287S, A2529V, I2820V, V2930L, T3255I, T3646A, P4715S, G5063S, P5401L, A6319V
S Changes (8):T19R, A27S, L179I, L452R, T478K, D614G, P681R, V687I
ORF3a Changes (3):S26L, S92L, D155Y
M Changes (1):I82T
ORF7a Changes (2):V82A, T120I
ORF7b Changes (1):T40I
N Changes (4):D63G, R203M, G215C, D377Y )

So it has A8723G, G21641T, C25667T, G25855T, G29540A , G29645T , orf1a:I2820V, orf1b:314F, (orf3a:26L), S:A27S, orf3a:D155Y, orf3A:92L in common with the proposed recombinant lineage
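The comparison above can be double-checked mechanically. Below is a minimal Python sketch that parses the nucleotide change list pasted for EPI_ISL_9368813 and confirms that the six nucleotide markers named as shared are all present in it. The variable and function names are hypothetical, and this only checks the donor side: the recombinant's own mutation list would have to be taken from the actual genomes.

```python
import re

# Nucleotide changes reported above for the candidate donor EPI_ISL_9368813.
DONOR_NT_TEXT = """
G210T, C241T, A1321C, C3037T, G4181T, C6402T, C7124T, C7851T, A8723G,
C8986T, G9053T, C10029T, A11201G, A11332G, C14407T, C14408T, T15264C,
G15451A, C16466T, C19220T, C21618G, G21641T, C22097A, T22917G, C22995A,
A23403G, C23604G, G23621A, C25469T, C25667T, G25855T, T26767C, T27638C,
C27752T, C27874T, A28461G, G28881T, G28916T, G29402T, G29540A, G29645T,
G29742T
"""

# Nucleotide markers described as shared with the proposed recombinant lineage.
SHARED_NT_MARKERS = ["A8723G", "G21641T", "C25667T", "G25855T", "G29540A", "G29645T"]


def parse_nt_changes(text):
    """Extract substitutions written as <ref><position><alt>, e.g. 'A8723G'."""
    return re.findall(r"[ACGT]\d+[ACGT]", text)


def missing_markers(markers, genome_changes):
    """Return the markers that are absent from a genome's change list."""
    present = set(genome_changes)
    return [m for m in markers if m not in present]


if __name__ == "__main__":
    donor_changes = parse_nt_changes(DONOR_NT_TEXT)
    print(len(donor_changes), "donor changes parsed")
    print("missing markers:", missing_markers(SHARED_NT_MARKERS, donor_changes))
```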

Please @corneliusroemer @c19850727 could you check them?

Simon-LoriereLab commented 2 years ago

Dear FedeGueli, thanks for also looking into this. We are focusing on a very similar pattern of mutations, just without ORF3a:S26L, absent from our recombinant sequences (and from additional suspected genomes under investigation).

Dear thomasppeacock, the C20032T change in the sequences from Denmark is indeed very interesting. This variation is absent from the additional suspected sequences from France. The sequence from The Netherlands appears also quite distinct from our recombinant. Fascinating if they are independent. Our sample size is very low (yet?) but it is tempting to wonder if this reflects things on the template switch mechanism (potential “hot spots” like in other viruses) or on the fitness/compatibility of the resulting chimeric genomes...

FedeGueli commented 2 years ago

I think i spotted another sequence belonging to the recombinant lineage from Denmark: EPI_ISL_10014373

corneliusroemer commented 2 years ago

Has anyone figured out a good covSpectrum query to monitor this cluster? I tried the following, but it captures only 6 sequences: https://cov-spectrum.org/explore/Europe/AllSamples/Past6M/variants?aaMutations=S%3A27S&nucMutations=21618G%2C25855T%2C25667T%2C1321C%2C8723G%2C23202A%2C22673C
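For trying out variations of such a query, the URLs in this thread can be assembled programmatically. The sketch below is a minimal Python example that only reproduces the URL pattern visible in the links posted here (the `aaMutations` and `nucMutations` query parameters under an `/explore/<region>/AllSamples/<window>/variants` path); the function name and its arguments are hypothetical, and nothing is assumed about the cov-spectrum site beyond that pattern.

```python
from urllib.parse import urlencode

BASE = "https://cov-spectrum.org/explore"


def build_query_url(aa_mutations=(), nuc_mutations=(), region="Europe", window="Past6M"):
    """Assemble a cov-spectrum variant query URL in the form used in this thread."""
    params = {}
    if aa_mutations:
        params["aaMutations"] = ",".join(aa_mutations)
    if nuc_mutations:
        params["nucMutations"] = ",".join(nuc_mutations)
    # urlencode percent-encodes ':' and ',' as %3A and %2C, matching the posted links.
    return f"{BASE}/{region}/AllSamples/{window}/variants?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url(
        aa_mutations=["S:27S"],
        nuc_mutations=["21618G", "25855T", "25667T", "1321C", "8723G", "23202A", "22673C"],
    ))
```

Dropping markers one at a time from the list is a quick way to see which one excludes sequences with partial coverage.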

Simon-LoriereLab commented 2 years ago

Hi corneliusroemer, we are working with these 3 mutations for now but also working on [at least/or] variations with additional markers (1 sequence from France is quite partial and limiting us, and we are also concerned with drops in the spike).

https://cov-spectrum.org/explore/Europe/AllSamples/Past6M/variants?aaMutations=ORF1a%3AI2820V%2CS%3AD796Y%2CS%3A27S

I'm counting 12 sequences for now. The 6 from France come from different regions. The 5 sequences from Denmark all have C20032T as noted by thomasppeacock, and the genome from the Netherlands has several unique changes.

SVN-PhD commented 2 years ago

Hi all, apologies, have been away for the last few days.

Found a 13th genome: Denmark/DCGC-379814/2022, accession EPI_ISL_10207240, with a recent collection date (2022-02-16). Here is a tree of these 13 sequences in context with some of GISAID's Europe nextregions sequences.

https://nextstrain.org/community/DC-DFS-PHL-NGS/ncov/Recombinant?m=div&s=Denmark/DCGC-333253/2022,Denmark/DCGC-344135/2022,Denmark/DCGC-363269/2022,Denmark/DCGC-369529/2022,Denmark/DCGC-376857/2022,Denmark/DCGC-379814/2022,Denmark/DCGC-381072/2022,France/GES-HMN-22012200931/2022,France/HDF-IPP04947/2022,France/HDF-IPP08027/2022,France/IDF-HMN-22012240323/2022,France/NOR-20375248/2022,France/NOR-20445345/2022,Netherlands/NH-inBiome-210856/2022

Edit: Another new genome (the 14th), this is Denmark/DCGC-381072/2022 with accession EPI_ISL_10210832 with a collection date of 2022-02-06. Tree has been updated accordingly.

corneliusroemer commented 2 years ago

Great build @SVN-PhD, thanks for sharing! Some sequences have reversions to root, but only terminal tips.

SVN-PhD commented 2 years ago

Found another strain in this cluster on GISAID.

hCoV-19/France/HDF-IPP54794/2022 | EPI_ISL_10352397

Will rebuild the tree on Monday after the weekend, stay safe everyone.

Screenshot from 2022-02-25 16-53-04

Edit on 2022-03-01

Tree is now updated with this new sequence: https://nextstrain.org/community/DC-DFS-PHL-NGS/ncov/Recombinant?c=country&m=div&s=Denmark/DCGC-333253/2022,Denmark/DCGC-344135/2022,Denmark/DCGC-363269/2022,Denmark/DCGC-369529/2022,Denmark/DCGC-376857/2022,Denmark/DCGC-379814/2022,Denmark/DCGC-381072/2022,France/GES-HMN-22012200931/2022,France/HDF-IPP04947/2022,France/HDF-IPP08027/2022,France/HDF-IPP54794/2022,France/IDF-HMN-22012240323/2022,France/NOR-20375248/2022,France/NOR-20445345/2022,Netherlands/NH-inBiome-210856/2022

Screenshot from 2022-03-01 08-28-54

Simon-LoriereLab commented 2 years ago

Dear Scott, there should be 2 more from France (already on GISAID or on their way)! All the best, Etienne

PhilippeColson commented 2 years ago

Hi all, We got 3 genomes of this recombinant in Marseille, France. They should soon come out in Gisaid. Best regards, Philippe

SVN-PhD commented 2 years ago

Thank you,

I have incorporated your 3 genomes with another one released today.

Interestingly, France/ARA-14290323596/2022 was sequenced by a Nanopore GridION.

Here is a table of all the strains that we have found in GISAID that are members of this cluster.

Strains

Name Accession Collection Date
France/HDF-IPP54794/2022 EPI_ISL_10352397 2022-01-03
France/GES-HMN-22012200931/2022 EPI_ISL_9959921 2022-01-17
France/HDF-IPP04947/2022 EPI_ISL_9879436 2022-01-17
Denmark/DCGC-333253/2022 EPI_ISL_9166910 2022-01-20
France/IDF-HMN-22012240323/2022 EPI_ISL_9518370 2022-01-21
Denmark/DCGC-344135/2022 EPI_ISL_9449070 2022-01-21
France/NOR-20375248/2022 EPI_ISL_9791275 2022-01-23
France/NOR-20445345/2022 EPI_ISL_9791300 2022-01-27
France/ARA-L27GN0310195/2022 EPI_ISL_10550898 2022-01-31
France/HDF-IPP08027/2022 EPI_ISL_9879437 2022-01-31
France/ARA-14290323596/2022 EPI_ISL_10511379 2022-02-01
France/ARA-14290323761/2022 EPI_ISL_10550896 2022-02-01
France/ARA-14290341624/2022 EPI_ISL_10550897 2022-02-03
France/NOR-20545403/2022 EPI_ISL_10639474 2022-02-03
France/NOR-20545403R/2022 EPI_ISL_10639478 2022-02-03
Denmark/DCGC-363269/2022 EPI_ISL_9857381 2022-02-04
Netherlands/NH-inBiome-210856/2022 EPI_ISL_9863764 2022-02-05
Denmark/DCGC-381072/2022 EPI_ISL_10210832 2022-02-06
France/OCC-IHU-65147/2022 EPI_ISL_10529499 2022-02-07
France/PAC-IHU-64762/2022 EPI_ISL_10528736 2022-02-09
Denmark/DCGC-369529/2022 EPI_ISL_10014373 2022-02-09
Denmark/DCGC-376857/2022 EPI_ISL_10096770 2022-02-13
France/ARA-C41GN0450104/2022 EPI_ISL_10551446 2022-02-14
France/ARA-P37GN0450051/2022 EPI_ISL_10551448 2022-02-14
France/NOR-20715296/2022 EPI_ISL_10639475 2022-02-14
France/NOR-20715296R/2022 EPI_ISL_10639479 2022-02-14
France/PAC-IHU-65148/2022 EPI_ISL_10531214 2022-02-16
Denmark/DCGC-379814/2022 EPI_ISL_10207240 2022-02-16
France/NOR-20745990/2022 EPI_ISL_10639476 2022-02-17
France/ARA-C41GN0520325/2022 EPI_ISL_10551447 2022-02-21
France/NOR-20375248R/2022 EPI_ISL_10639477 2022-02-23

Updated Phylogenetic Tree

Screenshot from 2022-03-02 08-11-08

Screenshot from 2022-03-02 08-11-21

https://nextstrain.org/community/DC-DFS-PHL-NGS/ncov/Recombinant?c=country&gmin=15&gt=ORF1a.352D,S.27S&m=div

Cheers, Scott

Update on 2022-03-02 Additional 6 genomes were uploaded after I built yesterday's tree. I have updated the tree accordingly and added the new genomes to the table listed here.

The new strains are:

Update on 2022-03-03 Added more strains. Some strains look like they're duplicates (have R appended to them). Tree is updated as well. Cheers.
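The suspected duplicates can be flagged automatically before counting genomes. This is a minimal Python sketch, assuming that a resequenced sample is named by appending "R" to the sample identifier of an existing strain, as seen in the table above; the function name is hypothetical and the strain list is an excerpt of that table.

```python
import re


def find_resequenced_pairs(strain_names):
    """Return (original, resequenced) pairs where the second name is the first plus a trailing 'R'."""
    names = set(strain_names)
    pairs = []
    for name in sorted(names):
        m = re.fullmatch(r"([^/]+)/([^/]+)R/(\d{4})", name)
        if m:
            original = f"{m.group(1)}/{m.group(2)}/{m.group(3)}"
            if original in names:
                pairs.append((original, name))
    return pairs


if __name__ == "__main__":
    strains = [
        "France/NOR-20375248/2022",
        "France/NOR-20375248R/2022",
        "France/NOR-20545403/2022",
        "France/NOR-20545403R/2022",
        "France/NOR-20715296/2022",
        "France/NOR-20715296R/2022",
        "France/NOR-20745990/2022",
        "Denmark/DCGC-379814/2022",
    ]
    for original, rerun in find_resequenced_pairs(strains):
        print(original, "<->", rerun)
```

A name is only reported when its counterpart without the "R" is also in the list, so a sample identifier that merely happens to end in "R" is not flagged on its own.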

SVN-PhD commented 2 years ago

I've updated the accessions list of the 25 genomes of this potential cluster. Please find it attached. Should this cluster be assigned or is there a further need for investigations @chrisruis @corneliusroemer @thomasppeacock @Simon-LoriereLab?

Cheers, Scott

Accession

accessions_list.txt

Simon-LoriereLab commented 2 years ago

Thanks a lot Scott! I have been kindly given access to most of the raw data of these genomes from France (I will soon have access to the data from Marseille) and all of them are clean, with no sequencing or contamination issues. A Pango designation would be extremely helpful indeed, as this variant now circulates (at low levels) in many regions of France.

PhilippeColson commented 2 years ago

Of course they are clean otherwise I would not have submitted the consensus genomes to GISAID...

Best

Philippe


Prof. Philippe Colson, PU-PH des disciplines pharmaceutiques, Pharm.D / Ph.D, IHU Méditerranée Infection (http://www.mediterranee-infection.com/), Microbes Evolution Phylogeny and Infections (MEPHI), IRD, Aix-Marseille Université, Assistance Publique - Hôpitaux de Marseille, 19-21 boulevard Jean Moulin, 13005 Marseille

SVN-PhD commented 2 years ago

Hi all,

Thanks again @Simon-LoriereLab for the raw reads on GISAID, it was very helpful! From what I see in IGV, the reads look pretty clean. Now I haven't used IGV much before so let's go through what I did.

I used the Cecret pipeline to process raw reads and to also check the consensus genome (corresponds with the submitted sequences on GISAID). I used the sorted .bam and .bai index files to visualize the sorted and mapped reads, relative to the NC_045512.2 genome.

Reads are visualized by pairs and linked. Reads that are linked have a light gray line connecting them. I can see that reads with the 6 bp deletion (resulting in Δ157/158) in the spike gene are linked to their respective mates carrying the 3 bp deletion (Δ212) and the 9 bp insertion resulting in EPE at 214. The Δ157/158 is a marker of AY.4, and Δ212 with ins214EPE are markers of Omicron. Am I interpreting this correctly as a sanity check? I've highlighted some of the reads that are linked.

EPI_ISL_9879437v2

This is using France/HDF-IPP08027/2022 | EPI_ISL_9879437. Thank you!!
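The linkage check described above can also be sketched in code rather than by eye. The Python example below works on plain (position, CIGAR) values such as those found in SAM records, so it needs no external library. The marker coordinates are assumptions: 22029-22034 and 22194-22196 are where these deletions are commonly reported relative to NC_045512.2, and an aligner may place them slightly differently. The example reads and function names are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Assumed marker coordinates relative to NC_045512.2 (aligner placement may differ).
DELTA_DEL = (22029, 22034)    # 6 bp deletion, spike del157/158
OMICRON_DEL = (22194, 22196)  # 3 bp deletion near spike 211/212


def indels_from_cigar(pos, cigar):
    """Return (deletions, insertions) for one aligned read.

    pos is the 1-based leftmost reference position of the alignment.
    Deletions are (ref_start, ref_end) inclusive; insertions are
    (last_ref_base_before_insertion, inserted_length).
    """
    ref = pos
    deletions, insertions = [], []
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            ref += n
        elif op == "D":
            deletions.append((ref, ref + n - 1))
            ref += n
        elif op == "N":
            ref += n
        elif op == "I":
            insertions.append((ref - 1, n))
        # S, H and P do not consume reference bases
    return deletions, insertions


def pair_carries_both(read1, read2, first_del=DELTA_DEL, second_del=OMICRON_DEL):
    """True if the two mates of a pair together carry both marker deletions."""
    dels = set(indels_from_cigar(*read1)[0]) | set(indels_from_cigar(*read2)[0])
    return first_del in dels and second_del in dels


if __name__ == "__main__":
    mate1 = (21950, "79M6D71M")      # hypothetical read spanning the Delta deletion
    mate2 = (22150, "44M3D8M9I89M")  # hypothetical mate with the Omicron deletion and insertion
    print(pair_carries_both(mate1, mate2))
```

A pair in which one mate carries the AY.4 marker and the other carries the Omicron markers shows that both sit on the same template molecule, which is the point of the IGV observation.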

corneliusroemer commented 2 years ago

@Simon-LoriereLab can you confirm that you managed to isolate the recombinant and passage as these metadata seem to indicate? Well done!

image

Simon-LoriereLab commented 2 years ago

Dear Cornelius, I was just the messenger as I checked the NGS data, my colleagues Flora and Angela from the National Reference Center did the isolation! All the best, Etienne

corneliusroemer commented 2 years ago

There's a preprint up since yesterday: https://www.medrxiv.org/content/10.1101/2022.03.03.22271812v1

c19850727 commented 2 years ago

According to this preprint, I guess these authors might not have discovered this potential recombinant independently? image

If that's the case, maybe @SVN-PhD and @Simon-LoriereLab deserve some sort of acknowledgements.

SVN-PhD commented 2 years ago

Thanks. In addition to the contributions from other folks here, I also want to point out an overlooked contribution: @corneliusroemer identified two recombination breakpoints, which I think is wild!

I'll be more direct here. I don't think the authors of the preprint found it independently as the lead author is here in this thread (@PhilippeColson). What sits uneasily with me is that @Simon-LoriereLab and his colleagues checked the raw sequencing reads and kindly shared the raw fastq files publicly in GISAID over 3 weeks ago (~17 February, 2022). I did some digging and found that Santé Publique France and Institut Pasteur put out a statement on how they are monitoring it and how the EMERGEN consortium is working to characterize the recombinant: https://www.santepubliquefrance.fr/dossiers/coronavirus-covid-19/coronavirus-circulation-des-variants-du-sars-cov-2 (publication dated 23/02/2022). I suspect @Simon-LoriereLab was the one who sent the alert to French efforts for increased surveillance of this recombinant.

This preprint has spread like wildfire through the news, such as this one from Reuters: https://www.reuters.com/business/healthcare-pharmaceuticals/variant-that-combines-delta-omicron-identified-dogs-sniff-out-virus-with-high-2022-03-09/ Screenshot from 2022-03-11 08-01-29

This puts a chill on open-ended efforts in public sequencing databases and open collaboration, especially as laboratories at Institut Pasteur put in the work to confirm the recombination. Rushing out a preprint to be "first" and using unconventional names like "Deltacron" or "Deltamicron" has stirred up a hornet's nest of conspiracy theories on social media. For example, I've seen claims that this validates the "Deltacron" from Cyprus (which was the result of contamination), or that it is a conspiracy to distract attention from the current situation in Ukraine.

It isn't difficult to communicate any of this to any of the contributors here, especially since this is a public discussion. This highlights the fragility of trust in science and confidence from the public. By nature, I am not a confrontational person, but this issue is worth pointing out. While this discourse is unrelated to what this repo is for (after all, we are here to identify and monitor any potential emerging lineages), I do think it is pertinent to discuss what is done in a public forum.

rambaut commented 2 years ago

I have locked this conversation as it has moved in a direction that would be best pursued in the MedRxiv comment section.

chrisruis commented 2 years ago

Thanks all, this has been added as lineage XD in v1.2.133