cov-lineages / pango-designation

Repository for suggesting new lineages that should be added to the current scheme

Sequence of traveller from South Africa failed to get assigned to any lineage by pangolin, with a long branch to Omicron #359

Closed bioinforME closed 2 years ago

bioinforME commented 2 years ago

The sequence we obtained this morning is from a traveller returning from South Africa.

Pangolin failed to assign any lineage to it, and neither did Scorpio.

On the UShER tree it is close to one sequence from South Africa, "SouthAfrica/CERI-KRISP-K032307/2021|EPI_ISL_6795834|2021-11-17", which Pangolin assigns to B.1.1.37 but UShER assigns to B.1.1.529. They sit next to the Omicron clade but are quite divergent from it. An SNP-distance scan of over 5000 African sequences with collection dates after 1 September 2021, downloaded from GISAID, did not find anything else close to this sequence.
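For readers who want to repeat this kind of scan, here is a minimal sketch of a pairwise SNP-distance comparison. It assumes all sequences are already aligned to the same reference coordinates; the function names and the choice to ignore N and gap characters are illustrative assumptions, not the tool actually used for the scan described above.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both sequences have a different unambiguous base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        # Positions with N, gaps or other ambiguity codes are skipped (assumption).
        if a in valid and b in valid and a != b
    )


def closest(query: str, candidates: dict[str, str], top: int = 5) -> list[tuple[str, int]]:
    """Rank candidate sequences by SNP distance to the query, nearest first."""
    ranked = sorted(
        ((name, snp_distance(query, seq)) for name, seq in candidates.items()),
        key=lambda item: item[1],
    )
    return ranked[:top]
```

Calling `closest(query, africa_sequences)` with a hypothetical `africa_sequences` dictionary of aligned genomes would list the nearest neighbours and their distances.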

UShER tree

fasta sequence available here

Closest Sequences from Africa

rambaut commented 2 years ago

Hi. The reason it is not being detected as Omicron is that at 14 of the sites that define Omicron, this sequence has the allele of the reference genome. Running Scorpio on it:

scorpio haplotype -i sequences.aln.fasta --output-counts --append-genotypes -n 'Omicron (B.1.1.529-like)'

gives this (scorpio.csv):

query,ref_count,alt_count,ambig_count,other_count,support,conflict,nuc:C241T,orf1a:K856R,nuc:C3037T,nuc:T5386G,del:6513:3,orf1a:A2710T,orf1a:T3255I,orf1a:P3395H,del:11283:9,orf1a:I3758V,nuc:T13195C,orf1b:P314L,nuc:C15240T,orf1b:I1566V,spike:A67V,spike:T95I,spike:G339D,spike:S371L,spike:S373P,spike:K417N,spike:N440K,spike:G446S,spike:S477N,spike:T478K,spike:E484A,spike:Q493R,spike:G496S,spike:Q498R,spike:N501Y,spike:T547K,spike:D614G,spike:H655Y,spike:N679K,spike:P681H,spike:N764K,spike:D796Y,spike:N856K,spike:Q954H,spike:N969K,nuc:C25000T,e:T9I,m:D3G,m:Q19E,m:A63T,nuc:A27259C,nuc:C27807T,n:RG203KR
QLD2568,14,30,1,2,0.638300,0.297900,N,K,T,T,0,A,I,H,X,I,T,L,C,V,A,T,D,X,P,N,K,G,N,K,A,R,G,R,Y,T,G,Y,K,H,K,Y,N,H,K,T,I,D,E,T,C,T,KR
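The counts in that row can be re-derived from the per-site calls. The sketch below is one way to do it, under assumed conventions that are not taken from the Scorpio source: a call equal to the reference residue counts as ref, a call equal to the variant residue counts as alt, `N` at a nucleotide site counts as ambiguous, deletions are encoded as `0` (absent) or `1` (present), and anything else (such as `X`) counts as other. The function names are hypothetical.

```python
import csv
import io
import re
from collections import Counter

# Matches substitution names such as K856R, C241T or RG203KR.
SUBSTITUTION = re.compile(r"^([A-Z]+)\d+([A-Z]+)$")


def classify(mutation: str, call: str) -> str:
    """Label one call as 'ref', 'alt', 'ambig' or 'other' (assumed conventions)."""
    gene, _, change = mutation.partition(":")
    if gene == "del":
        # Assumed encoding: 0 = deletion absent (ref), 1 = deletion present (alt).
        return {"0": "ref", "1": "alt"}.get(call, "other")
    match = SUBSTITUTION.match(change)
    if match is None:
        return "other"
    ref, alt = match.groups()
    if call == alt:
        return "alt"
    if call == ref:
        return "ref"
    if gene == "nuc" and call == "N":
        return "ambig"
    return "other"


def summarise(csv_text: str) -> dict[str, Counter]:
    """Tally call classes per query from Scorpio-style CSV text."""
    summary = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Mutation columns are the ones whose header contains a colon.
        calls = {name: value for name, value in row.items() if ":" in name}
        summary[row["query"]] = Counter(classify(m, c) for m, c in calls.items())
    return summary
```

Applied to the QLD2568 row above, this tally gives 14 ref, 30 alt, 1 ambiguous and 2 other calls across 47 sites, matching the reported counts; the reported support and conflict values correspond to 30/47 and 14/47.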

It would be useful to check the raw data and see if these calls are correct - in which case this may represent a divergent 'intermediate'.

bioinforME commented 2 years ago

Thanks for your reply, rambaut. I manually went through all the Omicron alleles and found that all the ref and alt calls are well supported, with over 300 reads at each position and above 90% support for every one of them. The additional mutations are also well supported. See my manual record here.
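The manual check described here can be expressed as a simple per-site rule. This is only a sketch: the thresholds (more than 300 reads, more than 90% support) come from the comment above, while the function name, the labels and the idea of treating sub-threshold sites as candidate mixtures are assumptions for illustration.

```python
def site_status(allele_reads: int, total_reads: int,
                min_depth: int = 300, min_fraction: float = 0.9) -> str:
    """Classify one site from the reads supporting the consensus allele.

    Returns 'low_depth' when coverage is too thin to judge, 'clean' when the
    consensus allele dominates, and 'mixed' otherwise.
    """
    if total_reads <= min_depth:
        return "low_depth"
    fraction = allele_reads / total_reads
    return "clean" if fraction > min_fraction else "mixed"
```

A sample that is a mixture of two lineages would be expected to show many `mixed` sites at the positions where the lineages differ, whereas a single genome should be `clean` throughout.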

I am sharing the BAM files below in case you want to take a closer look. The sample was run with different primer sets, and the Artic V4 run gave the best coverage; about three alleles need to be checked in the Midnight BAM because there are dropouts in V4.

Artic V4 BAM Midnight BAM

rneher commented 2 years ago

@bioinforME @corneliusroemer pointed this out to me. Your sequence looks like Omicron after Spike position ~200; before that, it looks quite different. So it is not surprising that different classifiers give different results:

https://clades.nextstrain.org/?input-fasta=https://nctyla.dm.files.1drv.com/y4mXEycoaokxdmFhT2JyyjWCO1qdYEQflhycY3lAPN8jhFXxKXQqnAGArmKJLI5WHn_fHLb5EcLmpM2vKwmPYjfvcHVCQw38d9MeagD5P8XephXAEN7tpSIgTP9Km5UGKxPyeRCxPFKpdeJtdvigYzit8vqbY2YhPPW5ruolt3mmvUeFwkcLm_tQD80fNtOvWFvLKHy96jQhnCg95JBv-C_ww

rneher commented 2 years ago

Given that SouthAfrica/CERI-KRISP-K032307/2021|EPI_ISL_6795834|2021-11-17 is very similar and has the same breakpoint pattern, this could be a real recombinant. But this needs a closer look.

rambaut commented 2 years ago

If you look at the Scorpio output, it is flipping back and forth between the Omicron defining alleles and reference:

mutation allele call
nuc:C241T N [ref]
orf1a:K856R K [ref]
nuc:C3037T T  
nuc:T5386G T [ref]
del:6513:3 0 [ref]
orf1a:A2710T A [ref]
orf1a:T3255I I  
orf1a:P3395H H  
del:11283:9 X  
orf1a:I3758V I [ref]
nuc:T13195C T [ref]
orf1b:P314L L  
nuc:C15240T C [ref]
orf1b:I1566V V  
spike:A67V A [ref]
spike:T95I T [ref]
spike:G339D D  
spike:S371L X  
spike:S373P P  
spike:K417N N  
spike:N440K K  
spike:G446S G [ref]
spike:S477N N  
spike:T478K K  
spike:E484A A  
spike:Q493R R  
spike:G496S G [ref]
spike:Q498R R  
spike:N501Y Y  
spike:T547K T [ref]
spike:D614G G  
spike:H655Y Y  
spike:N679K K  
spike:P681H H  
spike:N764K K  
spike:D796Y Y  
spike:N856K N [ref]
spike:Q954H H  
spike:N969K K  
nuc:C25000T T  
e:T9I I  
m:D3G D [ref]
m:Q19E E  
m:A63T T  
nuc:A27259C C  
nuc:C27807T T  
n:RG203KR KR  

So it is either an intermediate that hadn't acquired all of the defining mutations (i.e., is part of a larger pool of diversity), or it is the consensus of a mixture of 2 lineages.
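One way to quantify "flipping back and forth" is to count how often the calls switch between ref and alt when read in genome order. A recombinant with a single breakpoint between an Omicron-like and a reference-like parent would be expected to show roughly one switch, while an interleaved pattern shows many. The sketch below is a hypothetical helper that assumes the calls have already been labelled `ref` or `alt`; ambiguous or other calls are skipped.

```python
from itertools import groupby


def runs(calls: list[str]) -> list[tuple[str, int]]:
    """Run-length encode ref/alt calls in genome order, skipping anything else."""
    informative = [c for c in calls if c in ("ref", "alt")]
    return [(state, len(list(group))) for state, group in groupby(informative)]


def switches(calls: list[str]) -> int:
    """Number of ref<->alt transitions along the genome."""
    return max(len(runs(calls)) - 1, 0)
```

This is only a rough indicator: sites where both candidate parents carry the same allele are uninformative and would need to be excluded before reading much into the number.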

MCB6 commented 2 years ago

or it is the consensus of a mixture of 2 lineages.

don't they see allele depths of ~ 300 to zero at the nucleotide positions where the expected Omicron matches were REF instead?

if it is a lineage mix, then one can only get such a clean allele depth pattern if the PCR is very allele-biased. But they resequenced with different primers and still missed the expected Omicron alleles, right?

rambaut commented 2 years ago

Yes - I haven't looked at the BAM files but that is what @bioinforME states above. I also doubt that amplicons could ever come up so cleanly.

rambaut commented 2 years ago

So it definitely looks like a genome that is circulating in RSA - it groups with a recently submitted outgroup that we were looking into:

image
rambaut commented 2 years ago

We will look into how we adjust the definition of B.1.1.529 to accommodate this.

rneher commented 2 years ago

These two groups have a number of interesting differences. Omicron has L3674-, S3675-, G3676-, while this group has S3675-, G3676-, F3677- like alpha, beta, gamma. There are a few more such differences.

rambaut commented 2 years ago

Yes - I am just characterising the full set. Will post here shortly. One other thing to note is it doesn't have 69/70del so will not give S gene target failure

rambaut commented 2 years ago

Actually it does seem to have L3674-, S3675-, G3676-; it just has a T->G that is confusing the alignment:

image
rambaut commented 2 years ago

The default minimap2 alignment is this:

image
rambaut commented 2 years ago

This mutation results in F3677L, but it seems much more parsimonious that the deletion happened once, followed by the mutation.

corneliusroemer commented 2 years ago

@bioinforME the sequence you uploaded is almost identical to hCoV-19/South Africa/CERI-KRISP-K032307/2021|EPI_ISL_6795834|2021-11-17 up to some extra Ns

Where's yours from? It's strange that two sequences would be near identical if it's an artefact.

MCB6 commented 2 years ago

it groups with a recently submitted outgroup that we were looking into:

the two previous outliers K032274 and K032228 have an interesting, distinct geography, they are from the Indian ocean coast of KwaZulu-Natal and quite far from the Omicron outbreak home turf. image

rneher commented 2 years ago

or you align like this: image

In that case, no mutation but a different deletion. Hard to tell.

(Sorry, just saw you posted the same, but there is no mutation, right?)

rambaut commented 2 years ago

If all of them had that deletion then it would be Omicron that has an L3674F mutation:

image

So the options would be: independent, different deletions, or a shared deletion plus an F/L change in one or the other lineage.
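The ambiguity can be illustrated at the amino-acid level with a toy example. The sketch below tries every possible placement of a deletion in a reference peptide and counts how many substitutions are then needed to match the observed peptide. Only the residues L, S, G, F (positions 3674 to 3677) come from the discussion above; the flanking residues `QQ` and `WW` are made up, and the function name is hypothetical.

```python
def deletion_placements(reference: str, observed: str) -> list[tuple[int, str, int]]:
    """Try each deletion window and count mismatches against the observed peptide.

    Returns (start_index, deleted_residues, mismatches), fewest mismatches first.
    """
    size = len(reference) - len(observed)
    if size <= 0:
        raise ValueError("observed must be shorter than reference")
    results = []
    for start in range(len(reference) - size + 1):
        remaining = reference[:start] + reference[start + size:]
        mismatches = sum(r != o for r, o in zip(remaining, observed))
        results.append((start, reference[start:start + size], mismatches))
    return sorted(results, key=lambda item: item[2])


# Toy peptides: made-up flanks around the real L-S-G-F stretch.
reference = "QQLSGFWW"
outlier_like = "QQLWW"  # an L remains after the deletion
omicron_like = "QQFWW"  # an F remains after the deletion
```

For `outlier_like`, deleting SGF explains the peptide with no substitution, while deleting LSG requires one F-to-L change; for `omicron_like` the situation is reversed. Both readings are consistent with the data, which is why the choice rests on parsimony rather than on the alignment alone.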

bioinforME commented 2 years ago

Regarding the geographic location, the traveller is from the Cape Province via Gauteng.

I haven't had a chance to look closely at your findings yet, but I really appreciate all your input.

rambaut commented 2 years ago

Here are the mutations in the outlier lineage vs. B.1.1.529 (i.e., shared and unique):

gene    mutation    outlier-lineage  B.1.1.529?  notes
ORF1ab  S135R       Y                -
ORF1ab  T842I       Y                -
ORF1ab  K856R       -                Y
nuc     C3037T      Y                Y           B.1
nuc     T5386G      -                Y
ORF1ab  G1307S      Y                -
nuc     C4321T      Y                -
ORF1ab  SL2083I     -                Y
ORF1ab  A2710T      -                Y
ORF1ab  L3027F      Y                -
nuc     A9424G      Y                -
ORF1ab  T3090I      Y                -
ORF1ab  T3255I      Y                Y
nuc     C10198T     Y                -
nuc     G10447A     Y                -
ORF1ab  P3395H      Y                Y
ORF1ab  L3674F      -                Y
ORF1ab  SGF3675del  Y                Y
ORF1ab  I3758V      -                Y
nuc     C12880T     Y                -
ORF1ab  P4715L      Y                Y           B.1
nuc     C15240T     -                Y
nuc     C15714T     Y                -
ORF1ab  R5716C      Y                -
ORF1ab  I5967V      Y                Y
ORF1ab  T6564I      Y                -
S       T19I        Y                -
S       LPPA24S     Y                -
S       A67V        -                Y
S       HV69del     -                Y
S       T95I        -                Y
S       G142D       Y                Y
S       VYY143del   -                Y
S       NL211I      -                Y
S       V213G       Y                -
S       215EPEins   -                Y
S       G339D       Y                Y
S       S371F       Y                -
S       S371L       -                Y
S       S373P       Y                Y
S       S375F       Y                Y
S       T376A       Y                -
S       D405N       Y                -
S       R408S       Y                -
S       K417N       Y                Y
S       N440K       Y                Y
S       G446S       -                Y
S       S477N       Y                Y
S       T478K       Y                Y
S       E484A       Y                Y
S       Q493R       Y                Y
S       G496S       -                Y
S       Q498R       Y                Y
S       N501Y       Y                Y
S       Y505H       Y                Y
S       T547K       -                Y
S       D614G       Y                Y           B.1
S       H655Y       Y                Y
S       N679K       Y                Y
S       P681H       Y                Y
S       N764K       Y                Y
S       D796Y       Y                Y
S       N856K       -                Y
S       Q954H       Y                Y
S       N969K       Y                Y
S       L981F       -                Y
nuc     C25000T     Y                Y
ORF3a   T223I       Y                -
E       T9I         Y                Y           B.1.1.529 fixed?
M       D3G         -                Y           B.1.1.529 fixed?
M       Q19E        Y                Y           B.1.1.529 fixed?
M       A63T        Y                Y
nuc     C26858T     Y                -
nuc     A27259C     Y                Y
ORF6    D61L        Y                -           (GAT->CTC)
nuc     C27807T     Y                Y
N       P13L        Y                Y           B.1.1.529 fixed?
N       ERS31del    Y                -           B.1.1.529 fixed?
N       RG203KR     Y                Y           (B.1.1.1)
N       S413R       Y                -
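
A comparison like the one in this table reduces to set operations on mutation names. The sketch below uses a handful of Spike rows from the table as example data; the `gene:mutation` naming and the `compare` helper are illustrative assumptions, not how the table was actually produced.

```python
# A few Spike entries taken from the table, for illustration only.
outlier = {"S:G339D", "S:S371F", "S:T376A", "S:K417N"}
b_1_1_529 = {"S:G339D", "S:S371L", "S:A67V", "S:K417N"}


def compare(a: set[str], b: set[str]) -> dict[str, list[str]]:
    """Split two mutation sets into shared and lineage-specific mutations."""
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


result = compare(outlier, b_1_1_529)
```

Here `result["shared"]` holds the mutations present in both groups, while `only_a` and `only_b` hold those unique to the outlier lineage and to B.1.1.529 respectively.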
rambaut commented 2 years ago

Will push this to the Pango committee to decide how to deal with this. Closing the issue. Thanks @bioinforME for posting this.

silcn commented 2 years ago

Very interesting; this gives us more insight into the evolution of Omicron, and tracking its progress may hint as to which mutations are actually important to Omicron. It even tells us which order the two mutations at 371 came in!

If the chronic infection hypothesis is true, there's no reason to believe patient zero would only have transmitted to one other person, and this is exactly what transmission to a second person would look like. Something similar happened with B.1.638 which was extremely diverse yet appeared to only be a small cluster in Nelson Mandela Bay.