Closed bioinforME closed 2 years ago
Hi. The reason it is not being detected as Omicron is that at 14 of the sites that define Omicron, this sequence has the allele of the reference genome. Running Scorpio on it:
scorpio haplotype -i sequences.aln.fasta --output-counts --append-genotypes -n 'Omicron (B.1.1.529-like)'
gives this:
scorpio.csv
query,ref_count,alt_count,ambig_count,other_count,support,conflict,nuc:C241T,orf1a:K856R,nuc:C3037T,nuc:T5386G,del:6513:3,orf1a:A2710T,orf1a:T3255I,orf1a:P3395H,del:11283:9,orf1a:I3758V,nuc:T13195C,orf1b:P314L,nuc:C15240T,orf1b:I1566V,spike:A67V,spike:T95I,spike:G339D,spike:S371L,spike:S373P,spike:K417N,spike:N440K,spike:G446S,spike:S477N,spike:T478K,spike:E484A,spike:Q493R,spike:G496S,spike:Q498R,spike:N501Y,spike:T547K,spike:D614G,spike:H655Y,spike:N679K,spike:P681H,spike:N764K,spike:D796Y,spike:N856K,spike:Q954H,spike:N969K,nuc:C25000T,e:T9I,m:D3G,m:Q19E,m:A63T,nuc:A27259C,nuc:C27807T,n:RG203KR
QLD2568,14,30,1,2,0.638300,0.297900,N,K,T,T,0,A,I,H,X,I,T,L,C,V,A,T,D,X,P,N,K,G,N,K,A,R,G,R,Y,T,G,Y,K,H,K,Y,N,H,K,T,I,D,E,T,C,T,KR
It would be useful to check the raw data and see if these calls are correct - in which case this may represent a divergent 'intermediate'.
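As a rough check on the summary columns, the reported `support` and `conflict` values are consistent with simple fractions of the four counts. The formulas below are inferred from the numbers in this row, not taken from the Scorpio source, so treat them as an assumption:

```python
# Counts copied from the QLD2568 row above.
ref_count, alt_count, ambig_count, other_count = 14, 30, 1, 2

# Total number of constellation sites that were scored.
total_sites = ref_count + alt_count + ambig_count + other_count

# Assumed definitions (they reproduce the values in the CSV):
#   support  = fraction of sites carrying the Omicron (alt) allele
#   conflict = fraction of sites carrying the reference allele
support = round(alt_count / total_sites, 4)
conflict = round(ref_count / total_sites, 4)

print(total_sites, support, conflict)
```

So roughly 64% of the defining sites carry the Omicron allele and about 30% carry the reference allele, which is why the constellation is not called.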
Thanks for your reply rambaut. I manually went through all the Omicron alleles and found that all the ref and alt calls are well supported, with over 300 reads at each position and above 90% support for every one of them. The additional mutations are also well supported. See my manual record here.
I am sharing the BAM files below if you want to take a closer look. The sample was run with different primer sets, and the Artic V4 sequence got the best coverage; about three alleles need to be checked in the Midnight BAM because there are dropouts in V4.
@bioinforME @corneliusroemer pointed this out to me. Your sequence looks like Omicron after Spike position ~200; before that, it looks quite different. So it is not surprising that different classifiers give different results:
Given that SouthAfrica/CERI-KRISP-K032307/2021|EPI_ISL_6795834|2021-11-17 is very similar, with the same breakpoint pattern, this could be a real recombinant. But this needs a closer look.
If you look at the Scorpio output, it is flipping back and forth between the Omicron defining alleles and reference:
mutation | allele | call |
---|---|---|
nuc:C241T | N | [ref] |
orf1a:K856R | K | [ref] |
nuc:C3037T | T | |
nuc:T5386G | T | [ref] |
del:6513:3 | 0 | [ref] |
orf1a:A2710T | A | [ref] |
orf1a:T3255I | I | |
orf1a:P3395H | H | |
del:11283:9 | X | |
orf1a:I3758V | I | [ref] |
nuc:T13195C | T | [ref] |
orf1b:P314L | L | |
nuc:C15240T | C | [ref] |
orf1b:I1566V | V | |
spike:A67V | A | [ref] |
spike:T95I | T | [ref] |
spike:G339D | D | |
spike:S371L | X | |
spike:S373P | P | |
spike:K417N | N | |
spike:N440K | K | |
spike:G446S | G | [ref] |
spike:S477N | N | |
spike:T478K | K | |
spike:E484A | A | |
spike:Q493R | R | |
spike:G496S | G | [ref] |
spike:Q498R | R | |
spike:N501Y | Y | |
spike:T547K | T | [ref] |
spike:D614G | G | |
spike:H655Y | Y | |
spike:N679K | K | |
spike:P681H | H | |
spike:N764K | K | |
spike:D796Y | Y | |
spike:N856K | N | [ref] |
spike:Q954H | H | |
spike:N969K | K | |
nuc:C25000T | T | |
e:T9I | I | |
m:D3G | D | [ref] |
m:Q19E | E | |
m:A63T | T | |
nuc:A27259C | C | |
nuc:C27807T | T | |
n:RG203KR | KR |
So it is either an intermediate that hadn't acquired all of the defining mutations (i.e., is part of a larger pool of diversity), or it is the consensus of a mixture of 2 lineages.
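For anyone who wants to reproduce the per-site calls from the Scorpio row, here is a minimal sketch. The classification rules are my reading of this one row (in particular, that `0` at a deletion site means "not deleted" and that `X` means "something other than ref or alt"); they are assumptions, not Scorpio's documented behaviour.

```python
import re
from collections import Counter

# Mutation names and called alleles copied from scorpio.csv above.
HEADER = ("nuc:C241T,orf1a:K856R,nuc:C3037T,nuc:T5386G,del:6513:3,orf1a:A2710T,"
          "orf1a:T3255I,orf1a:P3395H,del:11283:9,orf1a:I3758V,nuc:T13195C,"
          "orf1b:P314L,nuc:C15240T,orf1b:I1566V,spike:A67V,spike:T95I,spike:G339D,"
          "spike:S371L,spike:S373P,spike:K417N,spike:N440K,spike:G446S,spike:S477N,"
          "spike:T478K,spike:E484A,spike:Q493R,spike:G496S,spike:Q498R,spike:N501Y,"
          "spike:T547K,spike:D614G,spike:H655Y,spike:N679K,spike:P681H,spike:N764K,"
          "spike:D796Y,spike:N856K,spike:Q954H,spike:N969K,nuc:C25000T,e:T9I,m:D3G,"
          "m:Q19E,m:A63T,nuc:A27259C,nuc:C27807T,n:RG203KR")
ROW = ("N,K,T,T,0,A,I,H,X,I,T,L,C,V,A,T,D,X,P,N,K,G,N,K,A,R,G,R,Y,T,G,Y,K,H,K,Y,"
       "N,H,K,T,I,D,E,T,C,T,KR")


def classify(mutation, allele):
    """Label one called allele as 'ref', 'alt', 'ambig', 'other' or 'unknown'."""
    kind, _, rest = mutation.partition(":")
    if kind == "del":
        # Assumption from this row only: "0" = reference (no deletion), "X" = other.
        return {"0": "ref", "X": "other"}.get(allele, "unknown")
    # Substitutions look like <ref><position><alt>, e.g. K856R or RG203KR.
    ref, _pos, alt = re.fullmatch(r"([A-Z]+)(\d+)([A-Z]+)", rest).groups()
    if allele == ref:
        return "ref"
    if allele == alt:
        return "alt"
    if kind == "nuc" and allele == "N":
        return "ambig"
    return "other"


calls = {m: classify(m, a) for m, a in zip(HEADER.split(","), ROW.split(","))}
counts = Counter(calls.values())
print(counts)
```

Under these rules the tallies match the `ref_count`, `alt_count`, `ambig_count` and `other_count` columns of the CSV. The `N` at nuc:C241T is counted as ambiguous rather than reference, and the two `X` calls (del:11283:9 and spike:S371L) are counted as other.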
> or it is the consensus of a mixture of 2 lineages.

Don't they see allele depths of ~300 to zero at the nucleotide positions where the expected Omicron matches were REF instead?
If it is a lineage mix, then one can only get such a clean allele depth pattern if the PCR is very allele-biased. But they resequenced with different primers and still missed the expected Omicron alleles, right?
Yes - I haven't looked at the BAM files but that is what @bioinforME states above. I also doubt that amplicons could ever come up so cleanly.
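To make the mixture argument concrete, here is a small sketch of how one might flag sites from ref/alt read depths. The thresholds (300 reads, 90% support) are the figures quoted earlier in the thread; the function name and the example depths are hypothetical and are not taken from the BAM files.

```python
def classify_site(ref_depth, alt_depth, min_depth=300, min_support=0.9):
    """Label a site from its ref/alt read depths.

    A clean single-lineage sample should give 'ref' or 'alt' everywhere;
    a mixture of two lineages would be expected to show 'mixed' sites.
    """
    total = ref_depth + alt_depth
    if total < min_depth:
        return "low_depth"
    alt_fraction = alt_depth / total
    if alt_fraction >= min_support:
        return "alt"
    if alt_fraction <= 1 - min_support:
        return "ref"
    return "mixed"


# Hypothetical (ref_depth, alt_depth) pairs, for illustration only.
example_sites = {
    "site_A": (2, 410),
    "site_B": (395, 3),
    "site_C": (180, 220),
    "site_D": (40, 15),
}
labels = {name: classify_site(*depths) for name, depths in example_sites.items()}
print(labels)
```

If every constellation site came out as a clean `ref` or `alt` across several primer schemes, a two-lineage mixture would be hard to square with the data, which is the point being made above.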
So it definitely looks like a genome that is circulating in RSA - it groups with a recently submitted outgroup that we were looking into:
We will look into how we adjust the definition of B.1.1.529 to accommodate this.
These two groups have a number of interesting differences. Omicron has L3674-, S3675-, G3676-, while this group has S3675-, G3676-, F3677-, like alpha, beta, gamma. There are a few more such differences.
Yes - I am just characterising the full set. Will post here shortly. One other thing to note is that it doesn't have 69/70del, so it will not give S gene target failure.
Actually it does seem to have L3674-, S3675-, G3676-; it just has a T->G that is confusing the alignment:
The default minimap2 alignment is this:
This mutation results in F3677L, but it seems much more parsimonious that the deletion happened once, followed by the mutation.
@bioinforME the sequence you uploaded is almost identical to hCoV-19/South Africa/CERI-KRISP-K032307/2021|EPI_ISL_6795834|2021-11-17 up to some extra Ns.
Where's yours from? It's strange that two sequences would be near identical if it's an artefact.
> it groups with a recently submitted outgroup that we were looking into:

The two previous outliers K032274 and K032228 have an interesting, distinct geography: they are from the Indian Ocean coast of KwaZulu-Natal and quite far from the Omicron outbreak home turf.
or you align like this:
In that case, no mutation but a different deletion. Hard to tell.
(Sorry, just saw you posted the same, but there is no mutation, right?)
If all of them had that deletion then it would be Omicron that has an L3674F mutation:
So the options would be: independent, different deletions, or a shared deletion with an F/L change in one or other lineage.
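The ambiguity is easy to see at the protein level with a toy window. Only the four residues L3674, S3675, G3676 and F3677 come from this thread; the flanking `x` residues are placeholders, and `apply_changes` is a hypothetical helper written for this illustration.

```python
# Toy ORF1a window covering positions 3673..3678.
# Only L3674, S3675, G3676, F3677 are real; the "x" flanks are placeholders.
START = 3673
REF = "xLSGFx"


def apply_changes(ref, start, deletions=(), substitutions=None):
    """Apply deletions and substitutions (keyed by position) to a window."""
    substitutions = substitutions or {}
    out = []
    for offset, residue in enumerate(ref):
        pos = start + offset
        if pos in deletions:
            continue  # residue removed by the deletion
        out.append(substitutions.get(pos, residue))
    return "".join(out)


# Outlier lineage, described two ways:
outlier_lsg_del_plus_f3677l = apply_changes(
    REF, START, deletions={3674, 3675, 3676}, substitutions={3677: "L"})
outlier_sgf_del = apply_changes(REF, START, deletions={3675, 3676, 3677})

# Omicron, described two ways:
omicron_lsg_del = apply_changes(REF, START, deletions={3674, 3675, 3676})
omicron_sgf_del_plus_l3674f = apply_changes(
    REF, START, deletions={3675, 3676, 3677}, substitutions={3674: "F"})

print(outlier_lsg_del_plus_f3677l, outlier_sgf_del)
print(omicron_lsg_del, omicron_sgf_del_plus_l3674f)
```

Each pair of descriptions yields the same protein window, so the amino acid sequence alone cannot say which deletion happened. That is why the choice comes down to parsimony and to how the nucleotides are aligned.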
Regarding the geographic location, the traveller is from the Cape Province via Gauteng.
I haven't had a chance to look more closely at your findings, but I really appreciate all your input.
Here are the mutations in the outlier lineage vs. B.1.1.529 (i.e., shared and unique):
gene | mutation | outlier-lineage | B.1.1.529? | notes |
---|---|---|---|---|
ORF1ab | S135R | y | ||
ORF1ab | T842I | Y | ||
ORF1ab | K856R | Y | ||
nuc | C3037T | Y | Y | B.1 |
nuc | T5386G | Y | ||
ORF1ab | G1307S | Y | ||
nuc | C4321T | Y | ||
ORF1ab | SL2083I | Y | ||
ORF1ab | A2710T | Y | ||
ORF1ab | L3027F | Y | ||
nuc | A9424G | Y | ||
ORF1ab | T3090I | Y | ||
ORF1ab | T3255I | Y | Y | |
nuc | C10198T | Y | ||
nuc | G10447A | Y | ||
ORF1ab | P3395H | Y | Y | |
ORF1ab | L3674F | Y | ||
ORF1ab | SGF3675del | Y | Y | |
ORF1ab | I3758V | Y | ||
nuc | C12880T | Y | ||
ORF1ab | P4715L | Y | Y | B.1 |
nuc | C15240T | Y | ||
nuc | C15714T | Y | ||
ORF1ab | R5716C | Y | ||
ORF1ab | I5967V | Y | Y | |
ORF1ab | T6564I | Y | ||
S | T19I | Y | ||
S | LPPA24S | Y | ||
S | A67V | Y | ||
S | HV69del | Y | ||
S | T95I | Y | ||
S | G142D | Y | Y | |
S | VYY143del | Y | ||
S | NL211I | Y | ||
S | V213G | Y | ||
S | 215EPEins | Y | ||
S | G339D | Y | Y | |
S | S371F | Y | ||
S | S371L | Y | ||
S | S373P | Y | Y | |
S | S375F | Y | Y | |
S | T376A | Y | ||
S | D405N | Y | ||
S | R408S | Y | ||
S | K417N | Y | Y | |
S | N440K | Y | Y | |
S | G446S | Y | ||
S | S477N | Y | Y | |
S | T478K | Y | Y | |
S | E484A | Y | Y | |
S | Q493R | Y | Y | |
S | G496S | Y | ||
S | Q498R | Y | Y | |
S | N501Y | Y | Y | |
S | Y505H | Y | Y | |
S | T547K | Y | ||
S | D614G | Y | Y | B.1 |
S | H655Y | Y | Y | |
S | N679K | Y | Y | |
S | P681H | Y | Y | |
S | N764K | Y | Y | |
S | D796Y | Y | Y | |
S | N856K | Y | ||
S | Q954H | Y | Y | |
S | N969K | Y | Y | |
S | L981F | Y | ||
nuc | C25000T | Y | Y | |
ORF3a | T223I | Y | ||
E | T9I | Y | Y | B.1.1.529 fixed? |
M | D3G | Y | B.1.1.529 fixed? | |
M | Q19E | Y | Y | B.1.1.529 fixed? |
M | A63T | Y | Y | |
nuc | C26858T | Y | ||
nuc | A27259C | Y | Y | |
ORF6 | D61L | Y | (GAT->CTC) | |
nuc | C27807T | Y | Y | |
N | P13L | Y | Y | B.1.1.529 fixed? |
N | ERS31del | Y | B.1.1.529 fixed? | |
N | RG203KR | Y | Y | (B.1.1.1) |
N | S413R | Y |
Will push this to the Pango committee to decide how to deal with this. Closing the issue. Thanks @bioinforME for posting this.
Very interesting; this gives us more insight into the evolution of Omicron, and tracking its progress may hint as to which mutations are actually important to Omicron. It even tells us which order the two mutations at 371 came in!
If the chronic infection hypothesis is true, there's no reason to believe patient zero would only have transmitted to one other person, and this is exactly what transmission to a second person would look like. Something similar happened with B.1.638 which was extremely diverse yet appeared to only be a small cluster in Nelson Mandela Bay.
This sequence, which we got this morning, is from a returned traveller from South Africa.
Pangolin failed to assign any lineage to it, and so did Scorpio.
On the UShER tree it is close to one sequence from South Africa, "SouthAfrica/CERI-KRISP-K032307/2021|EPI_ISL_6795834|2021-11-17", which is assigned to B.1.1.37 by Pangolin but to B.1.1.529 by UShER. They sit next to the Omicron clade but are quite divergent. A SNP distance scan of over 5000 sequences from Africa with a collection date after 1 September 2021, downloaded from GISAID, did not find anything else close to this sequence.
UShER tree
fasta sequence available here
Closest Sequences from Africa
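For reference, a SNP distance scan like the one described above can be sketched as follows. This is not the tool that was actually used (the post does not say which one was), just a minimal illustration that assumes all sequences are already aligned to the same reference and have equal length; the function names and toy sequences are hypothetical.

```python
def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences differ.

    Only positions where both sequences have an unambiguous base (A, C, G, T)
    are compared, so N, gaps and IUPAC codes never count as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    unambiguous = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in unambiguous and b in unambiguous and a != b
    )


def closest(query, candidates):
    """Return (name, distance) for the candidate nearest to the query."""
    return min(
        ((name, snp_distance(query, seq)) for name, seq in candidates.items()),
        key=lambda item: item[1],
    )


# Toy aligned sequences, for illustration only.
query = "ACGTN-A"
candidates = {"seq1": "ACCTAAG", "seq2": "ACGTAAA", "seq3": "TTTTTTT"}
print(closest(query, candidates))
```

Ignoring ambiguous positions matters here, since the sequence and its nearest neighbour are described in the thread as near identical apart from some extra Ns.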