cov-lineages / constellations

BA.1-like vs BA.2 #46

hsnguyen commented 2 years ago

Hi team, we have 2 sequences from the same sample (SNP distance = 0), hCoV-19/Australia/QLD2584/2021 and hCoV-19/Australia/QLD2568/2021, but they were assigned to different Omicron sub-lineages using the newest constellations.

| taxon | lineage | conflict | ambiguity_score | scorpio_call | scorpio_support | scorpio_conflict | version | pangolin_version | pangoLEARN_version | pango_version | status | note |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| hCoV-19/Australia/QLD2584/2021 | BA.1 | 0.0 | 0.9397562119081107 | Probable Omicron (BA.1-like) | 0.517200 | 0.258600 | PLEARN-v1.2.101 | 3.1.17 | 2021-11-25 | v1.2.101 | passed_qc | scorpio call: Alt alleles 30; Ref alleles 15; Amb alleles 10; Oth alleles 3; scorpio replaced lineage assignment AZ.2 |
| hCoV-19/Australia/QLD2568/2021 | BA.2 | 0.0 | 1.0 | BA.2-like | 0.984800 | 0.015200 | PLEARN-v1.2.101 | 3.1.17 | 2021-11-25 | v1.2.101 | passed_qc | scorpio call: Alt alleles 65; Ref alleles 1; Amb alleles 0; Oth alleles 0; scorpio replaced lineage assignment AZ.2 |
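
For anyone reading the numbers above: the scorpio_support and scorpio_conflict values appear to be consistent with alt / total and ref / total over the allele counts in the note column, rounded to four decimals. That is an inference from the figures in this report, not taken from scorpio's source, and the function name below is made up for illustration. A minimal sketch:

```python
def support_conflict(alt, ref, amb, oth):
    """Return (support, conflict) as fractions of all constellation sites.

    Assumption: support = alt / total and conflict = ref / total,
    rounded to 4 decimals. Inferred from the reported values only.
    """
    total = alt + ref + amb + oth
    return round(alt / total, 4), round(ref / total, 4)


# Allele counts copied from the "note" column of the two rows above.
qld2584 = support_conflict(alt=30, ref=15, amb=10, oth=3)
qld2568 = support_conflict(alt=65, ref=1, amb=0, oth=0)

print(qld2584)  # (0.5172, 0.2586)
print(qld2568)  # (0.9848, 0.0152)
```

Under this reading, the 10 ambiguous sites in QLD2584 are counted in the denominator without adding to support, which may be part of why the lower-quality sequence scores so differently.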

The only difference between the 2 sequences is that QLD2568 has better quality than QLD2584. If I run scorpio haplotype against the BA.2 constellation, QLD2568 is a perfect match, while QLD2584 has only 1 ambiguous allele.

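| query | ref_count | alt_count | ambig_count | other_count | support | conflict | orf1ab:S135R | orf1ab:T842I | orf1ab:G1307S | nuc:C4321T | orf1ab:L3027F | nuc:A9424G | orf1ab:T3090I | orf1ab:L3201F | nuc:C10198T | nuc:G10447A | nuc:C12880T | nuc:C15714T | nuc:C15714T | orf1ab:R5716C | orf1ab:T6564I | nuc:A20055G | spike:T19I | del:21633:9 | nuc:T22200G | spike:S371F | spike:T376A | spike:D405N | spike:R408S | nuc:C26060T | nuc:C26858T | orf6:D61L | n:S413R |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| hCoV-19/Australia/QLD2584/2021 | 0 | 26 | 1 | 0 | 0.963000 | 0.000000 | R | I | S | T | F | G | I | F | N | A | T | T | T | C | I | G | I | 3 | G | F | A | N | S | T | T | L | R |
| hCoV-19/Australia/QLD2568/2021 | 0 | 27 | 0 | 0 | 1.000000 | 0.000000 | R | I | S | T | F | G | I | F | T | A | T | T | T | C | I | G | I | 3 | G | F | A | N | S | T | T | L | R |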
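
To see which site carries the single ambiguous call, the two haplotype rows can be compared column by column. The snippet below is only a sketch over the values pasted in this report (the variable names are illustrative); it does not call scorpio.

```python
# Site columns copied from the haplotype header above
# (nuc:C15714T is listed twice in the pasted output and is kept as-is).
sites = (
    "orf1ab:S135R orf1ab:T842I orf1ab:G1307S nuc:C4321T orf1ab:L3027F "
    "nuc:A9424G orf1ab:T3090I orf1ab:L3201F nuc:C10198T nuc:G10447A "
    "nuc:C12880T nuc:C15714T nuc:C15714T orf1ab:R5716C orf1ab:T6564I "
    "nuc:A20055G spike:T19I del:21633:9 nuc:T22200G spike:S371F "
    "spike:T376A spike:D405N spike:R408S nuc:C26060T nuc:C26858T "
    "orf6:D61L n:S413R"
).split()

# Per-site calls for each sequence, copied from the two rows above.
qld2584 = "R I S T F G I F N A T T T C I G I 3 G F A N S T T L R".split()
qld2568 = "R I S T F G I F T A T T T C I G I 3 G F A N S T T L R".split()

# Sanity check: one call per site in both rows.
assert len(sites) == len(qld2584) == len(qld2568)

# Keep only the sites where the two sequences disagree.
diffs = [(s, a, b) for s, a, b in zip(sites, qld2584, qld2568) if a != b]
print(diffs)  # [('nuc:C10198T', 'N', 'T')]
```

The only disagreement is at nuc:C10198T, where QLD2584 has N and QLD2568 has the alternate allele T.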

Please find the attached FASTA file for your convenience (QLD2584 has been removed from GISAID due to duplication).

We call it BA.2 but just want to let you know about the issue. Thanks,

rmcolq commented 2 years ago

Looks like the threshold for the number of ambiguities allowed was too strict to let this sequence through. I agree this is undesirable behaviour and will update the constellation to be more flexible.
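
As a rough illustration of how an ambiguity threshold can exclude an otherwise matching sequence, here is a hypothetical rule check. The `max_ambig` parameter, the function name, and the threshold values are invented for this sketch and are not the constellation's actual rule names or settings; only the allele counts come from the haplotype output in this thread.

```python
def passes_ambiguity_rule(counts, max_ambig):
    """Hypothetical check: accept a sequence only if its number of
    ambiguous constellation sites does not exceed max_ambig."""
    return counts["ambig"] <= max_ambig


# Counts taken from the scorpio haplotype output quoted in this thread.
qld2584 = {"ref": 0, "alt": 26, "ambig": 1, "other": 0}
qld2568 = {"ref": 0, "alt": 27, "ambig": 0, "other": 0}

# Illustrative thresholds only; the real values are not given here.
for label, max_ambig in [("strict", 0), ("relaxed", 2)]:
    print(label,
          passes_ambiguity_rule(qld2584, max_ambig),
          passes_ambiguity_rule(qld2568, max_ambig))
```

With the strict setting only QLD2568 passes, even though QLD2584 has no reference alleles at any site; relaxing the limit lets both through, which is the kind of flexibility described above.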