cov-lineages / pangolin

Software package for assigning SARS-CoV-2 genome sequences to global lineages.
GNU General Public License v3.0

usher mode: run scorpio, report conflicts, but don't override usher assignment #462

Closed AngieHinrichs closed 2 years ago

AngieHinrichs commented 2 years ago

@aineniamh I've been thinking more about usher and scorpio -- I think it's still valuable to know whether scorpio agrees with the usher call, and in case there's some future situation in which we find scorpio is doing better than usher, it would be nice to have something to grep for in the note column for a workaround.

So how about this: in usher mode, we still run scorpio (unless --skip-scorpio is given) and still report conflicts between scorpio and usher in the note column, but we don't override the usher call?
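
To make the proposed behavior concrete, here is a minimal sketch of the per-sequence decision. It is not pangolin's actual code: the function name `combine_calls` and its parameters are hypothetical, and "conflict" is simplified to plain inequality of lineage names. The note wording follows the examples in the table further down.

```python
from typing import Optional, Tuple


def combine_calls(
    usher_lineage: str,
    usher_note: str,
    scorpio_lineage: Optional[str],
    skip_scorpio: bool = False,
) -> Tuple[str, str]:
    """Return (lineage, note) for one sequence in usher mode.

    Hypothetical helper: the usher lineage is always kept; scorpio
    can only add a remark to the note, never replace the call.
    """
    # With --skip-scorpio, or when scorpio made no lineage call,
    # there is nothing to compare against.
    if skip_scorpio or not scorpio_lineage:
        return usher_lineage, usher_note

    # Assumption: any difference in lineage name counts as a conflict.
    if scorpio_lineage != usher_lineage:
        usher_note = (
            f"{usher_note}; scorpio lineage {scorpio_lineage} "
            f"conflicts with inference lineage {usher_lineage}"
        )

    # The usher assignment is reported unchanged.
    return usher_lineage, usher_note
```

For example, `combine_calls("AY.5", "Usher placements: AY.5(1/1)", "AY.4")` keeps `AY.5` as the lineage and appends the conflict text to the note.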

To test this, I ran the modified pangolin on GISAID sequences with EPI_ISL IDs 7000000-7009999 (Delta, to get some examples of incompatible lineage calls) and 13000000-13009999 (recent Omicron, to get some examples of BA.4, BA.5 and recombinant calls that were overridden because scorpio called them Omicron/Unassigned or BA.2). Then I looked at the placement in the big UShER tree of the sequences that had disagreements (all of the "conflicts with" cases, but not all of the many Omicron/Unassigned ones) to make sure that the mutations in the sequences looked reasonable for their usher assignments. Some example disagreements:

| taxon | lineage | scorpio_call | note | Angie_note |
| --- | --- | --- | --- | --- |
| Brazil/SP-NVBS6685GENOV828072411971/2021 EPI_ISL_7000478 2021-09-02 | B.1.617.2 | Delta (B.1.617.2-like) | scorpio called lineage B.1.617.2 | qc failed, usher was not run |
| Wales/PHWC-PFFCPT/2021 EPI_ISL_7000539 2021-11-06 | AY.5 | Delta (AY.4-like) | Usher placements: AY.5(1/1); scorpio lineage AY.4 conflicts with inference lineage AY.5 | AY.5 branch that acquired C7851T |
| Wales/PHWC-PFGDSF/2021 EPI_ISL_7003942 2021-11-05 | AY.4 | Delta (B.1.617.2-like) | Usher placements: AY.4(1/1); scorpio lineage B.1.617.2 conflicts with inference lineage AY.4 (incompatible) | placed in AY.4, very similar to other seqs assigned AY.4 |
| Wales/PHWC-PFGHOA/2021 EPI_ISL_7004349 2021-11-05 | AY.4.2 | Delta (B.1.617.2-like) | Usher placements: AY.4.2(1/1); scorpio lineage B.1.617.2 conflicts with inference lineage AY.4.2 (incompatible) | placed in a weird B.1.617.2 offshoot in big tree; has AY.4.2 muts at pos >= 21995 but not 7851 |
| Germany/SH-RKI-I-822454/2022 EPI_ISL_13000071 2022-05-16 | BA.5 | Omicron (Unassigned) | Usher placements: BA.5(1/1); scorpio found insufficient support to assign a specific lineage | BA.5 |
| Germany/SH-RKI-I-822442/2022 EPI_ISL_13000051 2022-05-16 | XM | Omicron (Unassigned) | Usher placements: XM(1/1); scorpio found insufficient support to assign a specific lineage | XM |
| Germany/BY-RKI-I-822748/2022 EPI_ISL_13000620 2022-05-13 | XW | Omicron (Unassigned) | Usher placements: XW(1/1); scorpio found insufficient support to assign a specific lineage | XW |
| England/PHEC-YYFMG7R/2022 EPI_ISL_13000721 2022-05-12 | BA.2 | Omicron (Unassigned) | Usher placements: BA.2(6/8) BA.2.25(1/8) BA.2.9(1/8); scorpio found insufficient support to assign a specific lineage | excluded from big tree because PHEC-YYFMG7R not in cog_all.metadata.csv => qc |
| Germany/SH-RKI-I-822918/2022 EPI_ISL_13000915 2022-05-17 | BA.5 | Omicron (BA.2-like) | Usher placements: BA.5(1/1); scorpio lineage BA.2 conflicts with inference lineage BA.5 | BA.5 |
| Germany/SH-RKI-I-822514/2022 EPI_ISL_13000158 2022-05-16 | BA.4 | Omicron (BA.2-like) | Usher placements: BA.4(2/2); scorpio lineage BA.2 conflicts with inference lineage BA.4 | BA.4 |
| Indonesia/JB-BHL-ITB-N116/2022 EPI_ISL_13008157 2022-04-05 | BA.2.3 | Omicron (BA.3-like) | Usher placements: BA.2.3(13/13); scorpio lineage BA.3 conflicts with inference lineage BA.2.3 | BA.2.3 |
| Indonesia/JB-BHL-ITB-N127/2022 EPI_ISL_13008162 2022-03-23 | BA.2 | Omicron (BA.3-like) | Usher placements: BA.2(24/39) BA.2.15(5/39) BA.2.23(2/39) BA.2.31(8/39); scorpio lineage BA.3 conflicts with inference lineage BA.2 | excluded from big tree because too many equally optimal placements |
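
Since the point of keeping the conflict text is to have something to search for, here is a rough sketch of pulling the usher placements and any scorpio conflict out of a note string. The function `parse_note` and both regular expressions are hypothetical and are based only on the note formats shown in the examples above; real pangolin output may contain other variants.

```python
import re
from typing import Dict, Optional, Tuple

# Matches entries like "BA.2.15(5/39)" inside "Usher placements: ...".
PLACEMENT_RE = re.compile(r"([^\s(]+)\((\d+)/(\d+)\)")
# Matches "scorpio lineage X conflicts with inference lineage Y".
CONFLICT_RE = re.compile(
    r"scorpio lineage (\S+) conflicts with inference lineage (\S+)"
)


def parse_note(
    note: str,
) -> Tuple[Dict[str, Tuple[int, int]], Optional[Tuple[str, str]]]:
    """Split a note into usher placements and an optional conflict.

    Returns (placements, conflict), where placements maps a lineage to
    (supporting placements, total placements) and conflict is
    (scorpio lineage, usher lineage), or None if no conflict is noted.
    """
    placements: Dict[str, Tuple[int, int]] = {}
    conflict: Optional[Tuple[str, str]] = None
    # The examples separate the usher part and the scorpio part with ";".
    for part in note.split(";"):
        part = part.strip()
        if part.startswith("Usher placements:"):
            for lineage, hits, total in PLACEMENT_RE.findall(part):
                placements[lineage] = (int(hits), int(total))
        else:
            match = CONFLICT_RE.search(part)
            if match:
                conflict = (match.group(1), match.group(2))
    return placements, conflict
```

Applied to the last example row, this yields four placements (with BA.2 supported by 24 of 39) and the conflict pair `("BA.3", "BA.2")`. Notes that only say scorpio found insufficient support come back with `conflict` set to `None`.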