Open KatSteinke opened 8 months ago
There was a comment on the release of 1.23: "NOTE: the v1.23 tree provokes a corner-case bug in usher-sampled prior to version 0.6.3 that causes some lineage A samples to be assigned to A.* sublineages or even B or B.* sublineages. If you will be running pangolin on early 2020 sequences that may be lineage A, then it is highly recommended to use the assignment cache (install by running pangolin --add-assignment-cache, run pangolin on input sequences with --use-assignment-cache) and to update the usher package in your pangolin environment to 0.6.3 as soon as it is released." Are you using the assignment cache mode?
We’re not - we don’t have any A lineages among our control samples, and I’d understood the instructions in the notes as a workaround until Usher 0.6.3 was available and thus assumed it wasn’t relevant now that version was out. I’ll try and see how it looks with assignment cache mode as soon as I can.
Perhaps @AngieHinrichs can clarify whether this is still the problem?
The issue seems to persist after installing the cache with --add-assignment-cache and then running with --use-assignment-cache.
pangolin /path/to/positive_control.consensus.fasta --outfile /path/to/pangolin-assignment.csv --threads 6 --analysis-mode usher --use-assignment-cache
in a fresh conda env with the specs given above results in the following output:

| taxon | lineage | conflict | ambiguity_score | scorpio_call | scorpio_support | scorpio_conflict | scorpio_notes | version | pangolin_version | scorpio_version | constellation_version | is_designated | qc_status | qc_notes | note |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| positive_control | B.1 | 0.0 | | | | | | PUSHER-v1.23.1 | 4.3 | 0.3.17 | v0.1.12 | False | pass | Ambiguous_content:0.02 | Usher placements: B.1(1/1) |
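For anyone who wants to check their own controls for this kind of change, a minimal Python sketch along these lines could compare the lineage column of two pangolin reports (for example, one produced before and one after the data update). It assumes both reports are CSV files with the taxon and lineage columns shown above; the function name lineage_changes and the inline sample data are made up for illustration, and the "old" row simply mirrors the B.1.118 call described in this thread.

```python
import csv
import io


def lineage_changes(old_csv, new_csv):
    """Return {taxon: (old_lineage, new_lineage)} for taxa whose call differs.

    Both arguments are open text files (or file-like objects) holding
    pangolin-style CSV reports with 'taxon' and 'lineage' columns.
    """
    old = {row["taxon"]: row["lineage"] for row in csv.DictReader(old_csv)}
    changes = {}
    for row in csv.DictReader(new_csv):
        previous = old.get(row["taxon"])
        # Only report taxa present in both files with a different call.
        if previous is not None and previous != row["lineage"]:
            changes[row["taxon"]] = (previous, row["lineage"])
    return changes


# Illustrative data only, not real pangolin output files.
old_report = io.StringIO("taxon,lineage\npositive_control,B.1.118\n")
new_report = io.StringIO("taxon,lineage\npositive_control,B.1\n")
print(lineage_changes(old_report, new_report))
```

With real reports, the two StringIO objects would be replaced by files opened with open(path, newline="").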
Thanks for reporting this. I will fix it in the next release.
Due to a recent shuffling around of the order in which mutations are annotated on successive branches, B.1.118 is annotated on a small branch within the larger branch where it should be annotated, with two extra mutations, one of which is absent from most samples. In previous versions, although B.1.118 was annotated on a branch that had the two extra mutations, the extra mutations were placed on a larger branch, and then one was reverted to reference on a sub-branch that covered most of the samples. The effect was that in the previous version, B.1.118 samples without the extra mutation(s) would be placed on the branch where B.1.118 was annotated (with a reversion on the mutation that shouldn't have been in the path in the first place), but now, with an arguably better structure / order of mutations, B.1.118 is annotated on a sub-branch and I need to fix the annotation.
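To make the effect of the annotation position concrete, here is a toy Python sketch. It is not how UShER actually places samples (UShER uses maximum-parsimony placement on the full tree); it just walks down a three-node tree while a branch's mutations are all present in the sample, and reports the nearest annotated branch on that path. The mutation names ("b1", "m", "x1", "x2") and the helper names are hypothetical. It contrasts the current annotation on the small sub-branch with an annotation on the larger branch; it does not model the previous tree's reversion.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    mutations: frozenset            # mutations on the branch leading to this node
    lineage: Optional[str] = None   # lineage annotated on this branch, if any
    children: list = field(default_factory=list)


def assign(node, sample, inherited=None):
    """Descend while a child's branch mutations are all in the sample;
    return the lineage of the nearest annotated branch on that path."""
    lineage = node.lineage or inherited
    for child in node.children:
        if child.mutations <= sample:
            return assign(child, sample, lineage)
    return lineage


def build_tree(annotate_small_branch):
    """B.1 -> larger branch -> small sub-branch carrying two extra mutations."""
    small = Node(frozenset({"x1", "x2"}),
                 "B.1.118" if annotate_small_branch else None)
    larger = Node(frozenset({"m"}),
                  None if annotate_small_branch else "B.1.118",
                  [small])
    return Node(frozenset({"b1"}), "B.1", [larger])


# A sample that lacks one of the two extra mutations (x2).
sample = frozenset({"b1", "m", "x1"})

print(assign(build_tree(annotate_small_branch=True), sample))   # B.1
print(assign(build_tree(annotate_small_branch=False), sample))  # B.1.118
```

In this toy model, a sample missing one of the extra mutations never reaches the small sub-branch, so it inherits B.1 from further up unless the annotation sits on the larger branch.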
With version 1.23.1, one of our positive controls, which has consistently been called as B.1.118, suddenly gets called as B.1. We're running pangolin 4.3 in usher placement mode; relevant versions are
Given it's a positive control I should be able to share the sequence if needed, but it looks like this might be a general issue with B.1.118 sequences - UCSC UShER gives the same results for a bunch of B.1.118 genomes from GISAID, while COG-UK (still on 1.22) gives B.1.118 - kudos to Ammar Aziz over on the µbioinfo slack for digging into it.