cov-lineages / constellations


Change mutations to cover 738 of 739 B.1.623 representatives instead of 2 of 739 #25

Closed AngieHinrichs closed 3 years ago

AngieHinrichs commented 3 years ago

This replaces most of the mutations in cB.1.623.json, so that instead of excluding most pango-designation B.1.623 representative sequences, it includes almost all of them.

It appears that the mutations were selected by working back towards root from two England/MILK-F9D* samples with E484K tracked as a PHE VUI, making cB.1.623.json very specific to those two sequences and their close relatives. Instead, if we work forward from B.1 on the path to the branch that covers B.1.623 representative sequences in pango-designation (naturally I used the UCSC tree, but it should work out similarly on COG-UK's), we get a more widely shared set of mutations.

If I run scorpio on B.1.623 representative sequences a la pangolin using the modified cB.1.623.json like this:

scorpio classify \
    -i B.1.623.reps/sequences.aln.fasta \
    -o B.1.623.scorpio.report \
    --output-counts \
    --constellations cB.1.623.json \
    --pangolin \
    --list-incompatible \
    --long &> B.1.623.scorpio.log

Then the two England/MILK-F9D* sequences still have 0 ref and 9 alts as before. 738 of the 739 representative sequences are classified as cB.1.623, with the only failure having a high ambiguous count:

grep MILK B.1.623.scorpio.report.B.1.623-like_counts.csv
England/MILK-F9DB71/2021,0,9,0,0,2,1.000000,0.000000,True
England/MILK-F9DBDB/2021,0,9,0,0,2,1.000000,0.000000,True

grep False /data/tmp/angie/B.1.623.scorpio.report.B.1.623-like_counts.csv
USA/WI-CDC-2-4195065/2021,1,2,5,1,1,0.222200,0.111100,False
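
The same pass/fail tally can be done in one step rather than with separate greps. This is a minimal sketch, assuming only what the rows above show: the first field is the sequence name and the last field is the True/False classification flag. The function name `tally_flags` is hypothetical and not part of scorpio; the sample rows are the three lines quoted above.

```python
import csv
import io
from collections import Counter


def tally_flags(lines):
    """Count rows by their last field and collect the names of failing rows.

    Assumes the last column is the True/False flag, as in the rows quoted
    above. Rows whose last field is neither value (such as a header line)
    are skipped.
    """
    counts = Counter()
    failures = []
    for row in csv.reader(lines):
        if not row or row[-1] not in ("True", "False"):
            continue
        counts[row[-1]] += 1
        if row[-1] == "False":
            failures.append(row[0])
    return counts, failures


# The three rows quoted in this comment, used as sample input.
sample = """England/MILK-F9DB71/2021,0,9,0,0,2,1.000000,0.000000,True
England/MILK-F9DBDB/2021,0,9,0,0,2,1.000000,0.000000,True
USA/WI-CDC-2-4195065/2021,1,2,5,1,1,0.222200,0.111100,False
"""

counts, failures = tally_flags(io.StringIO(sample))
print(counts["True"], counts["False"], failures)
```

To run it on the full report, pass an open file handle for the counts CSV instead of the `io.StringIO` sample.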

This change will enable pangolin to resume assigning B.1.623. However, if it is more important for cB.1.623.json to match the PHE VUI than pango-designation B.1.623, then instead of this change, perhaps pangolin could avoid overriding PLEARN/PUSHER assignments of B.1.623.

Closes #24.

rambaut commented 3 years ago

Thanks for this, Angie. Before merging: there is another alternative, which is to remove the B.1.623 Scorpio file, as it has not been designated a VOC/VOI.

rmcolq commented 3 years ago

Thank you Angie for this definition file, but I think in this case Andrew's suggestion is a sensible one. I have removed the constellation.