cov-lineages / constellations


cB.1.623 excludes all but two B.1.623 representative sequences #24

Closed AngieHinrichs closed 3 years ago

AngieHinrichs commented 3 years ago

PHE was tracking B.1.623+E484K, which, as far as I can tell, has been observed only once aside from the two British travelers (England/MILK-F9DBDB/2021|EPI_ISL_852237 and England/MILK-F9DB71/2021|EPI_ISL_852169) in the PHE report (where it is referred to as B.1.324.1). B.1.623 is a lot broader (UCSC proposed just B.1 + {S:S494P, S:N501Y, S:P681H, N:M234I}).
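To make the difference between the two definitions concrete, here is a minimal Python sketch that treats a definition as a set of required mutations. The four mutations come from the UCSC proposal quoted above. The function names, the `S:E484K` spelling of the extra mutation, and the example mutation lists are assumptions made for illustration; this is not how Scorpio or pangolin evaluates a constellation.

```python
# Defining set taken from the UCSC proposal quoted in this comment.
UCSC_B_1_623 = frozenset({"S:S494P", "S:N501Y", "S:P681H", "N:M234I"})

# Hypothetical narrower set: the same four plus E484K (spelling assumed).
B_1_623_PLUS_E484K = UCSC_B_1_623 | {"S:E484K"}


def missing_mutations(observed, defining):
    """Return the defining mutations that are absent from `observed`, sorted."""
    return sorted(set(defining) - set(observed))


def matches(observed, defining):
    """True when every defining mutation is present in `observed`."""
    return not missing_mutations(observed, defining)


# Made-up mutation lists, for illustration only.
broad_sample = ["S:S494P", "S:N501Y", "S:P681H", "N:M234I"]
traveler_like_sample = broad_sample + ["S:E484K"]

print(matches(broad_sample, UCSC_B_1_623))            # True
print(matches(broad_sample, B_1_623_PLUS_E484K))      # False
print(matches(traveler_like_sample, UCSC_B_1_623))    # True
```

Under these assumptions, a sequence carrying only the four UCSC mutations satisfies the broad definition but not the narrower E484K one, which is the mismatch this issue is about.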

As @aretchless pointed out in cov-lineages/pangolin#305, the recent pangolin update that overrides pangoLEARN/usher assignments with Scorpio now causes most B.1.623 sequences to be assigned 'None'.

I ran scorpio via pangolin on FASTA for the 739 representative sequences for B.1.623 in pango-designation/lineages.csv:

pangolin --no-temp --outdir B.1.623.reps B.1.623.reps.fa

Only 2 of the 739 reps were classified as cB.1.623:

grep True B.1.623.reps/VOC_report.scorpio.B.1.623-like_counts.csv
England/MILK-F9DB71/2021,0,9,0,0,2,1.000000,0.000000,True
England/MILK-F9DBDB/2021,0,9,0,0,2,1.000000,0.000000,True

The other 737 sequences have too many reference alleles, implying that cB.1.623 is too specific to the virus sequences from those two travelers:

grep False B.1.623.reps/VOC_report.scorpio.B.1.623-like_counts.csv | head
USA/NY-NYULH80/2021,5,3,1,0,0,0.333300,0.555600,False
Aruba/AW-RIVM-11402/2021,6,3,0,0,0,0.333300,0.666700,False
USA/GA-CDC-STM-000009083/2021,6,2,1,0,0,0.222200,0.666700,False
USA/WI-UW-3221/2021,6,2,1,0,0,0.222200,0.666700,False
USA/WI-UW-3271/2021,6,2,1,0,0,0.222200,0.666700,False
USA/NY-PRL-2021_02_15_02F04/2021,5,3,1,0,0,0.333300,0.555600,False
USA/NY-PRL-2021_02_10_02F11/2021,5,3,1,0,0,0.333300,0.555600,False
USA/NY-PRL-2021_02_08_05C09/2021,5,3,1,0,0,0.333300,0.555600,False
USA/NY-PRL-2021_02_08_05F04/2021,5,3,0,1,0,0.333300,0.555600,False
USA/NY-NYULH290/2021,5,3,1,0,0,0.333300,0.555600,False
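For anyone reading the rows above, here is a small Python sketch that parses a few of them and recomputes the two fractions. The column names are an assumption inferred from the values (the header is not shown in the grep output): ref, alt, ambiguous and other counts, then a rule count, support, conflict, and the match flag. Under that reading, support is alt_count divided by the number of defining sites and conflict is ref_count divided by the same total, which agrees with the printed values.

```python
import csv
import io

# Assumed column order, inferred from the rows above; the real header may differ.
COLUMNS = ["query", "ref_count", "alt_count", "ambig_count", "other_count",
           "rule_count", "support", "conflict", "is_match"]

# Three rows copied from the output above.
SAMPLE = """\
England/MILK-F9DB71/2021,0,9,0,0,2,1.000000,0.000000,True
USA/NY-NYULH80/2021,5,3,1,0,0,0.333300,0.555600,False
Aruba/AW-RIVM-11402/2021,6,3,0,0,0,0.333300,0.666700,False
"""


def summarize(text):
    """Parse header-less count rows and recompute support and conflict."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), fieldnames=COLUMNS):
        ref = int(rec["ref_count"])
        alt = int(rec["alt_count"])
        # Total defining sites = ref + alt + ambiguous + other calls.
        total = ref + alt + int(rec["ambig_count"]) + int(rec["other_count"])
        rows.append({
            "query": rec["query"],
            "total_sites": total,
            "support": alt / total,   # fraction of sites with the alt allele
            "conflict": ref / total,  # fraction of sites with the ref allele
            "is_match": rec["is_match"] == "True",
        })
    return rows


for row in summarize(SAMPLE):
    print(f'{row["query"]}: support={row["support"]:.4f} '
          f'conflict={row["conflict"]:.4f} match={row["is_match"]}')
```

Each of these rows sums to 9 defining sites, so the non-matching sequences carry the reference allele at 5 or 6 of the 9 sites.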

I suspect the alt_count should be higher than 2 or 3 for those as well (i.e. mutations could also be added to the definition). I will make a pull request that brings cB.1.623.json more in line with my understanding of pango-designation's B.1.623.

rambaut commented 3 years ago

Does B.1.623 need to be tracked as a VOC/VOI? The original concern was with the E484K. Without that, there is nothing to justify it being a VOC/VOI, so perhaps the Scorpio definition file should be removed?

rambaut commented 3 years ago

Actually, it looks like the Scorpio file for 'B.1.623' makes no sense with respect to what has been designated. Either the Scorpio file needs to be updated to have the actual mutations, or it should be removed so that Pangolin just assigns the lineage.