cov-lineages / constellations

Updated AY.4.2 definition file to better match agreed VTG definition #33

Closed by rambaut 2 years ago

AngieHinrichs commented 2 years ago

These changes seem likely to cause Scorpio to assign AY.4.2 more broadly, which might not be a bad thing entirely (e.g. cov-lineages/scorpio#32) but I think it would be good to proceed with caution & perhaps evaluate the effect of these changes on Delta sequences before merging. The concern about AY.4.2 has been about its growth, and if a change adds false positives, unfortunately those would probably be misinterpreted as more growth.

AngieHinrichs commented 2 years ago

I did a quick evaluation and, TL;DR, the proposed changes seem fine: they do not seem to produce false-positive AY.4 --> AY.4.2 reassignments.

I extracted & aligned to reference 23016 sequences in the UCSC/UShER tree's AY.4.2 branch, which I would expect to be mostly if not all AY.4.2, and 50000 sequences randomly selected from the B.1.617.2 branch excluding the AY.4.2 branch, which I would expect to be mostly B.1.617.2 or AY.4, not AY.4.2. I ran scorpio first with the current main branch constellations files, and then with the dev branch constellations files with the proposed changes.

23016 sequences that look like AY.4.2 according to UShER placement:

| Scorpio constellation | count (current version) | count (proposed changes) |
| --- | --- | --- |
| Delta (AY.4.2-like) | 22293 | 22302 |
| Delta (AY.4-like) | 411 | 402 |
| Delta (B.1.617.2-like) | 311 | 311 |
| (empty string) | 1 | 1 |

-- Among expected-AY.4.2 sequences, only 13 sequences' assignments changed (11 AY.4 --> AY.4.2, 2 AY.4.2 --> AY.4; those 2 have C25614T and I guess it was just pushing them over the threshold) and a net positive for AY.4.2 seems like a good thing there.
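For anyone who wants to repeat this kind of before/after check, here is a minimal sketch of how the tallies and the list of changed assignments could be produced from two report files. It assumes only what the diff lines in this thread show: the first CSV field is the sequence name and the second is the constellation call. The header names, the `has_header` flag, and the function names are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def read_calls(handle, has_header=True):
    """Map sequence name -> constellation call from a scorpio-style CSV report.

    Assumption: field 0 is the sequence name and field 1 is the constellation,
    as in the report lines quoted in this thread.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row if there is one
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def compare_calls(before, after):
    """Return per-version tallies and a Counter of (old, new) reassignments."""
    shared = before.keys() & after.keys()
    changes = Counter(
        (before[name], after[name])
        for name in shared
        if before[name] != after[name]
    )
    return Counter(before.values()), Counter(after.values()), changes


# Made-up example rows (not real data); the header names are hypothetical.
old_csv = (
    "query,constellations\n"
    "seqA,Delta (AY.4-like)\n"
    "seqB,Delta (AY.4.2-like)\n"
    "seqC,\n"
)
new_csv = (
    "query,constellations\n"
    "seqA,Delta (AY.4.2-like)\n"
    "seqB,Delta (AY.4.2-like)\n"
    "seqC,\n"
)

old_counts, new_counts, changes = compare_calls(
    read_calls(io.StringIO(old_csv)), read_calls(io.StringIO(new_csv))
)
for (old, new), n in changes.items():
    print(f"{n}\t{old or '(empty string)'} --> {new or '(empty string)'}")
```

Keying on the sequence name rather than on row order means the two reports do not need to list sequences in the same order.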

50000 sequences that look like non-AY.4.2 Delta according to UShER placement:

| Scorpio constellation | count (current version) | count (proposed changes) |
| --- | --- | --- |
| Delta (B.1.617.2-like) | 37342 | 37342 |
| Delta (AY.4-like) | 12382 | 12382 |
| Delta (B.1.617.2-like) +K417N | 133 | 133 |
| B.1.617.1-like | 2 | 2 |
| Delta (AY.4.2-like) | 1 | 1 |
| (empty string) | 140 | 140 |

-- so no change in assignments at all. The only change in the report file was to the details for the one sample assigned to AY.4.2 (alt count decreased by 1, rules by 2).

< Scotland/QEUH-272D370/2021|EPI_ISL_5360859|2021-10-12,Delta (AY.4.2-like),AY.4.2,,0,51,0,0,9,1.000000,0.000000,Delta (AY.4.2-like)
---
> Scotland/QEUH-272D370/2021|EPI_ISL_5360859|2021-10-12,Delta (AY.4.2-like),AY.4.2,,0,50,0,0,7,1.000000,0.000000,Delta (AY.4.2-like)

That one is an interesting case -- it's very close to AY.4.2 in the UCSC/UShER tree, but is on a branch with other sequences from Scotland with which it shares several mutations; those sequences don't have Y145H [I haven't yet checked whether they have N there], but Scotland/QEUH-272D370/2021 does. Here's a view colored by UShER's lineage annotations; Scotland/QEUH-272D370/2021 is the yellow dot, with green not-quite-AY.4.2 sequences, adjacent to the Y145H node that is annotated as the start of AY.4.2 in our tree: https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/constellations-33.json?branchLabel=Spike%20mutations&c=pango_lineage_usher&label=nuc%20mutations:G210T,C241T,C3037T,G4181T,C6402T,C7124T,C7851T,C8986T,G9053T,C10029T,A11201G,A11332G,C14408T,G15451A,C16466T,T17040C,C19220T,C21618G,C21846T,G22030A,T22031A,T22032G,C22227T,T22917G,C22995A,A23403G,C23604G,G24410A,C25469T,C25614T,T26767C,T27638C,C27752T,C27874T,A28461G,G28881T,G28916T,G29402T,G29742T

scorpio_report files:

rmcolq commented 2 years ago

This is very helpful - thanks @AngieHinrichs for the careful checks!

rmcolq commented 2 years ago

With the updated/current dev definitions, the counts for usher-AY.4.2 are now (1 empty string plus):

 457     AY.4
22408    AY.4.2
 150     B.1.617.2

i.e. an improvement, with more assigned to AY.4.2, more to AY.4, and fewer to B.1.617.2.
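As a quick arithmetic check on that, the net shift can be worked out from the numbers already posted in this thread. This sketch assumes the short lineage labels here correspond to the "-like" constellation rows in the earlier table for the same 23016 sequences (the single empty-string call is left out of both sets).

```python
# Counts for the UShER-AY.4.2 subset, copied from this thread.
# Assumption: "AY.4.2" here corresponds to "Delta (AY.4.2-like)" etc.
proposed = {"AY.4.2": 22302, "AY.4": 402, "B.1.617.2": 311}      # earlier proposed changes
current_dev = {"AY.4.2": 22408, "AY.4": 457, "B.1.617.2": 150}   # updated/current dev

net = {label: current_dev[label] - proposed[label] for label in proposed}

for label, change in net.items():
    print(f"{label}: {change:+d}")

# Both runs should cover the same sequences, so the shifts should cancel out.
print("sum of shifts:", sum(net.values()))
```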

For the non-AY.4.2 subset, there has been a move of some assignments from B.1.617.2 (not +K417N) to AY.4, but no other changes, with new counts for those constellations at:

12615 Delta (AY.4-like)
37109 Delta (B.1.617.2-like)

rmcolq commented 2 years ago

I pulled out a random subset of 20,000 usher-AY.4 and non-AY.4 deltas and ran scorpio. For the non-AY deltas, I got:

   83 (empty string)
   40 Delta (AY.4-like)
17883 Delta (B.1.617.2-like)
   46 Delta (B.1.617.2-like) +K417N

which is the same as before except for 2 Delta (B.1.617.2-like) -> AY.4. For the AY.4s:

    1 (empty string)
17974 Delta (AY.4-like)
    1 Delta (AY.4.2-like)
  240 Delta (B.1.617.2-like)
    4 Delta (B.1.617.2-like) +K417N

But there were previously 1194 Delta (B.1.617.2-like), so a large number of these wrong ones are now correctly assigned. So all in all, I think this is already an improvement.
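To put a rough size on that, here is the same comparison as a share of the usher-AY.4 subset, using only the counts above. It assumes the earlier run covered the same sequences, so that the 1194 and the 240 share one denominator.

```python
# Current counts for the usher-AY.4 subset, copied from the list above.
ay4_counts = {
    "(empty string)": 1,
    "Delta (AY.4-like)": 17974,
    "Delta (AY.4.2-like)": 1,
    "Delta (B.1.617.2-like)": 240,
    "Delta (B.1.617.2-like) +K417N": 4,
}
total = sum(ay4_counts.values())

previously_b16172 = 1194  # earlier Delta (B.1.617.2-like) count quoted above
now_b16172 = ay4_counts["Delta (B.1.617.2-like)"]
moved = previously_b16172 - now_b16172

# Assumption: both runs used the same sequences, so `total` applies to both.
print(f"subset size: {total}")
print(f"B.1.617.2-like before: {previously_b16172 / total:.1%}")
print(f"B.1.617.2-like now:    {now_b16172 / total:.1%}")
print(f"no longer B.1.617.2-like: {moved}")
```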