I did a quick evaluation. TL;DR: the proposed changes seem fine; they do not seem to result in false-positive AY.4 --> AY.4.2 reassignments.
I extracted two sets of sequences and aligned them to the reference: 23016 sequences from the UCSC/UShER tree's AY.4.2 branch, which I would expect to be mostly if not all AY.4.2, and 50000 sequences randomly selected from the B.1.617.2 branch excluding the AY.4.2 branch, which I would expect to be mostly B.1.617.2 or AY.4, not AY.4.2. I ran scorpio on each set, first with the constellations files from the current main branch, and then with the constellations files from the dev branch, which contain the proposed changes.
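For anyone who wants to reproduce per-constellation counts from a report file, here is a minimal sketch of the tallying step. The column name `scorpio_call` and the sample rows are assumptions made up for illustration, not scorpio's documented schema; check the header of the actual report before using it.

```python
import csv
import io
from collections import Counter


def tally_calls(report_text, column="scorpio_call"):
    """Count how many rows carry each value in `column` of a CSV report.

    `column` is an assumed header name; adjust it to match the real file.
    """
    reader = csv.DictReader(io.StringIO(report_text))
    return Counter(row[column] for row in reader)


# Tiny made-up report used only to exercise the function.
sample_report = """query,scorpio_call
seq1,Delta (AY.4.2-like)
seq2,Delta (AY.4-like)
seq3,Delta (AY.4.2-like)
seq4,
"""

counts = tally_calls(sample_report)
for call, n in counts.most_common():
    # Unassigned sequences have an empty call, shown here as "(empty string)".
    print(f"{call or '(empty string)'} | {n}")
```

Running the same tally on the report from each constellations version gives two columns that can be compared side by side.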
Scorpio constellation | count (current version) | count (proposed changes) |
---|---|---|
Delta (AY.4.2-like) | 22293 | 22302 |
Delta (AY.4-like) | 411 | 402 |
Delta (B.1.617.2-like) | 311 | 311 |
(empty string) | 1 | 1 |
-- Among the expected-AY.4.2 sequences, only 13 assignments changed (11 AY.4 --> AY.4.2, 2 AY.4.2 --> AY.4; those 2 have C25614T, which I guess was just pushing them over the threshold), and a net gain for AY.4.2 seems like a good thing there.
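The per-sequence changes can be counted by comparing the two runs name by name. This is a rough sketch that assumes each run has already been loaded into a dict mapping sequence name to call; the names and calls in the example are hypothetical, and only the figures in the final check come from the numbers above.

```python
from collections import Counter


def assignment_transitions(before, after):
    """Return a Counter of (old_call, new_call) for sequences whose call changed."""
    return Counter(
        (before[name], after[name])
        for name in before.keys() & after.keys()
        if before[name] != after[name]
    )


# Hypothetical sequence names and calls, for illustration only.
before = {"s1": "AY.4", "s2": "AY.4", "s3": "AY.4.2", "s4": "AY.4.2"}
after = {"s1": "AY.4.2", "s2": "AY.4.2", "s3": "AY.4", "s4": "AY.4.2"}
transitions = assignment_transitions(before, after)

# Consistency check using the reported figures: 11 gained and 2 lost
# should equal the net change in both affected table rows.
gained, lost = 11, 2
assert gained - lost == 22302 - 22293 == 411 - 402
```

The net of 9 matches both the rise in the AY.4.2-like row and the drop in the AY.4-like row.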
Scorpio constellation | count (current version) | count (proposed changes) |
---|---|---|
Delta (B.1.617.2-like) | 37342 | 37342 |
Delta (AY.4-like) | 12382 | 12382 |
Delta (B.1.617.2-like) +K417N | 133 | 133 |
B.1.617.1-like | 2 | 2 |
Delta (AY.4.2-like) | 1 | 1 |
(empty string) | 140 | 140 |
-- so no change in assignments at all. The only change in the report file was to the details for the one sample assigned to AY.4.2 (alt count decreased by 1, rules by 2).
< Scotland/QEUH-272D370/2021|EPI_ISL_5360859|2021-10-12,Delta (AY.4.2-like),AY.4.2,,0,51,0,0,9,1.000000,0.000000,Delta (AY.4.2-like)
---
> Scotland/QEUH-272D370/2021|EPI_ISL_5360859|2021-10-12,Delta (AY.4.2-like),AY.4.2,,0,50,0,0,7,1.000000,0.000000,Delta (AY.4.2-like)
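To see exactly which fields differ between the two report rows, a small field-by-field comparison helps. This sketch reports field positions only, since the report's column names are not shown here; going by the description above, the two changed positions would be the alt count and the rule count.

```python
import csv


def changed_fields(old_row, new_row):
    """Return (index, old_value, new_value) for each CSV field that differs."""
    old = next(csv.reader([old_row]))
    new = next(csv.reader([new_row]))
    return [(i, a, b) for i, (a, b) in enumerate(zip(old, new)) if a != b]


# The two rows from the diff, with the "<" and ">" markers stripped.
old_row = (
    "Scotland/QEUH-272D370/2021|EPI_ISL_5360859|2021-10-12,"
    "Delta (AY.4.2-like),AY.4.2,,0,51,0,0,9,1.000000,0.000000,Delta (AY.4.2-like)"
)
new_row = (
    "Scotland/QEUH-272D370/2021|EPI_ISL_5360859|2021-10-12,"
    "Delta (AY.4.2-like),AY.4.2,,0,50,0,0,7,1.000000,0.000000,Delta (AY.4.2-like)"
)

diffs = changed_fields(old_row, new_row)
print(diffs)
# → [(5, '51', '50'), (8, '9', '7')]
```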
That one is an interesting case -- it's very close to AY.4.2 in the UCSC/UShER tree, but is on a branch with other sequences from Scotland with which it shares several mutations; those sequences don't have Y145H [I haven't yet checked whether they have N there], but Scotland/QEUH-272D370/2021 does. Here's a view colored by UShER's lineage annotations; Scotland/QEUH-272D370/2021 is the yellow dot, with green not-quite-AY.4.2 sequences, adjacent to the Y145H node that is annotated as the start of AY.4.2 in our tree: https://nextstrain.org/fetch/hgwdev.gi.ucsc.edu/~angie/constellations-33.json?branchLabel=Spike%20mutations&c=pango_lineage_usher&label=nuc%20mutations:G210T,C241T,C3037T,G4181T,C6402T,C7124T,C7851T,C8986T,G9053T,C10029T,A11201G,A11332G,C14408T,G15451A,C16466T,T17040C,C19220T,C21618G,C21846T,G22030A,T22031A,T22032G,C22227T,T22917G,C22995A,A23403G,C23604G,G24410A,C25469T,C25614T,T26767C,T27638C,C27752T,C27874T,A28461G,G28881T,G28916T,G29402T,G29742T
scorpio_report files:
This is very helpful - thanks @AngieHinrichs for the careful checks!
With the updated/current dev definitions, the counts for usher-AY.4.2 are now (1 empty string plus):
457 AY.4
22408 AY.4.2
150 B.1.617.2
i.e. an improvement, with more assigned to AY.4.2, more to AY.4, and fewer to B.1.617.2.
For the non-AY.4.2 subset, there has been a move of some assignments from B.1.617.2 (not +K417N) to AY.4, but no other changes, with new counts for those constellations at:
12615 Delta (AY.4-like)
37109 Delta (B.1.617.2-like)
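As a quick sanity check on that shift, the sketch below takes the counts from the earlier non-AY.4.2 table as the baseline (an assumption about which run "before" refers to) and confirms that what B.1.617.2-like lost is exactly what AY.4-like gained.

```python
# Baseline counts from the earlier non-AY.4.2 table (assumed to be the "before" state).
before = {"Delta (AY.4-like)": 12382, "Delta (B.1.617.2-like)": 37342}
# Counts with the updated dev definitions, as listed above.
after = {"Delta (AY.4-like)": 12615, "Delta (B.1.617.2-like)": 37109}

deltas = {name: after[name] - before[name] for name in before}

# If no other constellation changed, the two deltas must cancel out.
assert sum(deltas.values()) == 0
print(deltas)
# → {'Delta (AY.4-like)': 233, 'Delta (B.1.617.2-like)': -233}
```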
I pulled out a random subset of 20,000 usher-AY.4 and non-AY.4 deltas and ran scorpio. For the non-AY deltas, I got:
83 (empty string)
40 Delta (AY.4-like)
17883 Delta (B.1.617.2-like)
46 Delta (B.1.617.2-like) +K417N
which is the same as before except for 2 Delta (B.1.617.2-like) -> AY.4. For the AY.4s:
1 (empty string)
17974 Delta (AY.4-like)
1 Delta (AY.4.2-like)
240 Delta (B.1.617.2-like)
4 Delta (B.1.617.2-like) +K417N
But there were previously 1194 Delta (B.1.617.2-like), so a large number of those misassigned sequences are now correctly assigned. All in all, I think this is already an improvement.
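To put a rough number on that improvement for the usher-AY.4 subset, here is a small worked calculation. It assumes the unlabeled count of 1 is the empty-string category and that the same sequences were used in both runs; neither is stated explicitly above.

```python
# Counts for the usher-AY.4 subset with the updated definitions, as listed above.
ay4_now = {
    "(empty string)": 1,  # assumption: the unlabeled count is the empty call
    "Delta (AY.4-like)": 17974,
    "Delta (AY.4.2-like)": 1,
    "Delta (B.1.617.2-like)": 240,
    "Delta (B.1.617.2-like) +K417N": 4,
}
previous_b16172_like = 1194  # earlier count of Delta (B.1.617.2-like)

total = sum(ay4_now.values())
drop = previous_b16172_like - ay4_now["Delta (B.1.617.2-like)"]
relative_drop = drop / previous_b16172_like

print(f"total sequences: {total}")
print(f"B.1.617.2-like calls removed: {drop} ({relative_drop:.1%} of the earlier count)")
```

That works out to 954 fewer B.1.617.2-like calls among the usher-AY.4 sequences, roughly four fifths of the earlier count.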
These changes seem likely to cause Scorpio to assign AY.4.2 more broadly, which might not be a bad thing entirely (e.g. cov-lineages/scorpio#32) but I think it would be good to proceed with caution & perhaps evaluate the effect of these changes on Delta sequences before merging. The concern about AY.4.2 has been about its growth, and if a change adds false positives, unfortunately those would probably be misinterpreted as more growth.