It seems (1) is "by design":
https://github.com/MDU-PHL/pango-watch/blob/main/app.py
if lineage.startswith('X'):  # remove recombinants
    continue
So probably the solution is documenting it?
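For anyone wondering what that filter means in practice, here is a minimal sketch. Only the `startswith('X')` check and the `continue` come from app.py; the function name `filter_lineages`, the `keep_recombinants` flag and the sample list are made up for illustration.

```python
def filter_lineages(lineages, keep_recombinants=False):
    """Return the lineages that would reach the tree.

    Hypothetical helper: mirrors the quoted check, where every name
    starting with 'X' (a recombinant) is skipped.
    """
    kept = []
    for lineage in lineages:
        if lineage.startswith("X") and not keep_recombinants:
            continue  # recombinants are dropped, as in the quoted snippet
        kept.append(lineage)
    return kept


sample = ["BA.2", "XBG", "BQ.1.1.23"]
tree_lineages = filter_lineages(sample)
all_lineages = filter_lineages(sample, keep_recombinants=True)
```

With the filter active, XBG can show up in the readme while being absent from the tree, which is the mismatch reported in (1).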
2) In 3edf96d readme lists
2022-12-03
+ [BQ.1.1.23](https://cov-lineages.org/lineage.html?lineage=BQ.1.1.23)
but AFAIK BQ.1.1.23 was introduced in early November, followed since then by BQ.1.1.24, BQ.1.1.25 and BQ.1.1.26, so the only new one should be BQ.1.1.27?
https://github.com/cov-lineages/pango-designation/commits/master/lineage_notes.txt
At least partially solved, recombinants are included in the tree since https://github.com/MDU-PHL/pango-watch/commit/a20648c7821fdb2bb55392243e8581040e6ff304
But trying as an example XBB: Per https://github.com/cov-lineages/pango-designation/blob/master/lineage_notes.txt it is "XBB Recombinant lineage of BJ.1 and BA.2.75 with breakpoint in S1, found in USA and Singapore, from issue #1058"
But:
I see it in https://mdu-phl.github.io/pango-watch/tree/ with links to BA.2.10.1 (?) (as a sibling of BJ.1?) and BM.1.1.1,
and in data.json I see it descending from BA.2.10.1, while its first other parent is, correctly, B.1.1.529.2.75.3.1.1.1 (alias: BM.1.1.1). Part of the output of my program ( https://github.com/janko-js/variants_text_tree/blob/main/pretty-tree.pl ):
B.1.1.529.2.10 (BA.2.10),
Its child list starts with [ B.1.1.529.2.10.1 (BA.2.10.1),
Its child list: [ B.1.1.529.2.10.1.1 (BJ.1), XBB <BA.2.10.1+BM.1.1.1>,
I.e. BJ.1 and XBB both appear to be children of BA.2.10.1. XBB then has its own child list:
[ XBB.1,
etc.
Current output of
pretty-tree.pl | grep "+" | sort
XA <B.1.1+B.1.177>,
XAA <B.1.1.529+BA.2>,
XAB <B.1.1.529+BA.2>,
XAC <B.1.1.529+BA.1>,
XAD <B.1.1.529+BA.1>,
XAE <B.1.1.529+BA.1>,
XAF <B.1.1.529+BA.2>,
XAG <B.1.1.529+BA.2>,
XAH <B.1.1.529+BA.1>,
XAJ <BA.2.12+BA.4>,
XAK <B.1.1.529+BA.1>,
XAL <B.1.1.529+BA.2>,
XAM <BA.1+BA.2.9>,
XAN <B.1.1.529+BA.5.1>,
XAP <B.1.1.529+BA.1>,
XAQ <B.1.1.529+BA.2>,
XAR <B.1.1.529+BA.2>,
XAS <B.1.1.529+BA.2>,
XAT <BA.2.3+BA.1>,
XAU <BA.1+BA.2.9>,
XAV <B.1.1.529+BA.5>,
XAW <B.1.1.529+AY.122>,
XAY <B.1.617.2+BA.4>,
XAZ <BA.2+BA.5>,
XB <B.1+B.1.631>,
XBA <B.1.617.2+BA.4>,
XBB <BA.2.10.1+BM.1.1.1>,
XBC <B.1.1.529+B.1.617.2>,
XBD <BA.2.75+BF.5>,
XBE <BA.5+BE.4.1>,
XBF <BA.5.2+CJ.1>,
XBG <BA.2+BA.5.2>,
XBH <BA.2.3+BA.2.75.2>,
XBJ <BA.2.3+BA.5.2>,
XBK <BA.5+CJ.1>,
XBL <XBB+BA.2.75>,
XBM <BA.2+BF.3>,
XBN <BA.2+XBB.3>,
XBP <BA.2+BQ.1>,
XC <B.1.617.2+B.1.1.7>,
XD <B.1.617+BA.1>,
XE <B.1.1.529+BA.2>,
XF <B.1.617+BA.1>,
XG <B.1.1.529+BA.2>,
XH <B.1.1.529+BA.2>,
XJ <B.1.1.529+BA.2>,
XK <B.1.1.529+BA.2>,
XL <B.1.1.529+BA.2>,
XM <BA.1+BA.2>,
XN <B.1.1.529+BA.2>,
XP <BA.1+BA.2>,
XQ <BA.1+BA.2>,
XR <BA.1+BA.2>,
XS <B.1.617+BA.1.1>,
XT <B.1.1.529+BA.1>,
XU <B.1.1.529+BA.2>,
XV <B.1.1.529+BA.2>,
XW <B.1.1.529+BA.2>,
XY <B.1.1.529+BA.2>,
XZ <B.1.1.529+BA.1>,
Whereas: "XA Recombinant lineage with parental lineages B.1.1.7 and B.1.177" etc.
So it seems one of the ancestors of the recombinants is currently always wrong in data.json (e.g. for XBB: BA.2.10.1 instead of BJ.1 listed in the lineage_notes.txt).
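A rough sketch of how such a check could be automated. The two note lines and the two parent pairs are copied from this thread; the regex, the function names and the dict layout are my own assumptions, and the regex only catches parents written with at least one dot (a bare "XBB" would be missed).

```python
import re

# Assumed pattern for a Pango name with at least one numeric part, e.g. BA.2.75
LINEAGE_RE = re.compile(r"\b[A-Z]{1,3}(?:\.\d+)+\b")


def parents_from_note(line):
    """Split a lineage_notes.txt line into (name, parents found in the text)."""
    name, description = line.split(None, 1)
    return name, LINEAGE_RE.findall(description)


def mismatches(note_lines, tree_parents):
    """Return {name: (parents per notes, parents per tree)} where the sets differ."""
    result = {}
    for line in note_lines:
        name, noted = parents_from_note(line)
        in_tree = tree_parents.get(name, [])
        if set(noted) != set(in_tree):
            result[name] = (noted, in_tree)
    return result


notes = [
    "XA Recombinant lineage with parental lineages B.1.1.7 and B.1.177",
    "XBB Recombinant lineage of BJ.1 and BA.2.75 with breakpoint in S1, "
    "found in USA and Singapore, from issue #1058",
]
# Parent pairs as printed by pretty-tree.pl above
tree_parents = {"XA": ["B.1.1", "B.1.177"], "XBB": ["BA.2.10.1", "BM.1.1.1"]}

report = mismatches(notes, tree_parents)
```

Both entries end up in `report`, matching what is visible by eye in the list above.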
Hey @janko-js! Sorry I missed this issue (GitHub’s notification system is terrible :/)! Thanks so much for pointing this out. I’ll be away for a few weeks but will fix this once I’m back. Will be happy to merge a PR if you have one :)
Hi @janko-js, I think it's fixed now! Thanks for spotting it. Please reopen if I missed something.
Sorry, it still appears wrong.
Three parents of XBL? XBB, XBB.1 and BA.2.75
But only two (XBB.1 and BA.2.75) are mentioned in:
https://github.com/cov-lineages/pango-designation/blob/master/lineage_notes.txt
"XBL Recombinant lineage of XBB.1 with S:F486P and BA.2.75, Malaysia, from issue #1532"
https://github.com/cov-lineages/pango-designation/issues/1532
https://github.com/ktmeaton/ncov-recombinant/issues/219
I noticed it when I compared the output of my script
XBL <XBB+BA.2.75>,
with the line:
"XBL Recombinant lineage of XBB.1 with S:F486P and BA.2.75, Malaysia, from issue #1532"
My script always prints just the two parents, and in this case it extracted the XBB from your json.
Ah! Thanks for following up. I think that’s related to multiple XBB.1 in the key. I think a .unique() will fix it! Will try that now
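For reference, a plain-Python way to drop repeated parents while keeping their first-seen order might look like this. It is only a sketch: `unique_parents` is a hypothetical name, I am not claiming this is what app.py does, and the sample entry is the XBL one from alias_key.json.

```python
def unique_parents(parents):
    """Remove repeated parents, preserving the order of first appearance.

    dict keys keep insertion order (Python 3.7+), so this is a stable dedupe,
    unlike set(), which would not guarantee any order.
    """
    return list(dict.fromkeys(parents))


xbl = unique_parents(["XBB.1", "BA.2.75", "XBB.1"])
```

Keeping the order stable matters here, since the first parent decides where the recombinant hangs in the tree.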
Sorry, it's still wrong. You can use my script to generate the text tree from the local data.json and easily compare the info about the recombinants:
$ pretty-tree.pl | grep XBL
XBL <XBB+BA.2.75>,
vs.
"XBL Recombinant lineage of XBB.1 with S:F486P and BA.2.75, Malaysia, from issue #1532"
And looking directly at data.json, it's visible that XBL is a sibling of XBB.8 and not a child of XBB.1.
Here is the equivalent information for that part of data.json, after processing it with my script. XBB.5 is a child of XBB; XBB.6 is too, but it has a child XBB.6.1; XBB.7 and XBB.8 are childless; and XBL is (falsely) a child of XBB in that data.json (it should be a child of XBB.1 per https://github.com/ktmeaton/ncov-recombinant/issues/219 and the lineage notes):
XBB <BM.1.1.1+BJ.1>,
...
XBB.5,
XBB.6,
[ XBB.6.1,
]
XBB.7, XBB.8,
XBL <XBB+BA.2.75>,
]
Interestingly,
https://github.com/cov-lineages/pango-designation/blob/master/pango_designation/alias_key.json
has multiple entries for the parents, which I didn't expect, but the parent is still XBB.1 and not XBB:
"XBL": ["XBB.1","BA.2.75","XBB.1"],
(Tangentially: You can also compare the "before" and "after" of the text tree:
https://github.com/janko-js/variants_text_tree/commit/f022e23158d2e680d16cf64bd4dc0f4648e26888
Note that previously your first parent of XBB was BJ.1, and now you have connected it to BM.1.1.1 first, so the whole "subtree" appears in a different place in that representation. A similar swap happened with XBF (CJ.1+BA.5.2.3 now, BA.5.2.3+CJ.1 before), XBP, ... etc. I know that both parents are equally important; it's just that leaving the order to chance makes automatic comparisons unnecessarily harder, because in the stored representation the first parent and the other parent look different: one is implicit in the "tree", the other is an attribute.)
Ah yes there are a few issues here...
The tree/data.json file is a hack; it is used to generate a D3js hierarchy. The D3js hierarchy has no concept of nodes with multiple parents (i.e. it is a tree, not a graph), so I had to hack the layout to add recombinants. The data.json file won't make sense unless you process the otherParents key correctly. I now generate graph/data.json, which is an actual graph structure and so makes sense for recombinants.
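To illustrate what "processing otherParents correctly" could mean, here is a minimal sketch that flattens such a tree into parent/child edges. Only the `otherParents` key is taken from the description above; the `name` and `children` keys and the shape of `otherParents` (a list of names) are assumptions based on the usual D3 hierarchy layout, and the sample tree reproduces the earlier XBB state reported in this thread.

```python
def tree_to_edges(node, parent=None, edges=None):
    """Walk a nested tree dict and collect (parent, child) edges.

    The tree parent is implicit in the nesting; any extra parents listed
    under 'otherParents' are added as additional edges.
    """
    if edges is None:
        edges = []
    name = node["name"]
    if parent is not None:
        edges.append((parent, name))
    for other in node.get("otherParents", []):
        edges.append((other, name))
    for child in node.get("children", []):
        tree_to_edges(child, name, edges)
    return edges


def parents_of(edges, name):
    """All parents of a node, tree parent first, then the other parents."""
    return [p for p, c in edges if c == name]


tree = {
    "name": "BA.2.10.1",
    "children": [
        {"name": "BJ.1"},
        {"name": "XBB", "otherParents": ["BM.1.1.1"],
         "children": [{"name": "XBB.1"}]},
    ],
}
edges = tree_to_edges(tree)
xbb_parents = parents_of(edges, "XBB")
```

A consumer that only reads the nesting sees a single parent for XBB; the second one only appears once `otherParents` is merged in.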
The repeated parents in the alias_key mark the break points, i.e. the middle of XBL is BA.2.75.
I can fix the ordering by sorting the parents list so that the recombinant is always first. This may change what it looks like now but will at least be consistent from now on.
Thanks for persisting @janko-js
I think I've fixed it now... but will leave it up to your keen eyes @janko-js (3ab84c8). I've checked and XBL is XBB.1 and BA.2.75. I would use the graph/data.json as the lineages make more sense as a graph when recombinants are included.
I have the impression it's still open, sorry:
Running my script on your json I get:
XBB <BM.1.1.1+BJ.1>,
BM.1.1.1?
but lineage_notes.txt:
XBB Recombinant lineage of BJ.1 and BA.2.75 with breakpoint in S1, found in USA and Singapore, from issue #1058
i.e. I'd expect BA.2.75 to be there, not BM.1.1.1 ?
I'd also like it if you could always insert the recombinants into the tree via the longer path (i.e. under the parent which has the most 'dots' in its full name), and internally track the "number of dots" ("the longest path") even for recombinants of recombinants. That would guarantee full consistency and would also give some clarity about the minimal "naming distance" of every recombinant. I haven't checked whether you're already doing that, as the first step is to have parents which match lineage_notes.txt. I'm sorry for asking, but I believe it actually gives more meaning to the "tree". I understand you like the graph, but to me a consistent tree can say more about the history of the recognition of the pango subvariants; maybe you'll like the idea too.
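A small sketch of the ordering I have in mind, under stated assumptions. The alias expansions for BA, BJ and BM follow from the full names quoted earlier in this thread, and the parent lists are the alias_key ones; counting a recombination as one extra step (the `1 +`) and breaking ties by name are my own choices, not something pango-designation defines.

```python
# Alias prefixes and their full names (derived from names quoted in this thread)
ALIASES = {
    "BA": "B.1.1.529",
    "BJ": "B.1.1.529.2.10.1",
    "BM": "B.1.1.529.2.75.3",
}
# Recombinant roots and their parents
RECOMBINANTS = {
    "XBB": ["BJ.1", "BM.1.1.1"],
    "XBL": ["XBB.1", "BA.2.75"],
}


def depth(lineage):
    """Number of 'dots' on the longest path from the naming root."""
    prefix = lineage.split(".", 1)[0]
    own_dots = lineage.count(".")
    if prefix in RECOMBINANTS:
        # Assumption: a recombinant sits one step below its deepest parent.
        base = 1 + max(depth(p) for p in RECOMBINANTS[prefix])
    elif prefix in ALIASES:
        base = ALIASES[prefix].count(".")
    else:
        base = 0  # unaliased roots such as A or B
    return base + own_dots


def ordered_parents(recombinant):
    """Deepest parent first; ties broken by name so the order is stable."""
    return sorted(RECOMBINANTS[recombinant], key=lambda p: (-depth(p), p))


xbb_order = ordered_parents("XBB")
xbl_order = ordered_parents("XBL")
```

The first entry would be the tree parent and the rest would go into the other-parents attribute, so the placement no longer depends on the order in which the parents happen to be listed.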
Hmm 🤔 looks like you might have found a bug in pango-designation, as the alias_key lists "XBB": ["BJ.1","BM.1.1.1"] (https://github.com/cov-lineages/pango-designation/blob/7f20135411fec880f89a5571f2a4656bb29d5f12/pango_designation/alias_key.json#L174). Maybe worth opening an issue with them?
I will try to sort out the ordering as you state above. Thanks again!
How to compare the lineage notes and the tree data.json (recomb-compared.txt is produced):
https://gist.github.com/janko-js/3eb2ea9a7e504a27d24e219d3dafa993
Awesome 🙏
Thanks! I also think that their listing the same parent twice is an issue, and your program should keep only the "unique" parents.
Judging by their issue https://github.com/cov-lineages/pango-designation/issues/1058, it is indeed BM.1.1.1, so in that specific case it is lineage_notes.txt that has not been updated.
And I think they also should not mention the same parent twice; I haven't investigated why that's there.
I think the doubled-up parents are the break points of the recombinant.
So it seems the parents match now! Congratulations.
Regarding the ordering I suggested: I believe it would result in a "tree" whose branches grow "as far as possible", in a way that is consistent for the recombinants: the "later" contributing parents would always be from the same level or earlier, and the first parent would consistently place the variant at least as far from the "naming root" as the parent with the "most dots" in the longest path allows. I never thought about that solution until I played with the tree you produce, but I think it makes sense. Thanks for all your work!
The comparison using the alias_key.json
https://gist.github.com/janko-js/6ac001b4a2862d3d4cb8e420a8d5c7cb
Processed with it, both alias_key.json and the pretty-tree.pl output for tree/data.json end up producing the same result (each line is split into the relevant "words", which are then made unique, sorted backwards within the line, and printed in the same manner):
XA B.1.177 B.1.1.7
XAA BA.2 BA.1
XAB BA.2 BA.1
XAC BA.2 BA.1
XAD BA.2 BA.1
XAE BA.2 BA.1
XAF BA.2 BA.1
XAG BA.2 BA.1
XAH BA.2 BA.1
XAJ BA.4 BA.2.12.1
XAK BA.2 BA.1
XAL BA.2 BA.1
XAM BA.2.9 BA.1.1
XAN BA.5.1 BA.2
XAP BA.2 BA.1
XAQ BA.2 BA.1
XAR BA.2 BA.1
XAS BA.5 BA.2
XAT BA.2.3.13 BA.1
XAU BA.2.9 BA.1.1
XAV BA.5 BA.2
XAW BA.2 AY.122
XAY BA.2 AY.45
XAZ BA.5 BA.2.5
XB B.1.634 B.1.631
XBA BA.2 AY.45
XBB BM.1.1.1 BJ.1
XBC BA.2 B.1.617.2
XBD BF.5 BA.2.75.2
XBE BE.4.1 BA.5.2
XBF CJ.1 BA.5.2.3
XBG BA.5.2 BA.2.76
XBH BA.2.75.2 BA.2.3.17
XBJ BA.5.2 BA.2.3.20
XBK CJ.1 BA.5.2
XBL XBB.1 BA.2.75
XBM BF.3 BA.2.76
XBN XBB.3 BA.2.75
XBP BQ.1 BA.2.75
XBQ CJ.1 BA.5.2
XBR BQ.1 BA.2.75
XBS BQ.1 BA.2.75
XBT BA.5.2.34 BA.2.75
XC B.1.1.7 AY.29
XD BA.1 B.1.617.2
XE BA.2 BA.1
XF BA.1 B.1.617.2
XG BA.2 BA.1
XH BA.2 BA.1
XJ BA.2 BA.1
XK BA.2 BA.1
XL BA.2 BA.1
XM BA.2 BA.1.1
XN BA.2 BA.1
XP BA.2 BA.1.1
XQ BA.2 BA.1.1
XR BA.2 BA.1.1
XS BA.1.1 B.1.617.2
XT BA.2 BA.1
XU BA.2 BA.1
XV BA.2 BA.1
XW BA.2 BA.1
XY BA.2 BA.1
XZ BA.2 BA.1
So I'm quite sure the information in both matches now.
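The normalization itself is simple; a Python sketch of the same idea (the gist is the actual reference, this is only an approximation with a made-up function name and an assumed token pattern) could be:

```python
import re

# Assumed token pattern: uppercase prefix plus optional numeric parts
TOKEN_RE = re.compile(r"[A-Z]+(?:\.\d+)*")


def normalize(line):
    """Reduce a line to 'NAME parent parent ...' with unique parents,
    sorted backwards, so lines from different sources can be compared."""
    tokens = TOKEN_RE.findall(line)
    name, parents = tokens[0], tokens[1:]
    return " ".join([name] + sorted(set(parents), reverse=True))


from_alias_key = normalize('"XBL": ["XBB.1","BA.2.75","XBB.1"],')
from_tree = normalize("XBL <XBB.1+BA.2.75>,")
```

Both inputs reduce to the same line, so a plain diff of the two normalized files shows only real differences.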
I think this issue can be closed. If some new inconsistency occurs, a new issue can be opened; at the moment the tree appears consistent.
Excellent! Cheers @janko-js
1) the readme has recombinants, the tree (data.json) doesn't
I first noticed this when I saw that the readme has XBG, but searching through data.json I can't find it.
More obviously: In commit 36191b4 there's
2022-11-17
And the tree remains unchanged; it was last changed the day before, in commit 0524792.