MDU-PHL / pango-collapse

app to collapse Pango lineages for reporting
https://mdu-phl.github.io/pango-collapse/
GNU General Public License v3.0

Collapsing recombinants #7

Closed by ashkan98 11 months ago

ashkan98 commented 11 months ago

Hi @Wytamma!

I want to use your package for my project, and I'm curious why it's not possible to trace recombinants (e.g. XBB.1.5) back to the Omicron parental lineage (B.1.1.529). Even after I removed the recombinants from the collapse file, I get the same input lineage back, just with "Recombinant" added as the lineage family.

I assume that the information about the parent of a sublineage comes from the alias_key.json in the repo, but is it kept up to date with the pangolin annotation?

Thank you very much.

Wytamma commented 11 months ago

Hi @ashkan98

Currently pango-collapse treats recombinants as separate lineages because there's not a clear way to collapse the multiple lineages.

For example, if you have BA.2.10 and B.1.1.529 in the collapse file, how should XBB be collapsed? Its two parents collapse to different lineages, and this gets more complicated for more diverse recombinants.

           BM.1.1.1 -> B.1.1.529
         /
   XBB -
         \
           BJ.1 -> BA.2.10

We could potentially report both lineages, but that means you'd have to deal with multiple values in a single column of the output file. I could add this behaviour and make it configurable via a CLI flag like --split-recombinants (I'll open a feature request for this).
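To make the ambiguity concrete, here is a minimal sketch of collapsing each parent of a recombinant separately. It is not pango-collapse's actual implementation: the helper names (`expand`, `collapse`, `collapse_parents`) are hypothetical, and `ALIAS_KEY` is a small hand-written excerpt in the style of alias_key.json whose values are assumed for illustration.

```python
# Toy excerpt in the style of alias_key.json (values assumed for illustration).
# Recombinants map to a list of parents; ordinary aliases map to a full name.
ALIAS_KEY = {
    "BA": "B.1.1.529",
    "BJ": "B.1.1.529.2.10.1",
    "BM": "B.1.1.529.2.75.3",
    "XBB": ["BJ.1", "BM.1.1.1"],
}


def expand(lineage, alias_key):
    """Replace an alias prefix (e.g. BJ) with its full lineage name."""
    prefix, _, rest = lineage.partition(".")
    full = alias_key.get(prefix)
    if not isinstance(full, str) or not full:
        return lineage  # unaliased (e.g. B.1.1.529) or recombinant (list value)
    return f"{full}.{rest}" if rest else full


def collapse(lineage, collapse_to, alias_key):
    """Walk up the expanded lineage until an entry of collapse_to is hit."""
    targets = {expand(t, alias_key): t for t in collapse_to}
    parts = expand(lineage, alias_key).split(".")
    while parts:
        candidate = ".".join(parts)
        if candidate in targets:
            return targets[candidate]
        parts.pop()  # drop the last level and try the parent
    return lineage  # no ancestor found in the collapse list


def collapse_parents(lineage, collapse_to, alias_key):
    """Collapse each parent of a recombinant; one result per parent."""
    parents = alias_key.get(lineage.split(".")[0])
    if isinstance(parents, list):
        return [collapse(p, collapse_to, alias_key) for p in parents]
    return [collapse(lineage, collapse_to, alias_key)]


if __name__ == "__main__":
    print(collapse_parents("XBB.1.5", ["BA.2.10", "B.1.1.529"], ALIAS_KEY))
```

With this toy data the two parents of XBB end up in different families (BA.2.10 and B.1.1.529), so a single output column would have to hold both values.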

Recombinant is a special case in the collapse file that collapses all recombinants to Recombinant. If you remove Recombinant from the collapse file, then recombinants like XBB will only be collapsed up to their X lineage.

For now, if you want to collapse XBB to B.1.1.529, you could do that with post-processing. For example, add B.1.1.529 and XBB to your collapse file, run pango-collapse, and then relabel XBB in the output:

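import pandas as pd

# Load the pango-collapse output (tab-separated)
df = pd.read_csv("nextclade_collapsed.tsv", sep="\t")
# Combine XBB and B.1.1.529; assign back instead of using inplace=True on a column
df["Lineage_family"] = df["Lineage_family"].replace("XBB", "B.1.1.529")

Wytamma commented 11 months ago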

Oh, and if you don't supply an alias_key.json file with the --alias-file flag, pango-collapse downloads it from https://raw.githubusercontent.com/cov-lineages/pango-designation/master/pango_designation/alias_key.json every time it runs, i.e. it always uses the latest file.
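If you would rather pin a fixed copy (for example, to keep results reproducible across runs), one option is to save the file once and pass the saved path via --alias-file. A minimal sketch, where `snapshot_alias_key` and its `fetch` parameter are hypothetical names introduced here and not part of pango-collapse:

```python
import json
import urllib.request
from pathlib import Path

ALIAS_KEY_URL = (
    "https://raw.githubusercontent.com/cov-lineages/pango-designation/"
    "master/pango_designation/alias_key.json"
)


def snapshot_alias_key(dest, url=ALIAS_KEY_URL, fetch=None):
    """Download alias_key.json once, save it to dest, and return the parsed mapping.

    fetch is an optional callable(url) -> bytes, so the download step can be
    swapped out (e.g. when working offline).
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u) as response:
                return response.read()

    raw = fetch(url)
    alias_key = json.loads(raw)  # fail early if the payload is not valid JSON
    Path(dest).write_bytes(raw)  # keep the exact bytes that were downloaded
    return alias_key


if __name__ == "__main__":
    key = snapshot_alias_key("alias_key.json")
    print(f"saved {len(key)} aliases to alias_key.json")
```

The saved alias_key.json can then be supplied with the --alias-file flag so that later runs use the same snapshot.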

ashkan98 commented 11 months ago

Thank you very much, that obviously makes sense! For my use case it's fine to leave recombinants at XBB for now :) And thank you for sending the source of the JSON file, that's what I was asking for!