Closed ashkan98 closed 11 months ago
Hi @ashkan98
Currently pango-collapse treats recombinants as separate lineages because there's no clear way to collapse a lineage that has multiple parents.
For example, if you have BA.2.10 and B.1.1.529 in the collapse file, how should you collapse XBB? One parent (BJ.1) collapses to BA.2.10 and the other (BM.1.1.1) to B.1.1.529. This gets more complicated for more diverse recombinants.
      BM.1.1.1 -> B.1.1.529
     /
XBB -
     \
      BJ.1 -> BA.2.10
We could potentially report both lineages, but that means you'd have to deal with multiple values in a single column of the output file. I could add this behaviour and make it configurable via a CLI flag like --split-recombinants (I'll open a feature request for this).
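To make the "report both lineages" idea concrete, here is a rough sketch of what it could look like. It is not how pango-collapse works internally. The `ALIASES` table is a small hand-written subset in the style of alias_key.json (check the values against the real file), and `uncompress`, `collapse` and `collapse_recombinant` are hypothetical helper names.

```python
# Hand-written subset in the style of alias_key.json (assumption: verify
# these values against the real file). Recombinants map to a list of parents.
ALIASES = {
    "BA": "B.1.1.529",
    "BJ": "B.1.1.529.2.10.1",
    "BM": "B.1.1.529.2.75.3",
    "XBB": ["BJ.1", "BM.1.1.1"],
}

# Lineages listed in the (hypothetical) collapse file.
COLLAPSE_TO = ["B.1.1.529", "BA.2.10"]


def uncompress(lineage):
    """Expand an aliased name, e.g. BA.2.10 -> B.1.1.529.2.10."""
    prefix, _, rest = lineage.partition(".")
    base = ALIASES.get(prefix)
    if isinstance(base, str) and base:
        return f"{base}.{rest}" if rest else base
    return lineage


def collapse(lineage):
    """Return the deepest collapse target that is an ancestor of lineage."""
    full = uncompress(lineage)
    best = None
    for target in COLLAPSE_TO:
        target_full = uncompress(target)
        is_ancestor = full == target_full or full.startswith(target_full + ".")
        if is_ancestor and (best is None or len(target_full) > len(uncompress(best))):
            best = target
    return best


def collapse_recombinant(lineage):
    """Collapse each parent of a recombinant separately."""
    parents = ALIASES.get(lineage.split(".")[0])
    if not isinstance(parents, list):
        return [collapse(lineage)]  # not a recombinant: single result
    return [collapse(parent) for parent in parents]


print(collapse_recombinant("XBB.1.5"))  # ['BA.2.10', 'B.1.1.529']
```

The result is a list with one entry per parent, which is exactly the "multiple values in a single column" problem. The sketch also ignores parents written with a trailing `*` in the real alias file and recombinants whose parents are themselves recombinants.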
Recombinant is a special case in the collapse file that collapses all recombinants to Recombinant. If you remove Recombinant from the collapse file, then recombinants like XBB will only be collapsed up to their X lineage.
For now, if you want to collapse XBB to B.1.1.529, you could do that with post-processing, e.g.
Add B.1.1.529 and XBB to your collapse file and run pango-collapse, then:
import pandas as pd
df = pd.read_csv("nextclade_collapsed.tsv", sep="\t")
# Combine XBB and B.1.1.529. Assign the result back to the column rather
# than calling replace(..., inplace=True) on it, which may not update df.
df["Lineage_family"] = df["Lineage_family"].replace("XBB", "B.1.1.529")
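If you end up with more than one recombinant family to fold in, the same post-processing step can be driven by a dict. This is only a sketch on an in-memory table: the `Lineage` column and the example rows are made up for illustration, and `RECOMBINANT_TO_PARENT` is a hypothetical mapping you would fill in yourself.

```python
import pandas as pd

# Made-up rows standing in for the collapsed output file.
df = pd.DataFrame(
    {
        "Lineage": ["XBB.1.5", "BA.5.2", "XBC.1"],
        "Lineage_family": ["XBB", "B.1.1.529", "XBC"],
    }
)

# Hypothetical mapping: which parent you choose to report for each
# recombinant family. Families not listed here are left unchanged.
RECOMBINANT_TO_PARENT = {"XBB": "B.1.1.529"}

# Series.replace with a dict only touches exact matches of the keys.
df["Lineage_family"] = df["Lineage_family"].replace(RECOMBINANT_TO_PARENT)

print(df)
```

Choosing a single parent per recombinant is a judgment call on your side, so it is probably worth keeping the original lineage column alongside the collapsed one.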
Oh, and if you don't supply an alias_key.json file with the --alias-file flag, then the alias_key.json file will be downloaded from https://raw.githubusercontent.com/cov-lineages/pango-designation/master/pango_designation/alias_key.json every time you run pango-collapse, i.e. it always uses the latest file.
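If you would rather pin one copy of the alias file (for reproducible runs, say) and pass it with --alias-file, something along these lines should work. It is a sketch, not part of pango-collapse: `fetch_alias_key` and `recombinant_parents` are hypothetical helper names. The second helper relies on recombinants being stored as lists of parents in alias_key.json.

```python
import json
import urllib.request
from pathlib import Path

ALIAS_URL = (
    "https://raw.githubusercontent.com/cov-lineages/pango-designation/"
    "master/pango_designation/alias_key.json"
)


def fetch_alias_key(dest, url=ALIAS_URL):
    """Download alias_key.json only if dest does not exist yet."""
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def recombinant_parents(alias_path):
    """Return {recombinant: [parents]} from a local alias_key.json."""
    with open(alias_path) as fh:
        aliases = json.load(fh)
    # Plain aliases map to a string; recombinants map to a list of parents.
    return {name: value for name, value in aliases.items() if isinstance(value, list)}


# Example usage (needs network access on the first call):
#   path = fetch_alias_key("alias_key.json")
#   print(recombinant_parents(path).get("XBB"))
```

The saved path can then be given to pango-collapse through the --alias-file flag mentioned above, so every run uses the same snapshot.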
Thank you very much, that obviously makes sense! For my use case it's fine to leave recombinants at XBB for now :) And thank you for sending the source of the JSON file, that's what I was asking for!
Hi @Wytamma!
I want to use your package for my project and I'm curious why it's not possible to trace recombinants (e.g. XBB.1.5) back to the Omicron parental lineage (B.1.1.529). Although I removed the recombinants from the collapse file, I get the same input lineage back, just with the addition of "Recombinant" as the lineage family.
I assume that the information about the parent of a sublineage comes from the alias_key.json in the repo, but is it up to date with the pangolin annotation?
Thank you very much.