Closed wutron closed 12 months ago
Hi Jessica/@wutron!
Thanks for bringing this up (and for all of your contributions as of late)! The --depthcutoff
option is still in somewhat of a beta testing phase, but you're right- this will not work properly for recombinant lineage descendants. Instead of checking for "X" lineages I believe the correct thing to is to check if the recombinant_parents
property exists for that lineage. Should be a relatively quick fix @dylanpilz?
Regarding your question about the collapsing to the root: In the scenario you specified in which b
and d
are no longer distinguishable but their siblings a
and c
are, you'd have b
and d
grouped into a lineage that would be assigned to the root, but a
and c
would not get merged into this same group. They would remain separate (if they are not distinguishable from one another they should get merged in a different group- also with the root name, but with an additional number to show that it's not the same as the b
/d
group), and these lineage groupings are noted in the yml file that also gets returned.
The use of one as the "canonical" lineage name is an interesting idea, but I worry that it can be a bit misleading- what we might call EG.4 could easily be grouped into with a ton of other XBB lineages in a low coverage sample, even if we don't have any evidence to separate it from other XBB lineages. Or is there a non-random way that your "canonical" lineage is chosen?
Josh
Point 1: Sounds good on the quick fix.
Point 2: I see where it introduces a suffix number to the mrca name now. That resolves my concern then!
P.S. The "canonical" name is just a randomly chosen name, and I rely on the mapping to determine whether a lineage name maps to multiple lineages. It sounds like we have the same process, e.g. find what to collapse and show what collapses in a separate file. But I can see how renaming the lineages has less potential for confusing users.
Thanks!
Hey Jessica,
I'm finishing up the fix and was wondering if you could share one or more offending samples along with the command you ran that led to the error?
Dylan
Can I share via email instead?
For sure! Feel free to send it to jolevy@scripps.edu, and I'll share it with Dylan.
Hi Josh and Dylan,
Thanks for all the updates to Freyja! I have been exploring some of the new features in your GitHub repo. However, when I use
--depthcutoff
from #145, I get the following error:I dug into
utils.py
, and it seems like the code assumes that lineages that start with "X" have recombinant parents: https://github.com/andersen-lab/Freyja/blob/dfdd47f3656768c70c774edd219193164c096e42/freyja/utils.py#L878-L881However, not all lineages that start with "X" have recombinant parents unless you go up the hierarchy. For example, XBJ.1 does not have recombinant parents in the yml unless you first trace up to its parent XBJ that does have recombinant parents: https://github.com/andersen-lab/Freyja/blob/dfdd47f3656768c70c774edd219193164c096e42/freyja/data/lineages.yml#L30893-L30908
I admit I have not thought much about this grouping by MRCA, so I'm not sure if the "right" solution is to trace up to parent aliases.
I also want to bring up one issue when using the MRCA. Consider the following tree, where
a
,b
,c
, andd
are lineages:Correct me if I am wrong, but if there are convergent mutations, I think it is possible for
a
andc
to have barcodes that group together andb
andd
to have barcodes that group together (once you remove sites based on--depthcutoff
). Grouping by MRCA would cause everything to group into a MRCA =root
even thougha
andc
are distinguishable fromb
andd
.As a suggested alternative, in my workflow, if I have lineages that are indistinguishable by barcode, instead of tracing up to an MRCA, I designate one as the "canonical" lineage name, and I have a separate file that shows all lineages that map to each canonical lineage.
Thanks, Jessica