Closed YushaLiu closed 1 year ago
Hi @YushaLiu --
Thanks for raising this issue! If you are still encountering this issue, I think it might be because the ILPSolver's parameters are not permissive enough and the program might not be able to find a suitable potential graph. I've noticed that depending on your setup, this error message might not be propagated to the error logs.
There are two things you can try to do:
1. Use lca_cutoff instead of cell_cutoff in the HybridSolver constructor. Using lca_cutoff might do a better job constraining the difficulty of the problem. I would suggest trying something like lca_cutoff=15 to start.
2. Set maximum_potential_graph_layer_size to something like 15000. Simultaneously, you should consider setting maximum_potential_graph_lca_distance=20 to constrain how deeply the program will look for ancestors.
Please let me know if these suggestions are helpful; otherwise I'd be happy to take a deeper look.
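A minimal sketch of how these two suggestions might be wired together. The parameter names and values come from the suggestions above. The use of VanillaGreedySolver as the top solver, the cas.solver module layout, and passing the two potential-graph limits to the ILPSolver constructor are assumptions, so check them against your installed Cassiopeia version:

```python
# Keyword arguments taken from the suggestions above.
hybrid_kwargs = {"lca_cutoff": 15}  # used instead of cell_cutoff
ilp_kwargs = {
    "maximum_potential_graph_layer_size": 15000,
    "maximum_potential_graph_lca_distance": 20,
}


def build_hybrid_solver():
    """Construct a HybridSolver from the settings above.

    Assumptions: Cassiopeia is installed, the solvers are exposed under
    cas.solver, and a greedy top solver is paired with an ILP bottom solver.
    """
    # Imported lazily so the settings can be inspected without Cassiopeia.
    import cassiopeia as cas

    top_solver = cas.solver.VanillaGreedySolver()
    bottom_solver = cas.solver.ILPSolver(**ilp_kwargs)
    return cas.solver.HybridSolver(
        top_solver=top_solver,
        bottom_solver=bottom_solver,
        **hybrid_kwargs,
    )
```

Keeping the settings in plain dictionaries makes it easy to log exactly which values a run used.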
Best, Matt
Hi Matt,
Thanks very much for your suggestions! I tried lca_cutoff=24 and maximum_potential_graph_lca_distance=30, and was able to get results within two days. I noticed that the hybrid solver is now solving a much larger number of subproblems (~380) than before (~80), when I set cell_cutoff=40 and didn't specify lca_cutoff or maximum_potential_graph_lca_distance. Does this mean that specifying lca_cutoff, rather than cell_cutoff, makes the subproblems more manageable so that each of them completes in a shorter time?
Also, are lca_cutoff=24 and maximum_potential_graph_lca_distance=30 realistic choices? Is there a way to estimate these parameters, depending on the complexity of lineage tracing data? Will larger values of these parameters lead to better lineage reconstruction results but also be significantly slower to run?
Hi @YushaLiu ,
Great to hear!
While lca_cutoff and cell_cutoff might seem to be related to one another (and indeed they can be correlated), I find lca_cutoff to be more effective at choosing reasonably complex subproblems to pass on to the ILP. This is because even small cell subsets can represent great allelic diversity that can cause the ILPSolver to run for a long time.
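To illustrate the point, here is a toy sketch (not Cassiopeia's code) with two subsets of equal size that differ sharply in their distance to the last common ancestor (LCA). It assumes state 0 means "unedited", ignores missing data, and uses hypothetical helper names:

```python
def lca_vector(cells):
    """Per character, keep the state shared by all cells; otherwise 0 (unedited)."""
    return [col[0] if len(set(col)) == 1 else 0 for col in zip(*cells)]


def max_distance_to_lca(cells):
    """Largest number of characters at which any cell differs from the LCA."""
    ancestor = lca_vector(cells)
    return max(sum(a != s for a, s in zip(ancestor, cell)) for cell in cells)


# Both subsets contain four cells, so a cell-count cutoff treats them alike.
similar = [[1, 2, 0], [1, 2, 0], [1, 2, 3], [1, 2, 0]]
diverse = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 5, 9]]

print(max_distance_to_lca(similar))  # 1
print(max_distance_to_lca(diverse))  # 3
```

A cutoff on this distance would separate the two subsets, while a cutoff on cell count alone would not.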
The maximum_potential_graph_lca_distance parameter limits the depth at which to look for ancestors to add to the potential graph. While I have not done a thorough comparison, my anecdotal experience is that there are diminishing returns (if any at all) to looking exceedingly deep into the evolutionary history to add ancestors to the tree. In the old Cassiopeia codebase, we hardcoded this parameter to be ~15; here we have generalized it such that a user can enumerate all ancestors if they wish. Either way, while increasing the maximum_potential_graph_lca_distance parameter might create a larger potential graph, it does not guarantee that the ILPSolver will be able to find a perfect solution in a reasonable amount of time.
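As a rough illustration of why a larger distance limit yields a larger potential graph, here is a simplified toy (not Cassiopeia's implementation). It repeatedly adds pairwise LCAs as candidate ancestors, but only when both children lie within the allowed distance. The function names and the layering scheme are assumptions made for the example:

```python
from itertools import combinations


def lca(a, b):
    """Keep states the two vectors share; reset the rest to 0 (unedited)."""
    return tuple(x if x == y else 0 for x, y in zip(a, b))


def distance(parent, child):
    """Number of characters at which the child differs from the parent."""
    return sum(p != c for p, c in zip(parent, child))


def candidate_ancestors(cells, max_lca_distance):
    """Collect pairwise LCAs, layer by layer, within the distance limit."""
    nodes = set(cells)
    frontier = set(cells)
    while len(frontier) > 1:
        layer = set()
        for a, b in combinations(sorted(frontier), 2):
            parent = lca(a, b)
            if max(distance(parent, a), distance(parent, b)) <= max_lca_distance:
                layer.add(parent)
        if not layer - nodes:  # nothing new was found, so stop searching
            break
        nodes |= layer
        frontier = layer
    return nodes - set(cells)


cells = [(1, 2, 3), (1, 2, 4), (1, 5, 6), (7, 8, 9)]
print(len(candidate_ancestors(cells, 1)))  # 1
print(len(candidate_ancestors(cells, 3)))  # 3
```

With the tighter limit only the closest pair contributes an ancestor. Relaxing it admits deeper ancestors and the candidate set grows, which is the trade-off described above.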
I think that the parameters you suggested (lca_cutoff=24 and maximum_potential_graph_lca_distance=30) are quite reasonable given my experience. Raising the lca_cutoff will just allow more complex subproblems to be passed to the ILPSolver, which can be good or bad depending on how long you want to wait for the ILPSolver to run.
Thanks very much! Very helpful to know.
Hi Matt, I'm running Cassiopeia Hybrid on single-cell lineage data with about 9000 cells and 30 characters, but the job does not finish even after 5 days. The log files suggest that a few subproblems never finish, and the corresponding log files stop updating after a day or two (see the attached file for one such log). I'm using the following parameters to call Cassiopeia Hybrid:
Any thoughts on why this happens? If a subproblem is still running but needs more time, its log file should keep being updated, right? I can share the data and the entire scripts if that's helpful. Thanks very much! M1_1_v4-5.log