Open sprillo opened 1 year ago
Patch coverage: 63.07%
and project coverage change: -0.29%
:warning:
Comparison is base (
f895301
) 79.05% compared to head (cd53086
) 78.76%.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
More importantly, I am also surprised by the strategy you employed to implement this functionality. I always thought that we would implement this functionality by operating on the character matrix of a tree and producing a corrected
dissimilarity_map
, not unlike theCassiopeiaSolver.compute_dissimilarity_map
functionality. Do you think it's potentially overkill that we have embedded the usage of this functionality to only be pertitent to theCRISPRCas9DistanceCorrectionSolver
. In my eyes, it would be ideal to be able to get transformed distances for all sorts of tasks, solving lineages being one of them. Does that make sense?
@mattjones315 I think CassiopeiaSolver.compute_dissimilarity_map
does not exist, do you mean cassiopeia.data.utilities.compute_dissimilarity_map
? If you give me an example of the API and code you envision users to be writing, I can think about it. By the way, the more helpful distances would be those obtained from performing branch length estimation in the next step, since it essentially refines the corrected distances even further by using higher-order information rather than just pairs of leaves, and also uses MLE instead of moment-matching.
Hi @sprillo -- apologies for the delay getting back to this.
Also apologies for the miscommunication, I had meant CassiopeiaTree.compute_dissimilarity_map
method, but this is functionally equivalent to the function you pointed out (at least for the purposes of this conversation).
As for the interface that I had imagined, I could see us using compositional programming within the compute_dissimilarity_map
method. The pseudocode would look like this:
def compute_dissimilarity_map(character_matrix, tree, correction="none", ...):
if correction == "none":
dissimilarity_map = cas.data.utilities.compute_dissimilarity_map(character_matrix, tree, ...)
elif correction == "cas9":
dissimilarity_map = cas.data.distance_correction.crispr_cas9_distance_correction(character_matrix, tree, ...)
else:
raise Exception("correction method not found.")
return dissimilarity_map
In fact, currently the compute_dissimilarity_map
implemented in CassiopieaTree
contains a function argument that's used for computing dissimilarities between character matrix entries; one could imagine just injecting the crispr_cas9_distance_function into the function that way.
So, while I'm not convinced either of the two approaches above are categorically better / simpler than your proposed implementation, I wonder if you might be able to convince me that my two solutions are non-optimal.
Thanks in advance for the discussion!
Implement distance correction scheme for the CRISPR-Cas9 model.
The class implementing this method is
CRISPRCas9DistanceCorrectionSolver
in thecassiopeia/solver/distance_correction/_crispr_cas9_distance_correction_solver.py
module. (Thedistance_correction
subpackage is meant to contain any distance correction methods that might be implemented in the future, possibly for models other than CRISPR-Cas9.)The solver composes together four steps: (1) mutation proportion estimation, (2) collision probability estimation, (3) distance correction with the estimated mutation proportion and collision probability, and (4) tree topology reconstruction using the corrected distances. The solver is parameterized by these four steps. Note: Due to Numba compilation issues in the DistanceSolver, the function performing the third step is not injected into the solver, but rather determined by using a string identifier specifying the function name.
In the code, I declared some types to improve readability at select places. Underscores are used to denote functions or classes that are internal and are not exposed to the users, mostly to improve legibility.
Tests should be quite comprehensive. However, some tests are marked as slow since they require simulation (they take ~30s on my machine). To run the slow tests, use the
--runslow
flag, e.g.:python -m pytest test/solver_tests/distance_correction_tests --runslow
. (In particular, CodeCov complains but coverage with the slow tests is good.)