This PR adds the SupercellularSampler implementation, along with several utility functions in CassiopeiaTree for handling ambiguous states and branch distances.
Some important changes are the following.
Ambiguous characters are denoted as a tuple in the character matrix and/or the network. Some functions in CassiopeiaTree were updated, and new ones added, to support ambiguous characters.
When computing dissimilarities with CassiopeiaTree.compute_dissimilarity_map, the new cluster_dissimilarity function should be used whenever there are ambiguous characters. Here is an example snippet:
from functools import partial
dissimilarity_function = partial(cluster_dissimilarity, weighted_hamming_distance)
tree.compute_dissimilarity_map(dissimilarity_function)
I had considered whether this should be done automatically in compute_dissimilarity_map, but felt that the arguments should always be explicit for the user.
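To illustrate why the partial is needed, here is a minimal, self-contained sketch of how a cluster-style dissimilarity could treat tuple-valued (ambiguous) states. It is not the code in this PR: hamming_distance, cluster_dissimilarity_sketch, the mean linkage and the -1 missing-state indicator are all assumptions made for illustration.

```python
from functools import partial
from statistics import mean


def hamming_distance(s1, s2, missing_state_indicator=-1):
    """Count positions where two unambiguous state vectors differ.

    Positions where either state is missing are skipped.
    """
    return sum(
        1
        for a, b in zip(s1, s2)
        if missing_state_indicator not in (a, b) and a != b
    )


def cluster_dissimilarity_sketch(
    dissimilarity_function,
    s1,
    s2,
    missing_state_indicator=-1,
    linkage_function=mean,
):
    """Hypothetical stand-in for cluster_dissimilarity.

    An ambiguous state is a tuple of candidate states. For each character,
    every pair of candidate states is compared with the wrapped function and
    the pairwise results are combined with the linkage function.
    """
    total = 0
    for a, b in zip(s1, s2):
        states_a = a if isinstance(a, tuple) else (a,)
        states_b = b if isinstance(b, tuple) else (b,)
        pairwise = [
            dissimilarity_function([x], [y], missing_state_indicator)
            for x in states_a
            for y in states_b
        ]
        total += linkage_function(pairwise)
    return total


# Same pattern as the snippet above: bind the base dissimilarity first, so the
# result has the (s1, s2, ...) shape a dissimilarity-map routine can call.
dissimilarity_function = partial(cluster_dissimilarity_sketch, hamming_distance)

cell_1 = [1, (1, 2), 0]  # second character is ambiguous between states 1 and 2
cell_2 = [1, 2, -1]      # third character is missing
result = dissimilarity_function(cell_1, cell_2)
```

Binding the base function explicitly keeps the choice of per-character dissimilarity visible at the call site, which is the trade-off described above.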
Updated n_cell, n_character and populate_tree in CassiopeiaTree to use the current character matrix instead of the original one. These properties and this function should always reflect the current state of the object, not a possibly outdated one (i.e. the original character matrix). Previously the original state was used, which caused problems for GreedySolver when it called collapse_mutationless_edges.
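The distinction between the original and current character matrix can be sketched as follows. This is a simplified illustration only: TreeSketch, its attributes and remove_cell are hypothetical names, and a plain dict stands in for the real character matrix.

```python
class TreeSketch:
    """Hypothetical, simplified stand-in for a tree that tracks two matrices."""

    def __init__(self, character_matrix):
        # Map of cell name -> list of character states. The original copy is
        # kept untouched; the current copy reflects later modifications.
        self._original_character_matrix = {
            cell: list(states) for cell, states in character_matrix.items()
        }
        self._current_character_matrix = {
            cell: list(states) for cell, states in character_matrix.items()
        }

    @property
    def n_cell(self):
        # Derived from the current matrix, so removed cells are not counted.
        return len(self._current_character_matrix)

    @property
    def n_character(self):
        # Number of characters per cell in the current matrix (0 if empty).
        rows = list(self._current_character_matrix.values())
        return len(rows[0]) if rows else 0

    def remove_cell(self, cell):
        # Only the current matrix changes; the original is preserved.
        del self._current_character_matrix[cell]


tree = TreeSketch({"a": [1, 0], "b": [0, 2], "c": [1, 2]})
tree.remove_cell("c")
```

After the removal, n_cell reports two cells even though the original matrix still holds three, which is the behavior described above.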
This PR is big enough for reviews from both @mattjones315 and @richardyz98.