patrickjdanaher closed this issue 1 year ago.
Basic solution: only sub-cluster the cell types in question; don't revisit the others. For omitted cells, just copy the logliks from the original cluster to its subclusters. E.g., a B-cell will get the same logliks for myeloid_1 and myeloid_2 as it had for "myeloid".
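A minimal sketch of the "copy the parent's logliks" rule, assuming log-likelihoods are stored as a cells × clusters NumPy matrix with a parallel list of cluster names. The function `expand_parent_logliks` and its arguments are hypothetical illustrations, not the package's actual code. Every cell first receives the parent cluster's loglik in each new subcluster column; only cells that were in the split cluster then have those columns overwritten with freshly computed subcluster logliks.

```python
import numpy as np

def expand_parent_logliks(loglik, cluster_names, parent, sub_names, sub_loglik, split_mask):
    """Replace the `parent` column with one column per subcluster (hypothetical helper).

    loglik:      (n_cells, n_clusters) logliks from the original clustering
    sub_loglik:  (n_split_cells, len(sub_names)) logliks for cells in the split cluster only
    split_mask:  boolean (n_cells,), True for cells that belonged to `parent`
    """
    j = cluster_names.index(parent)
    keep = [k for k in range(len(cluster_names)) if k != j]
    # Omitted cells: copy the parent's loglik into every subcluster column.
    sub_block = np.repeat(loglik[:, [j]], len(sub_names), axis=1)
    # Cells being split: use the newly computed subcluster logliks.
    sub_block[split_mask, :] = sub_loglik
    new_loglik = np.hstack([loglik[:, keep], sub_block])
    new_names = [cluster_names[k] for k in keep] + list(sub_names)
    return new_loglik, new_names

# Toy example: one B-cell and two cells from the "myeloid" cluster being split.
loglik = np.array([
    [-10.0, -50.0],  # B-cell
    [-60.0, -12.0],  # myeloid cell
    [-55.0, -15.0],  # myeloid cell
])
split = np.array([False, True, True])
sub = np.array([[-11.0, -14.0],
                [-16.0, -13.0]])
new, new_names = expand_parent_logliks(
    loglik, ["B-cell", "myeloid"], "myeloid", ["myeloid_1", "myeloid_2"], sub, split
)
print(new_names)  # ['B-cell', 'myeloid_1', 'myeloid_2']
print(new[0])     # [-10. -50. -50.]
```

The B-cell ends up with identical logliks (-50) for `myeloid_1` and `myeloid_2`, so it has no reason to move into either subcluster; only the myeloid cells are redistributed.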
Big question: do we use the above logic for supervised only, or for unsupervised results as well?
The dilemma:
Solutions:
(2) seems easier, both to implement and explain.
I think (2) makes the most sense; in fact, I wouldn't have guessed that sub-clustering grabbed cells from types other than the one being split.
But I don't fully get the explanation for why the sub-clustering grabs all of these other cells. Aren't the logliks updated based on the profiles generated from the actual count data in the end?
More complex version (possibly):
Implemented (2); will merge to ADO soon.
Here's what happens: