Nanostring-Biostats / InSituType

An R package for performing cell typing in SMI and other single cell data

refineClusters subclustering logic fails when applied to supervised results #161

Closed patrickjdanaher closed 1 year ago

patrickjdanaher commented 2 years ago

Here's what happens:

patrickjdanaher commented 2 years ago

Basic solution: only sub-cluster the cell types in question; don't revisit the others. For omitted cells, just copy the logliks from the original cluster to its subclusters. E.g., a B-cell will get the same logliks for myeloid_1 and myeloid_2 as it had for "myeloid".
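The copy rule above can be sketched as follows. This is a minimal illustration in plain Python, not the package's R implementation; the function name `expand_logliks` and the dictionary layout are hypothetical and chosen only to make the logic visible.

```python
def expand_logliks(logliks, clusters, target, sub_logliks):
    """Replace the `target` cluster's loglik with one entry per subcluster.

    logliks:     {cell_id: {cluster_name: loglik}} for all cells (assumed layout)
    clusters:    {cell_id: currently assigned cluster name}
    target:      name of the cluster being split, e.g. "myeloid"
    sub_logliks: {cell_id: {subcluster_name: loglik}}, only for cells in `target`
    """
    # Collect every subcluster name produced by the split.
    sub_names = sorted({name for row in sub_logliks.values() for name in row})
    expanded = {}
    for cell, row in logliks.items():
        # Keep all columns except the one being split.
        new_row = {k: v for k, v in row.items() if k != target}
        if clusters[cell] == target:
            # Cells in the split cluster get their newly computed subcluster logliks.
            new_row.update(sub_logliks[cell])
        else:
            # Omitted cells inherit the parent cluster's loglik for every subcluster.
            for name in sub_names:
                new_row[name] = row[target]
        expanded[cell] = new_row
    return expanded


logliks = {
    "c1": {"B-cell": -10.0, "myeloid": -50.0},
    "c2": {"B-cell": -40.0, "myeloid": -12.0},
}
clusters = {"c1": "B-cell", "c2": "myeloid"}
sub_logliks = {"c2": {"myeloid_1": -11.0, "myeloid_2": -15.0}}

result = expand_logliks(logliks, clusters, "myeloid", sub_logliks)
```

Here the B-cell `c1` ends up with -50.0 for both `myeloid_1` and `myeloid_2`, the same value it had for "myeloid", while `c2` keeps its own subcluster logliks.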

Big question: do we use the above logic for supervised only, or for unsupervised results as well?

patrickjdanaher commented 1 year ago

The dilemma:

Solutions:

  1. Somehow track whether a result is supervised or unsupervised, then have subclustering act appropriately.
  2. Just make the subclustering functionality confine itself to the selected cell type, and never reassign cells from other clusters to the new subclusters.

(2) seems easier, both to implement and explain.
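For illustration, option (2) amounts to something like the sketch below. It is plain Python rather than the package's R code, and `reassign_within_target` and its arguments are hypothetical names, not part of InSituType.

```python
def reassign_within_target(clusters, target, sub_logliks):
    """Return new assignments in which only cells currently in `target` change.

    clusters:    {cell_id: currently assigned cluster name}
    target:      name of the cluster being split
    sub_logliks: {cell_id: {subcluster_name: loglik}} for cells in `target`
    """
    updated = dict(clusters)
    for cell, assigned in clusters.items():
        if assigned != target:
            # Cells of other types are never moved into the new subclusters.
            continue
        row = sub_logliks[cell]
        # Pick the subcluster with the highest log-likelihood for this cell.
        updated[cell] = max(row, key=row.get)
    return updated


clusters = {"c1": "B-cell", "c2": "myeloid", "c3": "myeloid"}
sub_logliks = {
    "c2": {"myeloid_1": -11.0, "myeloid_2": -15.0},
    "c3": {"myeloid_1": -20.0, "myeloid_2": -9.0},
}

new_clusters = reassign_within_target(clusters, "myeloid", sub_logliks)
```

Only `c2` and `c3` can change label; `c1` stays a B-cell regardless of how well it might fit either subcluster.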

davidpross commented 1 year ago

I think (2) makes the most sense. In fact, I wouldn't have guessed that sub-clustering grabbed cells from types other than the one being split.

But I don't fully get the explanation for why the sub-clustering grabs all of these other cells. Aren't the logliks updated based on the profiles generated from the actual count data in the end?

patrickjdanaher commented 1 year ago

More complex version (possibly):

patrickjdanaher commented 1 year ago

Implemented (2); will merge to ADO soon.