HelenaLC / CATALYST

Cytometry dATa anALYsis Tools

Subclustering #307

Closed france-hub closed 1 year ago

france-hub commented 1 year ago

Hello Helena,

Thank you again for this package. It's helping me a lot in interpreting my spectral flow data. The following is not an issue with the package, but I'd like to know how you would proceed.

I have human samples from two conditions, labeled Res and NonRes. The distribution of my subsets on the 2D UMAP is the following: [image: umap_clus]

If you look at metacluster 6 it seems that there are two subclusters for the two conditions. I used the delta plot to pick the number of clusters (meta8); if I go up to meta20 I am able to split the cluster in two and look at the markers expressed in each of them. However, is this the proper way to proceed?

Thanks! Francesco

david-priest commented 1 year ago

Yes. If you think that a cluster at a lower metaclustering level (e.g. cluster 6 at meta8) is really two separate clusters that should be kept apart, you can steadily increase the meta level until they separate. Here, you may find that cluster 6 splits at meta10 or meta15. You can then use a merging table to name the clusters, which also gives you the opportunity to manually merge clusters that you deem to be improperly split.

HelenaLC commented 1 year ago

There are two options, I'd say:

  1. What you already did. I.e., going to a higher-resolution clustering. By default, all of 2-20 metaclusters are available within the object. You can get to even more by increasing the maxK parameter in cluster().
  2. Perform a subclustering on a subset of the data. E.g., you could subset specific subpopulations via sub <- filterSCE(sce, k = "meta8", cluster_id %in% c(6, ...)), and re-cluster the subset via cluster(sub, ...).
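
A minimal sketch of both options, for illustration only. It uses the PBMC example data shipped with CATALYST as a stand-in for your own object; the choice of maxK = 30, the smaller 5 x 5 SOM grid, and maxK = 10 for the subset are assumptions, not recommendations, and whether metacluster 6 is the right one to subset depends on your data.

```r
library(CATALYST)

# Stand-in data: the PBMC example shipped with CATALYST (replace with your own sce)
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)

# Option 1: recluster with a larger maxK so that meta2 to meta30 become available
sce <- cluster(sce, features = "type", maxK = 30, seed = 1)
names(cluster_codes(sce))  # lists the clusterings stored in the object

# Option 2: keep only the cells assigned to metacluster 6 at the meta8 level...
sub <- filterSCE(sce, k = "meta8", cluster_id %in% 6)

# ...and recluster that subset on its own; the smaller grid and maxK are
# assumptions chosen because the subset holds far fewer cells
sub <- cluster(sub, features = "type", xdim = 5, ydim = 5, maxK = 10, seed = 1)
table(cluster_ids(sub, "meta4"))  # cell counts per subcluster
```

Note that cluster() overwrites the clustering stored in the object it is given, so keeping the subset in a separate variable (sub) leaves the original sce untouched.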

As to which option is preferred / more sensible: it depends. If the differences between other populations are much larger than those within the population you care about, increasing the number of metaclusters may keep splitting those other populations without ever separating the one of interest, and reclustering on a subset would be required.

However, in your case, it seems that simply going to a higher resolution does the trick, and you can work with, say, the meta20 clustering and use mergeClusters() to combine what you feel are biologically similar subpopulations, until you reach a number of distinguishable and interpretable clusters (as @purisuto suggested).
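
The merging step could look roughly like the sketch below, again on the PBMC example data as a stand-in. The merging table here is purely hypothetical: it pairs up neighbouring metaclusters under placeholder labels (pop1, pop2, ...) just to show the mechanics, and the id "merging1" is an arbitrary name. In practice the new labels would come from inspecting marker expression per meta20 cluster.

```r
library(CATALYST)

# Stand-in data: the PBMC example shipped with CATALYST (replace with your own sce)
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce, features = "type", maxK = 20, seed = 1)

# Hypothetical merging table: first column holds the meta20 cluster IDs,
# second column the label each one should receive after merging
merging_table <- data.frame(
  old_cluster = 1:20,
  new_cluster = paste0("pop", rep(1:10, each = 2))  # placeholder labels
)

# Store the merged clustering under a new id alongside the existing ones
sce <- mergeClusters(sce, k = "meta20", table = merging_table, id = "merging1")

# Cell counts per merged population
table(cluster_ids(sce, "merging1"))
```

The merged clustering can then be passed as k = "merging1" wherever a clustering name is expected, so meta20 itself stays available for comparison.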