HelenaLC / CATALYST

Cytometry dATa anALYsis Tools

Clarification about re-clustering/sub-clustering #369

Closed denvercal1234GitHub closed 1 year ago

denvercal1234GitHub commented 1 year ago

Hi there,

Thank you again for the package.

Q1. If I filter the sce object and then simply run CATALYST::cluster on the subsetted sce, will it automatically override the existing cluster_id from my original clustering?

Q2. I still see UMAP coordinates from the original clustering of the un-subsetted object in my subsetted sce. So, does that mean everything I did to the original sce will be preserved until I run new code, e.g., runDR(), on the subsetted object?

Q3. Why do I only see a legend for some of the cluster codes, and not all 100, in my CATALYST::plotNRS() plot, while I still see 100 SOM codes in my subsetted object? I ultimately want to make sure the existing labels from the original clustering don't get mixed up with the new clustering...

> table(F37_sub3188EventsPerCluster_Clustering$cluster_id)

   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19 
 278  245  404  337  561  882 1033  515  673  531  542  387  441  298  462  339  498  566  292 
  20   21   22   23   24   25   26   27   28   29   30   31   32   33   34   35   36   37   38 
 353  308  700  460  586  515  311  502  388  683  530  429  698  387  616  388  410  683  376 
  39   40   41   42   43   44   45   46   47   48   49   50   51   52   53   54   55   56   57 
 443  263  779  597  731  363  358  353  493  410  373  495 1065  372  932  706  582  484  535 
  58   59   60   61   62   63   64   65   66   67   68   69   70   71   72   73   74   75   76 
 366  320  643  393  705  337  388  469  801  317  546  461  248  629  790  294  391  342  673 
  77   78   79   80   81   82   83   84   85   86   87   88   89   90   91   92   93   94   95 
 659  466  491  380  563  608  303  235  696  725  665  450  755  265  219  255  362  289  389 
  96   97   98   99  100 
 728  761  787  730  490 

Thank you for your help!

HelenaLC commented 1 year ago
  1. Yes, that's true. However, you could simply keep a backup of the original cluster_id, e.g., ..., and switch back to these if they are still of interest for any reason. Though, in that case, it'd make more sense to use the original (complete) dataset for which they were computed.

    sub <- filterSCE(sce, ...)         # subset the data
    sub$cluster_id0 <- sub$cluster_id  # back up the original cluster_id
    sub <- cluster(sub, ...)           # re-cluster; overwrites cluster_id
  2. Yes, of course. Dimension reductions and other computations rely on the data they are run on. So if you subset, that needs to be recomputed (nothing happens automatically when running filterSCE(sce, ...)).

  3. Likely because the subsetted dataset does not contain cells from all 100 clusters, but only the ones displayed. Isn't that to be expected when you subsetted the data?
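
For readers who want to try points 1 and 2 end to end, here is a minimal sketch. It assumes the PBMC example data bundled with CATALYST as a stand-in for the poster's data. The filter condition, the column name `cluster_id0`, the seed, and the `cells = 500` downsampling are illustrative choices, not something taken from this thread.

```r
library(CATALYST)

# Example data shipped with CATALYST (a stand-in, not the poster's data)
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)
sce <- cluster(sce, features = "type", seed = 1, verbose = FALSE)

# Subset the data; this particular filter is only an illustration
sub <- filterSCE(sce, patient_id == "Patient1")

# Back up the labels from the original clustering before overwriting them
sub$cluster_id0 <- sub$cluster_id

# Re-cluster the subset; this overwrites sub$cluster_id
sub <- cluster(sub, features = "type", seed = 1, verbose = FALSE)

# Number of original clusters still present in the subset vs. new clusters
length(unique(sub$cluster_id0))
length(unique(sub$cluster_id))

# Dimension reductions are not updated by filtering; recompute on the subset
set.seed(1)
sub <- runDR(sub, dr = "UMAP", cells = 500, features = "type")
```

Keeping the old labels in a separate column means the two sets of IDs cannot get mixed up: `cluster_id0` refers to the original clustering, `cluster_id` to the new one.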

denvercal1234GitHub commented 1 year ago

Thank you so much @HelenaLC for your response. For Q3, I did rerun CATALYST::cluster, with default arguments, on my subsetted sce, so I expect to have 100 FlowSOM codes, right? Or is that not necessarily true?

HelenaLC commented 1 year ago

Aha, yes, if you reran cluster() with defaults, there should indeed be 100 clusters. And judging by the output of table() above, that is the case! So I think it is more of an issue with how you are calling plotNRS(): NRS are computed independently of cluster identifiers, so coloring samples by cluster makes no sense. My guess is that it is just coloring each sample according to the cluster label of the first cells in that sample, which is pretty random.
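
As a rough illustration of the point above, the sketch below colors the NRS plot by a sample-level variable instead of a cluster label. It again assumes the PBMC example data bundled with CATALYST, where `condition` is a column of the sample metadata. With other data, substitute whichever sample-level column applies.

```r
library(CATALYST)

# Example data shipped with CATALYST (a stand-in, not the poster's data)
data(PBMC_fs, PBMC_panel, PBMC_md)
sce <- prepData(PBMC_fs, PBMC_panel, PBMC_md)

# Color the points by a sample-level variable rather than by cluster_id,
# since the scores do not depend on cluster labels
p <- plotNRS(sce, features = "type", color_by = "condition")
p
```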