ImmuneDynamics / Spectre

A computational toolkit in R for the integration, exploration, and analysis of high-dimensional single-cell cytometry and imaging data.
https://immunedynamics.github.io/spectre/
MIT License

Advice on how to sub-cluster? #164

Open denvercal1234GitHub opened 1 year ago

denvercal1234GitHub commented 1 year ago

Hi there,

Thanks for the tool!

As discussed in #161, run.flowsom seemed unable to break up the big orange cluster, even though the UMAP suggests there may be more clusters within it. Increasing meta.k did not help either.

Could you give some advice on how to sub-cluster? Do I simply subset cell.dat and then run run.flowsom again on the subset?

Thanks again!

ghar1821 commented 1 year ago

Feel free to try that approach and assess whether it leads to any improvement. However, I suggest examining the FlowSOM_cluster column first, which provides the SOM node allocation for each cell. If you find that the cells in the orange cluster are assigned to only a small number of SOM nodes, it may be worth increasing the grid size to capture more variability. Alternatively, you can try a different algorithm such as run.phenograph to see if it is more effective.
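A minimal base-R sketch of that check follows. It assumes cell.dat has the FlowSOM_cluster (SOM node) and FlowSOM_metacluster columns that run.flowsom writes under its default names. The toy values below are hypothetical, as is treating metacluster 5 as the "orange" cluster.

```r
# Toy stand-in for cell.dat (hypothetical values): one row per cell,
# FlowSOM_cluster = SOM node, FlowSOM_metacluster = metacluster
cell.dat <- data.frame(
  FlowSOM_cluster     = c(1, 1, 2, 3, 3, 3, 4, 4, 4, 4),
  FlowSOM_metacluster = c(1, 1, 1, 2, 2, 2, 5, 5, 5, 5)
)

# How many distinct SOM nodes feed each metacluster?
nodes.per.meta <- tapply(cell.dat$FlowSOM_cluster,
                         cell.dat$FlowSOM_metacluster,
                         function(nodes) length(unique(nodes)))
print(nodes.per.meta)

# Cell counts per SOM node inside the metacluster of interest (hypothetically "5")
print(table(cell.dat$FlowSOM_cluster[cell.dat$FlowSOM_metacluster == 5]))
```

Here metacluster 5 draws on a single node. Metaclustering only groups whole nodes, so changing meta.k alone cannot split a cluster like this; a larger grid (or a different algorithm) is needed.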

denvercal1234GitHub commented 1 year ago

Thank you @ghar1821 for your input!

  1. Would you mind giving me more guidance on how to increase the grid size?

  2. To use run.phenograph, do I simply run it in place of run.flowsom on my cell.dat object, with nothing else changed? Is the k parameter the maximum number of clusters?

######### CASE 0
cell.dat <- Spectre::run.phenograph(cell.dat, use.cols = strict_NoManual, clust.name = "Phenograph_strict_NoManual", k = 45)

  3. Also, if I run run.flowsom multiple times, each with a different meta.k and a different set of clustering markers, would I only need to run run.umap as below, and would that allow me to visualise the result of different run.flowsom on the right cell.dat.sub object's UMAP as below?

#### CASE 1
cell.dat <- Spectre::run.flowsom(cell.dat, strict_NoManual, meta.k = 'auto', meta.clust.name = "FlowSOM_metacluster_backbonestrict_NoManualAuto", clust.name = "FlowSOM_cluster_backbonestrict_NoManualAuto")

#### CASE 2
cell.dat <- Spectre::run.flowsom(cell.dat, strict_NoManual, meta.k = 'auto', max.meta = 40, meta.clust.name = "FlowSOM_metacluster_backbonestrict_NoManualAutoMax40", clust.name = "FlowSOM_cluster_backbonestrict_NoManualAutoMax40")

#### CASE 3
cell.dat <- Spectre::run.flowsom(cell.dat, relax_NoManual, meta.k = 'auto', meta.clust.name = "FlowSOM_metacluster_backbonerelax_NoManualAuto", clust.name = "FlowSOM_cluster_backbonerelax_NoManualAuto")

#### CASES 0, 1, AND 2
cell.dat.sub_strict_NoManual <- run.umap(cell.dat.sub, use.cols = strict_NoManual)

##### CASE 3
cell.dat.sub_relax_NoManual <- run.umap(cell.dat.sub, use.cols = relax_NoManual)

###### VISUALIZE CASE 0
make.colour.plot(cell.dat.sub_backbone, "UMAP_X", "UMAP_Y", col.axis = "Phenograph_strict_NoManual", col.type = 'factor')

###### VISUALIZE CASE 1
make.colour.plot(cell.dat.sub_backbone, "UMAP_X", "UMAP_Y", col.axis = "FlowSOM_metacluster_backbonestrict_NoManualAuto", col.type = 'factor')

###### VISUALIZE CASE 2
make.colour.plot(cell.dat.sub_backbone, "UMAP_X", "UMAP_Y", col.axis = "FlowSOM_metacluster_backbonestrict_NoManualAutoMax40", col.type = 'factor')

###### VISUALIZE CASE 3
make.colour.plot(cell.dat.sub_backbone, "UMAP_X", "UMAP_Y", col.axis = "FlowSOM_metacluster_backbonerelax_NoManualAuto", col.type = 'factor')

ghar1821 commented 1 year ago

You can increase the grid size by increasing the xdim and ydim parameters. Note, though, that the number of nodes in the grid used by FlowSOM is xdim * ydim, so adding even just 1 to both xdim and ydim will substantially increase the grid size.
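To put numbers on this: the node count is the product of the two dimensions, so adding 1 to both sides of a d x d grid adds 2d + 1 nodes. The grid sizes below are illustrative only; check ?run.flowsom for the defaults in your installed version.

```r
# Number of SOM nodes for an xdim x ydim FlowSOM grid
grid.nodes <- function(xdim, ydim) xdim * ydim

# Square grids of increasing side length (illustrative values)
sides <- 10:15
grids <- data.frame(xdim = sides, ydim = sides, nodes = grid.nodes(sides, sides))
print(grids)

# Extra nodes gained by each +1 step on both sides: 2 * d + 1
print(diff(grids$nodes))

# In Spectre the grid is set through run.flowsom's xdim / ydim arguments, e.g.
# cell.dat <- Spectre::run.flowsom(cell.dat, strict_NoManual, xdim = 16, ydim = 16)
```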

The k parameter in phenograph does not represent the number of clusters; it affects the resolution of the clusters. The smaller the k, the smaller the clusters you will get (each cluster will have fewer cells). Hence, if you want more clusters, reduce k; otherwise, increase it. Every dataset is different, and no single k value will work for all of them. I suggest experimenting with various values, perhaps starting at 5 (5 has tended to work well for me in the past).
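Here is a toy illustration of why a smaller k gives more, smaller clusters. It is explicitly not Phenograph, which runs Louvain community detection on a Jaccard-weighted kNN graph. Instead, as a simplified stand-in, it takes the connected components of a plain kNN graph, which shows the same direction of the k effect. The helper knn.components() and the simulated data are hypothetical.

```r
# Label points by connected component of an undirected kNN graph.
# NOTE: simplified stand-in, not Phenograph (no Jaccard weights, no Louvain).
knn.components <- function(x, k) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  adj <- matrix(FALSE, n, n)
  for (i in seq_len(n)) {
    # order() puts the point itself (distance 0) first, so skip it
    nn <- order(d[i, ])[2:(k + 1)]
    adj[i, nn] <- TRUE
  }
  adj <- adj | t(adj)  # make the graph undirected
  comp <- integer(n)
  current <- 0L
  for (s in seq_len(n)) {
    if (comp[s] != 0L) next
    # Breadth-first search from an unlabelled point
    current <- current + 1L
    comp[s] <- current
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nb <- which(adj[v, ] & comp == 0L)
      comp[nb] <- current
      queue <- c(queue, nb)
    }
  }
  comp
}

set.seed(1)
# Two well-separated toy populations measured on 2 markers (50 cells each)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 10), ncol = 2))
k.values <- c(1, 3, 10, 30)
n.clusters <- sapply(k.values, function(k) length(unique(knn.components(x, k))))
print(data.frame(k = k.values, n.clusters = n.clusters))
```

Because each point's k nearest neighbours are a subset of its k + 1 nearest neighbours, raising k can only add edges and merge groups, so the number of groups never increases with k.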

I'm not quite sure what you mean by "visualise the result of different run.flowsom on the right cell.dat.sub object's UMAP as below?". The code you wrote doesn't make sense to me, as I don't know what cell.dat.sub_backbone is, and you don't seem to use either cell.dat.sub_strict_NoManual or cell.dat.sub_relax_NoManual in make.colour.plot.

denvercal1234GitHub commented 1 year ago

Thank you @ghar1821 for your response!

For question 3: I have 3 sets of markers (backbone markers, strict_markers, and relax_markers) that I want to use for clustering my cell.dat object. So I first run run.flowsom on cell.dat once for each of these marker sets, with a corresponding meta.clust.name and clust.name for each set.

My question 3, then, is how I should use run.umap to visualise the clusters from these 3 runs. Do I need to run run.umap 3 times, each time with the respective use.cols?

ghar1821 commented 1 year ago

Oh, I see what you mean now. I suppose that is one way of doing it: repeat run.flowsom and run.umap 3 times, each with a different set of markers. Bear in mind, though, that the UMAP plots will look different, because the coordinates are calculated from different sets of markers.

My next question would be: what are you trying to find from these UMAPs? Are you trying to compare how the clusters differ when given different sets of markers? If so, it may make more sense to run UMAP once and visualise the clusters 3 times (1 colour plot per set of markers). In that case, you will have to decide which markers to feed into UMAP so that all 3 results can be visualised well. Perhaps a combination of all 3 sets of markers? Or maybe just the markers common to all 3 sets.
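Both options can be sketched with base R set operations. The marker names and the backbone vector are hypothetical placeholders; strict_NoManual and relax_NoManual mirror the variable names used earlier in this thread. The commented Spectre lines reuse only the calls and arguments already shown in this thread.

```r
# Hypothetical marker sets; replace with your own column names
backbone        <- c("CD3", "CD4", "CD8", "CD19")
strict_NoManual <- c("CD3", "CD4", "CD8", "CCR7", "CD45RA")
relax_NoManual  <- c("CD3", "CD4", "CD8", "CCR7", "CD45RA", "CD27", "CD95")
marker.sets <- list(backbone, strict_NoManual, relax_NoManual)

# Option 1: every marker used by any of the three clusterings
umap.cols.union  <- Reduce(union, marker.sets)
# Option 2: only markers shared by all three clusterings
umap.cols.common <- Reduce(intersect, marker.sets)
print(umap.cols.union)
print(umap.cols.common)

# Then (sketch): one UMAP, one colour plot per clustering column
# cell.dat.sub <- run.umap(cell.dat.sub, use.cols = umap.cols.union)
# make.colour.plot(cell.dat.sub, "UMAP_X", "UMAP_Y",
#                  col.axis = "FlowSOM_metacluster_backbonestrict_NoManualAuto",
#                  col.type = 'factor')
```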

tomashhurst commented 1 year ago

Hi @denvercal1234GitHub, we also have a workflow for 'multi-level' clustering (see Figure 4 here: https://onlinelibrary.wiley.com/doi/10.1002/cyto.a.24350). Essentially, we do a first round of clustering to identify major groups of cells (e.g. CD4, CD8, B cells, etc.) and then cluster each lineage again to gain more detail (e.g. Naive, Central Memory, Effector Memory, etc.). We don't have a script for this online, but I can send you what I use if that would be helpful.
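The general two-level idea can be sketched in base R. This is not the authors' script. stats::kmeans stands in for the clustering step; in a Spectre workflow you would call run.flowsom or run.phenograph on each subset instead. recluster.by.group(), the column names, and the toy data are all hypothetical.

```r
# Hypothetical helper: re-cluster the cells of each first-level group separately.
# kmeans is a stand-in for the real clustering step.
recluster.by.group <- function(dat, group.col, use.cols, centers, new.col) {
  dat[[new.col]] <- NA_character_
  for (g in unique(dat[[group.col]])) {
    rows <- which(dat[[group.col]] == g)
    # Second-level clustering sees only the cells of this lineage
    fit <- kmeans(dat[rows, use.cols, drop = FALSE], centers = centers, nstart = 10)
    # Prefix with the parent label so sub-cluster IDs stay unique across lineages
    dat[[new.col]][rows] <- paste(g, fit$cluster, sep = "_")
  }
  dat
}

set.seed(42)
# Toy data: two first-level lineages, each hiding two sub-populations on marker m1
dat <- data.frame(
  lineage = rep(c("CD4", "CD8"), each = 60),
  m1 = c(rnorm(30, 0), rnorm(30, 5), rnorm(30, 0), rnorm(30, 5)),
  m2 = rnorm(120)
)
dat <- recluster.by.group(dat, "lineage", c("m1", "m2"), centers = 2, new.col = "sub_cluster")
print(table(dat$lineage, dat$sub_cluster))
```

Running the second level per lineage lets each subset be split on the variation within it, rather than the between-lineage differences dominating as they do in a single global run. Each group needs at least `centers` distinct cells for kmeans to run.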

denvercal1234GitHub commented 1 year ago

Hi @tomashhurst -- It would be very useful to have the script for the multi-level clustering when you get a chance.

Also @ghar1821 and Thomas: CATALYST (https://bioconductor.org/packages/release/bioc/vignettes/CATALYST/inst/doc/differential.html) has a delta_area() function that can help determine the optimal number of clusters, and a plotNRS() function to help select markers for clustering. Does Spectre have anything similar? If not, could you let me know how we might still use these two functions in a Spectre workflow?

Thanks again very much!
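On the plotNRS() question, one option is to compute a similar marker ranking directly on the cell.dat columns. The sketch below approximates the non-redundancy-score idea: weight each marker's absolute loadings on the first few principal components by the variance of those components. It makes several assumptions. It is not CATALYST's implementation, which, as far as I know, computes the score per sample. nrs.scores() and n.pc = 3 are hypothetical choices.

```r
# Approximate non-redundancy score per marker (hypothetical helper, not CATALYST code).
# x: numeric matrix of (transformed) expression, cells in rows, markers in columns.
nrs.scores <- function(x, n.pc = 3) {
  pc <- prcomp(x, center = TRUE, scale. = FALSE)
  n.pc <- min(n.pc, ncol(pc$rotation))
  # Absolute loadings of each marker on the first n.pc components
  loadings <- abs(pc$rotation[, seq_len(n.pc), drop = FALSE])
  # Variance of each of those components
  pc.var <- pc$sdev[seq_len(n.pc)]^2
  # Variance-weighted sum of loadings; higher = marker carries more structure
  sort(drop(loadings %*% pc.var), decreasing = TRUE)
}

set.seed(7)
# Toy data: one high-variance marker and two low-variance markers
x <- cbind(marker_hi = rnorm(500, sd = 10),
           marker_a  = rnorm(500),
           marker_b  = rnorm(500))
scores <- nrs.scores(x)
print(scores)
```

The PCA here is unscaled, so the ranking depends on how the markers were transformed. If cell.dat is a data.table, as in the Spectre workflows, the input could be built with as.matrix(cell.dat[, strict_NoManual, with = FALSE]).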

denvercal1234GitHub commented 7 months ago

Hi @tomashhurst - I hope all is well, and thanks for your help earlier. Would you mind emailing me the script you mentioned for the multi-level clustering? quang.n.nguyen@alumni.duke.edu Thanks so much again!