HelenaLC / CATALYST

Cytometry dATa anALYsis Tools

Named UMAPs #362

Closed purisuto closed 11 months ago

purisuto commented 12 months ago

Could I request the ability to name dimensionality reductions so that more than one UMAP can be stored in an SCE? This would be helpful when I want to visualise how a set of FlowSOM clusters is distributed on UMAPs generated with different sets of type markers, e.g. looking at B cell isotype clusters on a B cell lineage UMAP.

I guess also as a general point, after re-clustering an sce, one starts to run into issues with cluster_codes() not matching up correctly. Lately, I have started to put everything in the column data and edit the functions so they refer solely to the column data. Would it be possible to make a CATALYST workflow that does not use metadata or cluster_codes, but instead puts everything as column data? I guess the merging of the SOM needs to be kept as an independent object, but beyond that, colData is ok?

HelenaLC commented 11 months ago

  1. This isn't specific to CATALYST; it's general SingleCellExperiment functionality. You can rename reduced dimensions via names(reducedDims(sce)) <- ... (reducedDimNames(sce) <- ... does the same), e.g., run one round of clustering & dimension reduction, rename the result, and run another round. Alternatively, runDR() accepts dot arguments (...), so you can set name = "..." directly (see ?runDR, which links to runPCA/TSNE/UMAP/...).

  2. Yeah, I understand this could be seen as an annoyance. However, at the time, a lot of functionality depended on having hierarchical clusterings, which isn't guaranteed when they are generated independently (e.g., using alternative clustering algorithms), so we decided against CATALYST doing this. But like you said, an easy workaround is to simply place all clusterings in the colData once they've been generated, i.e.,

    codes <- cluster_codes(sce)
    kids <- sapply(names(codes), \(k) cluster_ids(sce, k))
    # optionally, rename 'kids' columns to something
    # meaningful when there are multiple clusterings
    colData(sce) <- cbind(colData(sce), kids)
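
For point 1, the "run, rename, run again" approach could look roughly like the sketch below. It uses a toy SCE and random matrices as stand-ins for the embeddings, so it only needs SingleCellExperiment; the names "UMAP_lineage" and "UMAP_isotype" are made up for illustration. In a real workflow, each stand-in line would be a runDR() call on the relevant marker set.

```r
library(SingleCellExperiment)

# toy SCE: 5 markers x 20 cells (random values, for illustration only)
set.seed(1)
sce <- SingleCellExperiment(
    assays = list(exprs = matrix(rnorm(100), nrow = 5)))

# round 1: stand-in for a UMAP computed on lineage markers
reducedDim(sce, "UMAP") <- matrix(rnorm(40), ncol = 2)
# rename it so the next round does not overwrite it
# ("UMAP_lineage" is an arbitrary, hypothetical name)
reducedDimNames(sce)[reducedDimNames(sce) == "UMAP"] <- "UMAP_lineage"

# round 2: stand-in for a UMAP computed on a different marker set
reducedDim(sce, "UMAP") <- matrix(rnorm(40), ncol = 2)
reducedDimNames(sce)[reducedDimNames(sce) == "UMAP"] <- "UMAP_isotype"

# both embeddings are now stored side by side
reducedDimNames(sce)
```

The embedding to plot can then be picked by name, e.g., via the dr argument of plotDR().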

purisuto commented 11 months ago

Thank you. Can you recommend a good way to put a named clustering from the colData back into the cluster_codes so that I can run the CATALYST functions on this clustering? This will allow me a quick workaround where I can perform two rounds of clustering, save the first in the colData and then revert back to the first after inspecting the second.

This would be especially useful for plotExprHeatmap(), since I was not able to convert it to use the colData. On the other hand, it appears that plotDR() already defaults to the colData (e.g. sce$meta20) rather than the cluster codes' meta20, because I noticed that it reverts to the default R color palette. diffcyt() too, I think, is OK using colData.

HelenaLC commented 11 months ago

There's code to place arbitrary clusterings as cluster_codes in the vignette. You could write a little utility function wrapping around this to switch between clusterings on the fly.
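
A minimal sketch of such a utility is below. It assumes, as in the vignette's approach, that CATALYST reads per-cell IDs from sce$cluster_id and looks them up in metadata(sce)$cluster_codes. The helper use_clustering(), the code name "custom" and the column meta_round1 are hypothetical names for illustration. Note that it overwrites the existing cluster_id and cluster_codes, so stash those first if the original SOM/metaclustering hierarchy is still needed.

```r
library(SingleCellExperiment)

# hypothetical helper: make the colData column 'col' the active clustering
use_clustering <- function(sce, col, name = "custom") {
    stopifnot(col %in% names(colData(sce)))
    kids <- droplevels(factor(sce[[col]]))
    # per-cell cluster IDs
    sce$cluster_id <- kids
    # one-column lookup table with one row per cluster, in level order,
    # so that indexing by sce$cluster_id returns the same labels
    codes <- data.frame(factor(levels(kids), levels = levels(kids)))
    names(codes) <- name
    metadata(sce)$cluster_codes <- codes
    sce
}

# toy SCE with a clustering stored in the colData (illustration only)
set.seed(1)
sce <- SingleCellExperiment(
    assays = list(exprs = matrix(rnorm(60), nrow = 3)))
sce$meta_round1 <- factor(sample(c("B", "NK", "T"), 20, replace = TRUE))

sce <- use_clustering(sce, "meta_round1")
table(sce$cluster_id)
```

With CATALYST loaded, the restored clustering should then be addressable by its code name, e.g., plotExprHeatmap(sce, by = "cluster_id", k = "custom").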