DillonHammill / CytoExploreR

Interactive Cytometry Data Analysis

FEATURE DEMO: Dimensionality Reduction Using cyto_map() #33

Closed DillonHammill closed 3 years ago

DillonHammill commented 4 years ago

CytoExploreR has full support for PCA, tSNE, FIt-SNE, UMAP and EmbedSOM dimensionality reduction algorithms through the cyto_map() function. The demo below demonstrates how to combine dimensionality reduction with manual gating to classify populations.

The first steps are exactly the same as conventional cytometry data analysis using manual gating (i.e. load samples, apply compensation, transform channels and manually gate populations). For simplicity I will use the Activation dataset and apply the gatingTemplate shipped with CytoExploreRData. This is a relatively simple dataset that will help to illustrate what dimensionality reduction aims to achieve.

Data Preparation

# Load required packages
library(CytoExploreRData)
library(CytoExploreR)

# Prepare Activation dataset
gs <- GatingSet(Activation)
gs <- cyto_compensate(gs)
gs <- cyto_transform(gs)

# Manual gating
gs <- cyto_gatingTemplate_apply(gs, Activation_gatingTemplate)

# Gating scheme
cyto_plot_gating_scheme(gs[[32]])

Gating-Scheme

The benefit of performing some gating prior to dimensionality reduction is that CytoExploreR will use these gated populations to identify and label populations in the lower dimensional space. This will save you a lot of time when you are trying to identify clusters.

cyto_map() will automatically construct a consensus map and return the split samples for you. By default cyto_map() will map all events, but the number of pooled events to map can be reduced using the display argument. If you need to downsample on a per-sample basis, you can use cyto_sample() first and then pass your samples to cyto_map(). By default cyto_map() maps the data using all channels that have markers assigned, but this can be changed by supplying specific channels to the channels argument. Below is an example of how we would apply the FIt-SNE dimensionality reduction algorithm to our samples.

IMPORTANT: Additional configuration steps are required to use FIt-SNE with CytoExploreR, unless you are working with the CytoExploreR Docker image. If you don't have FIt-SNE set up, you can set type = "UMAP" below.

NOTE: cyto_map() returns a new GatingSet containing the mapped events, so it is recommended that the result be assigned to a new name (e.g. gs_map).

Dimensionality Reduction

gs_map <- cyto_map(gs,
                   parent = "T Cells", # population to map
                   channels = c("CD4", "CD8", "CD44", "CD69"),
                   type = "FIt-SNE",
                   display = 100000) # map 100000 events only

FIt-SNE
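
The difference between reducing the pooled events (the display argument) and downsampling each sample first (cyto_sample()) is easiest to see on toy data. The sketch below uses plain base R on simulated matrices; it does not call CytoExploreR, and the helper names pool_then_sample() and sample_then_pool() are hypothetical, chosen only for illustration.

```r
# Simulated stand-ins for three samples of different sizes (rows = events).
set.seed(42)
samples <- list(
  A = matrix(rnorm(1000 * 4), ncol = 4),
  B = matrix(rnorm(200 * 4), ncol = 4),
  C = matrix(rnorm(50 * 4), ncol = 4)
)

# Pooled downsampling: merge all events, then draw n events from the pool.
# Larger samples contribute proportionally more events on average.
pool_then_sample <- function(x, n) {
  pooled <- do.call(rbind, x)
  n <- min(n, nrow(pooled))
  pooled[sample.int(nrow(pooled), n), , drop = FALSE]
}

# Per-sample downsampling: cap every sample at n events, then pool.
# Small samples keep all of their events.
sample_then_pool <- function(x, n) {
  capped <- lapply(x, function(m) {
    m[sample.int(nrow(m), min(n, nrow(m))), , drop = FALSE]
  })
  do.call(rbind, capped)
}

pooled <- pool_then_sample(samples, 300)
per_sample <- sample_then_pool(samples, 100)

nrow(pooled)     # 300
nrow(per_sample) # 100 + 100 + 50 = 250
```

With pooled downsampling a very large sample can dominate the map, whereas capping each sample first gives smaller samples more weight. Which behaviour is preferable depends on the experiment.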

cyto_map() will automatically plot the consensus map and overlay any gated populations to help in the identification of cell populations. This plot can then be saved and used as a key to identify populations when plotting individual samples.

# Plot grouped samples
cyto_plot(gs_map[-33],
          parent = "T Cells",
          overlay = "descendants",
          channels = c("FIt-SNE-1", "FIt-SNE-2"),
          group_by = c("Treatment", "OVAConc"),
          point_col = "grey") # turn off density colour gradient in base layer

maps

As you can see, dimensionality reduction is a powerful tool for looking at the dataset as a whole. We can clearly see both the CD4 T Cells and CD8 T Cells populations. As we would expect, the number of activated CD4 T Cells and CD8 T Cells increases as the OVA antigen concentration increases.

CytoExploreR has a variety of tools to help visualize and annotate data mapped using cyto_map(). These features will be explored in depth in a Dimensionality Reduction vignette (coming soon).

Biomiha commented 4 years ago

Hi @DillonHammill,

If possible would you be able to add the {uwot} implementation of UMAP? In my experience the performance gains compared to the pure R version are significant and it also seems to be (probably for the same reason) the more popular choice in the literature.

Thanks.

DillonHammill commented 4 years ago

@Biomiha, I think what I will do is allow a function to be passed through the type argument so that you can use any function you like. I want to avoid adding excess dependencies to CytoExploreR.

DillonHammill commented 4 years ago

@Biomiha, you can now pass the name of any function to the type argument of cyto_map() to perform custom dimensionality reduction and the standard options are still supported. Here are some examples:

Use a custom mapping function (developers): Developers can create their own function and pass it to cyto_map() directly. The only requirement is that the function accepts a matrix as its first argument and returns a matrix containing the mapped co-ordinates (this may be in an object slot). Developers will also need to ensure that their function does not use any argument names already used by cyto_map() (e.g. select, display, split etc.). For demonstration, we could map our data using this completely useless mapping function that accepts a matrix and simply assigns index values to each event:

custom_map <- function(x){
  map_coords <- matrix(rep(seq_len(nrow(x)), 2), ncol = 2)
  return(map_coords)
}

To use the function simply supply the name of the function to the type argument of cyto_map():

cyto_map(gs,
         parent = "Live Cells",
         channels = c("Va2", "CD4", "CD8", "CD44", "CD69", "CD11c"),
         type = custom_map,
         display = 5000)

custom_map

Notice how the name of the function will be used to name the new mapping parameters (e.g. custom_map-1 and custom_map-2). So make sure you give your function a nice name!
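
For a slightly less useless illustration of the same contract (matrix in, matrix of co-ordinates out), here is a minimal sketch built on stats::prcomp() from base R. The name pca_map and the simulated events matrix are assumptions made for this example; the function is only exercised on its own here, not through cyto_map().

```r
# Hypothetical custom mapping function: takes a numeric matrix
# (rows = events, columns = channels) and returns a two-column matrix
# holding the first two principal components for each event.
pca_map <- function(x) {
  x <- as.matrix(x)
  pca <- stats::prcomp(x, center = TRUE, scale. = FALSE)
  pca$x[, 1:2, drop = FALSE]
}

# Simulated data standing in for transformed channel intensities.
set.seed(1)
events <- matrix(
  rnorm(500 * 4),
  ncol = 4,
  dimnames = list(NULL, c("CD4", "CD8", "CD44", "CD69"))
)

coords <- pca_map(events)
dim(coords) # 500 2
```

Following the description above, such a function would then be supplied as type = pca_map. Note that it takes a single argument, so it cannot clash with argument names that cyto_map() already uses.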

Use a mapping function from another package: You can also use mapping functions that may be defined in other packages and that are not natively supported in CytoExploreR. You will need to make sure that the required package is installed and then pass the name of the function you want to use to the type argument. For example to use the umap function from uwot:

cyto_map(gs,
         parent = "Live Cells",
         channels = c("Va2", "CD4", "CD8", "CD44", "CD69", "CD11c"),
         type = uwot::umap,
         display = 5000)

It is a good idea to prefix the function name with the name of the package you would like to use (e.g. uwot::) to prevent conflicts with other loaded packages. In this case a umap function is exported by both the umap and uwot packages.

uwot
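
If you want to fix tuning parameters or control the name given to the new parameters, one option is a thin wrapper around the package function. This is only a sketch: the wrapper name uwot_map is hypothetical, the parameter values are arbitrary, and whether cyto_map() forwards extra arguments to the mapping function is not covered here, which is why everything is set inside the wrapper.

```r
# Hypothetical wrapper around uwot::umap() with fixed settings.
# It accepts a matrix and returns a two-column matrix of co-ordinates.
uwot_map <- function(x) {
  if (!requireNamespace("uwot", quietly = TRUE)) {
    stop("The 'uwot' package is required: install.packages('uwot')")
  }
  uwot::umap(as.matrix(x),
             n_neighbors = 15,  # size of the local neighbourhood
             n_components = 2,  # number of output dimensions
             min_dist = 0.1)    # how tightly points may be packed
}

# Quick check on simulated data (runs only if uwot is installed).
if (requireNamespace("uwot", quietly = TRUE)) {
  set.seed(1)
  coords <- uwot_map(matrix(rnorm(200 * 4), ncol = 4))
  print(dim(coords))
}
```

The wrapper would then be passed in the same way as any other custom function, i.e. type = uwot_map.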

There you go! Now you can use any dimensionality reduction algorithm that you like, not just the ones natively supported in CytoExploreR! Currently, only PCA, tSNE, FIt-SNE, EmbedSOM and UMAP are natively supported in CytoExploreR, and these functions can be used by quoting the name (e.g. type = "UMAP").

baj12 commented 4 years ago

Hi, this looks really interesting. I am currently struggling with using FlowSOM with a GatingHierarchy as input. It seems only the GatingSet method is implemented (based on the help page). Would you happen to have an example with FlowSOM that you can share? Thx Bernd

baj12 commented 4 years ago

Hi Dillon, looks really great what you intend to do and I am excited to test. I am running into the following problem when using the code from this page:

# Load required packages
library(CytoExploreRData)
library(CytoExploreR)
#> Loading required package: flowCore
#> Loading required package: flowWorkspace
#> As part of improvements to flowWorkspace, some behavior of
#> GatingSet objects has changed. For details, please read the section
#> titled "The cytoframe and cytoset classes" in the package vignette:
#> 
#>   vignette("flowWorkspace-Introduction", "flowWorkspace")
#> Loading required package: openCyto

# Prepare Activation dataset
gs <- GatingSet(Activation)
gs <- cyto_compensate(gs)
gs <- cyto_transform(gs)


# Manual gating
gs <- cyto_gatingTemplate_apply(gs, Activation_gatingTemplate)
#> Preprocessing for 'Cells'
#> Gating for 'Cells'
#> done!
#> done.
#> Preprocessing for 'Single Cells'
#> Gating for 'Single Cells'
#> done!
#> done.
#> Preprocessing for 'Dead Cells'
#> Gating for 'Dead Cells'
#> done!
#> done.
#> Live Cells gating...
#> done!
#> done.
#> Preprocessing for 'T Cells'
#> Gating for 'T Cells'
#> done!
#> done.
#> Preprocessing for 'CD8 T Cells'
#> Gating for 'CD8 T Cells'
#> done!
#> done.
#> Preprocessing for 'CD69+ CD8 T Cells'
#> Gating for 'CD69+ CD8 T Cells'
#> done!
#> done.
#> Preprocessing for 'CD4 T Cells'
#> Gating for 'CD4 T Cells'
#> done!
#> done.
#> Preprocessing for 'CD69+ CD4 T Cells'
#> Gating for 'CD69+ CD4 T Cells'
#> done!
#> done.
#> Preprocessing for 'Dendritic Cells'
#> Gating for 'Dendritic Cells'
#> done!
#> done.
#> finished.

# Gating scheme
# cyto_plot_gating_scheme(gs[[32]])

gs_map <- cyto_map(gs,
                   parent = "T Cells", # population to map
                   channels = c("CD4", "CD8", "CD44", "CD69"),
                   type = "PCA",
                   display = 100000) # map 100000 events only
#> Computing PCA co-ordinates...
#> Error in FUN(X[[i]], ...): object 'cf' not found

Created on 2020-09-05 by the reprex package (v0.3.0.9001)

Any idea what is happening?

Originally, I got the following error, but I couldn't get to this with reprex:

> gs_map <- cyto_map(gs,
+                    parent = "T Cells", # population to map
+                    channels = c("CD4", "CD8", "CD44", "CD69"),
+                    type = "UMAP",
+                    display = 100000) # map 100000 events only
Computing UMAP co-ordinates...
Error in if (max(abs(i)) > nrow(x)) stop(msg, call. = FALSE) : 
  missing value where TRUE/FALSE needed

DillonHammill commented 4 years ago

@baj12, I think this is related to cf_append_cols() issues in flowWorkspace. I will have a look at this today and report back.

LCapitani commented 3 years ago

@baj12, I think this related to cf_append_cols() issues in flowWorkspace. I will have a look at this today and report back.

@DillonHammill I get the exact same error - any updates on the matter? Thanks for this great package!

DillonHammill commented 3 years ago

Sorry for not addressing this sooner, I have been pre-occupied with a lot of things the last couple of weeks. This should now be fixed. Please install the latest version from GitHub and let me know how you go:

devtools::install_github("DillonHammill/CytoExploreR")

viktorzou commented 2 years ago

@DillonHammill Any updates on the Dimensionality Reduction vignette? I am trying to color-code markers on the tSNE plot built with cyto_map(). Best regards, Viktor