cole-trapnell-lab / monocle-release

279 stars 116 forks source link

How to use Seurat Generated UMAP for Monocle3 pseudotime analyse? #388

Closed sinvaya closed 4 years ago

sinvaya commented 4 years ago

Hi,

We want to use monocle3 for pseudotime analyze. But it generate a totally different UMAP than Seurat and it split into too many clusters. Monocle3 generates pseudotime based on UMAP. Can we use the UMAP of Seurat?

Thanks!

RobertAlpin commented 4 years ago

This issue has been brought up on the Seurat issues page as well, where a user there has posted a script that will work (although I personally had to make some modifications to get it to work for my purposes, namely changing the cluster source and carrying over the RNA assay matrix for differential analysis rather than the integrated one, as the UMAP is preserved either way).

JoshuaCumming commented 4 years ago

Hi Robert,

I am also having some issues with the script

Would you be able to share the modifications to it you made?

Thanks for any help!

RobertAlpin commented 4 years ago

So it would seem that there's a major issue with porting Seurat objects into Monocle, namely that the integration anchor data takes the place of PCA loadings, which causes issues downstream with pseudotime analysis as graphtest only has access to PCA loadings. Therefore I would suggest that you NOT use integration with monocle at this time. You might choose to try the Align function in Monocle which hypothetically serves a similar purpose, or to just cluster in seurat which, in my case gave me a cleaner result (although in my case I'm examining transcriptional change with the addition of certain chemicals-something align/integrate seemed to muddle somewhat).

Here's what I ran with some personal inputs expunged, you're going to need to find the resolution variables and whatnot yourself. I've still used the first steps as I prefer seurat's filtering features, but I CAN NOT RECOMMEND using the partition onward. Frankly, I've forgotten what most of the changes I made are and why, so take caution when using this.)

# ### Re-dimension reduction for 3D rendering
# 
# if (Dim = "3D") {
#   
#   print ("Running UMAP 3D")
#   
#   seurat <- RunUMAP(object = seurat, reduction = "pca", dims = 1:20, n.components = 3)
#   
#   print("Clustering 3D")
#   
#   seurat <- FindNeighbors(object=seurat, dims=1:20)
#   seurat <- FindClusters(object=seurat, resolution=0.5)
#   seurat[[sprintf("ClusterNames_%.1f_%dPC", 0.5, 20)]] <- Idents(object = seurat)
#   
# }
# 

### Building the necessary parts for a basic cds

# part one, gene annotations

seurat <- seuratobj

library(Seurat)
library(monocle3)
library(htmlwidgets)

seurat <- readRDS("WHEREVER/YOU/KEEP/YOURS")

gene_annotation <- as.data.frame(rownames(seurat@reductions[["pca"]]@feature.loadings), row.names = rownames(seurat@reductions[["pca"]]@feature.loadings))
colnames(gene_annotation) <- "gene_short_name"

# part two, cell information

cell_metadata <- as.data.frame(seurat@assays[["RNA"]]@counts@Dimnames[[2]], row.names = seurat@assays[["RNA"]]@counts@Dimnames[[2]])
colnames(cell_metadata) <- "barcode"

# part three, counts sparse matrix

New_matrix <- seurat@assays[["RNA"]]@counts
New_matrix <- New_matrix[rownames(seurat@reductions[["pca"]]@feature.loadings), ]
expression_matrix <- New_matrix

### Construct the basic cds object

cds_from_seurat <- new_cell_data_set(expression_matrix,
                                     cell_metadata = cell_metadata,
                                     gene_metadata = gene_annotation)

### Construct and assign the made up partition 
###### I DO NOT ADVISE THIS

recreate.partition <- c(rep(1, length(cds_from_seurat@colData@rownames)))
names(recreate.partition) <- cds_from_seurat@colData@rownames
recreate.partition <- as.factor(recreate.partition)

cds_from_seurat@clusters@listData[["UMAP"]][["partitions"]] <- recreate.partition

### Assign the cluster info

list_cluster <- seurat@meta.data[[sprintf("seurat_clusters")]]
#list_cluster <- seurat@meta.data[[sprintf("ClusterNames_%s_%sPC", 0.5, 20)]]
names(list_cluster) <- seurat@assays[["RNA"]]@data@Dimnames[[2]]

cds_from_seurat@clusters@listData[["UMAP"]][["clusters"]] <- list_cluster

### Could be a space-holder, but essentially fills out louvain parameters

cds_from_seurat@clusters@listData[["UMAP"]][["louvain_res"]] <- "NA"

### Assign UMAP coordinate

cds_from_seurat@reducedDims@listData[["UMAP"]] <-seurat@reductions[["umap"]]@cell.embeddings

### Assign feature loading for downstream module analysis

cds_from_seurat@preprocess_aux$gene_loadings <- seurat@reductions[["pca"]]@feature.loadings

### Learn graph, this step usually takes a significant period of time for larger samples

print("Learning graph, which can take a while depends on the sample")

cds_from_seurat <- learn_graph(cds_from_seurat, use_partition = T)

plot_cells(cds_from_seurat)

####Here I chose to save the gene_metadata, rds, etc.`
JoshuaCumming commented 4 years ago

Great, thanks for the help!

Best,

Joshua

hpliner commented 4 years ago

Hello,

This issue appears to be an issue about our new package, Monocle 3. Please post any issues for Monocle 3 to the monocle3 repository at https://github.com/cole-trapnell-lab/monocle3.

Monocle 3 Development team

Kikibarmpa commented 4 years ago

So it would seem that there's a major issue with porting Seurat objects into Monocle, namely that the integration anchor data takes the place of PCA loadings, which causes issues downstream with pseudotime analysis as graphtest only has access to PCA loadings. Therefore I would suggest that you NOT use integration with monocle at this time. You might choose to try the Align function in Monocle which hypothetically serves a similar purpose, or to just cluster in seurat which, in my case gave me a cleaner result (although in my case I'm examining transcriptional change with the addition of certain chemicals-something align/integrate seemed to muddle somewhat).

Here's what I ran with some personal inputs expunged, you're going to need to find the resolution variables and whatnot yourself. I've still used the first steps as I prefer seurat's filtering features, but I CAN NOT RECOMMEND using the partition onward. Frankly, I've forgotten what most of the changes I made are and why, so take caution when using this.)

# ### Re-dimension reduction for 3D rendering
# 
# if (Dim = "3D") {
#   
#   print ("Running UMAP 3D")
#   
#   seurat <- RunUMAP(object = seurat, reduction = "pca", dims = 1:20, n.components = 3)
#   
#   print("Clustering 3D")
#   
#   seurat <- FindNeighbors(object=seurat, dims=1:20)
#   seurat <- FindClusters(object=seurat, resolution=0.5)
#   seurat[[sprintf("ClusterNames_%.1f_%dPC", 0.5, 20)]] <- Idents(object = seurat)
#   
# }
# 

### Building the necessary parts for a basic cds

# part one, gene annotations

seurat <- seuratobj

library(Seurat)
library(monocle3)
library(htmlwidgets)

seurat <- readRDS("WHEREVER/YOU/KEEP/YOURS")

gene_annotation <- as.data.frame(rownames(seurat@reductions[["pca"]]@feature.loadings), row.names = rownames(seurat@reductions[["pca"]]@feature.loadings))
colnames(gene_annotation) <- "gene_short_name"

# part two, cell information

cell_metadata <- as.data.frame(seurat@assays[["RNA"]]@counts@Dimnames[[2]], row.names = seurat@assays[["RNA"]]@counts@Dimnames[[2]])
colnames(cell_metadata) <- "barcode"

# part three, counts sparse matrix

New_matrix <- seurat@assays[["RNA"]]@counts
New_matrix <- New_matrix[rownames(seurat@reductions[["pca"]]@feature.loadings), ]
expression_matrix <- New_matrix

### Construct the basic cds object

cds_from_seurat <- new_cell_data_set(expression_matrix,
                                     cell_metadata = cell_metadata,
                                     gene_metadata = gene_annotation)

### Construct and assign the made up partition 
###### I DO NOT ADVISE THIS

recreate.partition <- c(rep(1, length(cds_from_seurat@colData@rownames)))
names(recreate.partition) <- cds_from_seurat@colData@rownames
recreate.partition <- as.factor(recreate.partition)

cds_from_seurat@clusters@listData[["UMAP"]][["partitions"]] <- recreate.partition

### Assign the cluster info

list_cluster <- seurat@meta.data[[sprintf("seurat_clusters")]]
#list_cluster <- seurat@meta.data[[sprintf("ClusterNames_%s_%sPC", 0.5, 20)]]
names(list_cluster) <- seurat@assays[["RNA"]]@data@Dimnames[[2]]

cds_from_seurat@clusters@listData[["UMAP"]][["clusters"]] <- list_cluster

### Could be a space-holder, but essentially fills out louvain parameters

cds_from_seurat@clusters@listData[["UMAP"]][["louvain_res"]] <- "NA"

### Assign UMAP coordinate

cds_from_seurat@reducedDims@listData[["UMAP"]] <-seurat@reductions[["umap"]]@cell.embeddings

### Assign feature loading for downstream module analysis

cds_from_seurat@preprocess_aux$gene_loadings <- seurat@reductions[["pca"]]@feature.loadings

### Learn graph, this step usually takes a significant period of time for larger samples

print("Learning graph, which can take a while depends on the sample")

cds_from_seurat <- learn_graph(cds_from_seurat, use_partition = T)

plot_cells(cds_from_seurat)

####Here I chose to save the gene_metadata, rds, etc.`

Hello, I followed your script and I still get the error: Error: No dimensionality reduction for UMAP calculated. Please run reduce_dimensions with reduction_method = UMAP and cluster_cells before running learn_graph.

Any idea how to solve this? It seems like it doesn't recognize the dimensionality reduction from Seurat in order to create trajectories..or this is what I understand.

Thank you in advance!

tlusardi commented 4 years ago

This has worked for me - good luck!

#### Create a Monocle CDS Object
    # Project PC dimensions to whole data set
    my.so <- ProjectDim(my.so, reduction = "pca")

    # Create an expression matrix
    expression_matrix <- my.so@assays$RNA@counts

    # Get cell metadata
    cell_metadata <- my.so@meta.data
    if (all.equal(colnames(expression_matrix), rownames(cell_metadata))) {
      print(sprintf("Cell identifiers match"))
    } else {
      print(sprintf("Cell identifier mismatch - %i cells in expression matrix, %i cells in metadata",
                    ncol(expression_matrix), nrow(cell_metadata)))
      print("If the counts are equal, sort differences will throw this error")
    }

    # get gene annotations
    gene_annotation <- data.frame(gene_short_name = rownames(my.so@assays$RNA), row.names = rownames(my.so@assays$RNA))
    if (all.equal(rownames(expression_matrix), rownames(gene_annotation))) {
      print(sprintf("Gene identifiers all match"))
    } else {
      print(sprintf("Gene identifier mismatch - %i genes in expression matrix, %i gene in gene annotation",
                    nrow(expression_matrix), nrow(gene_annotation)))
      print("If the counts are equal, sort differences will throw this error")
    }

    # Seurat-derived CDS
    my.cds <- new_cell_data_set(expression_matrix,
                                cell_metadata = cell_metadata,
                                gene_metadata = gene_annotation)

    # Transfer Seurat embeddings
    # Note that these may be calculated on the Integrated object, not the counts
    #   and thus will involve fewer genes
    reducedDim(my.cds, type = "PCA") <- my.so@reductions$pca@cell.embeddings 
    my.cds@preprocess_aux$prop_var_expl <- my.so@reductions$pca@stdev
    plot_pc_variance_explained(my.cds)

    # Transfer Seurat UMAP embeddings
    my.cds@int_colData@listData$reducedDims$UMAP <- my.so@reductions$umap@cell.embeddings
#    plot_cells(my.cds)

    # Copy cluster info from Seurat
    my.cds@clusters$UMAP_so$clusters <- my.so@meta.data$gt_tp_cell_type_integrated_.0.9

    my.cds <- cluster_cells(my.cds, reduction_method = "UMAP", resolution = 1e-3)

    # Fix from https://gitmemory.com/cole-trapnell-lab
    rownames(my.cds@principal_graph_aux$UMAP$dp_mst) <- NULL
    colnames(my.cds@int_colData@listData$reducedDims$UMAP) <- NULL

    DimPlot(so, reduction = "umap")
    DimPlot(my.so, reduction = "umap")
    plot_cells(my.cds, color_cells_by = "partition", group_label_size = 3.5)
    plot_cells(my.cds, color_cells_by = cellids, show_trajectory_graph = FALSE, group_label_size = 3.5)
tlusardi commented 4 years ago

Added a row/column name clean up so that you can run pseudo time... Apologies - it was strung across two scripts.