cole-trapnell-lab / monocle-release

276 stars 116 forks source link

plot_pseudotime_heatmap not working in Monocle3 #295

Open nesetozel opened 5 years ago

nesetozel commented 5 years ago

Hello,

I generated a UMAP trajectory in Monocle3 and am able to plot any genes of interest across pseudotime no problem with plot_cell_trajectory. However, we would like to be able to discover new genes that are specifically expressed at certain parts of the trajectory so we really need to be able to cluster the genes. Trying to run plot_pseudotime_heatmap as described in the Monocle2 tutorial results in the following error with Monocle3:

Error in checkForRemoteErrors(val) : 3 nodes produced errors; first error: trying to get slot "family" from an object of a basic class ("logical") with no slots

The problem might have the same origin as these issues (both still open) that were reported before for other functions: https://github.com/cole-trapnell-lab/monocle-release/issues/256 https://github.com/cole-trapnell-lab/monocle-release/issues/239

I would greatly appreciate if there is any way to circumvent this problem. I can't simply use Monocle2 unfortunately since DDRTree method doesn't produce proper trajectories in my dataset, but UMAP does...

On a related point, we are also interested in discovering branch specific genes (w/ plot_genes_branched_heatmap function) but while the UMAP trajectories have branches, the branch point IDs are not labeled on the graph as it was in Monocle2 so I have no idea what I would pass for the 'branch_point' argument or if this function even works with UMAP trajectories yet...

Thank you!

dwucsf commented 5 years ago

Hi @nesetozel, Yup, I troubleshot this for about a month. Finally got it to work. I was getting the same error, and I realized the problem was that I was using my entire CDS as the input instead of subsetting genes first and then feeding that into the CDS option. Here is my code.

diff_test_result <- differentialGeneTest(cds, fullModelFormulaStr = "~sm.ns(Pseudotime, df=3)", reducedModelFormulaStr = "~1", relative_expr = TRUE, cores = 4, verbose = FALSE)

sig_gene_names <- row.names(subset(diff_test_result, qval < 0.00001)) nrow(subset(diff_test_result, qval < 0.00001)) write.csv(diff_test_result, file=paste("diff_test_result.csv",sep=""))

heatmap <- plot_pseudotime_heatmap(cds[sig_gene_names,], num_clusters = 4, cores = 6, show_rownames = FALSE, return_heatmap = T)

I hope this helps!

I'm at the point now where I've run into your second problem. I have two branches in my trajectory, but there are no branch IDs to use to input into the plot_genes_branched_heatmap function. Let me know if you figure this part out.

nesetozel commented 5 years ago

Thanks for the reply! My issue wasn't due to this, since I was already doing the gene subsetting (by the way, I would suggest you use the new principalGraphTest() function to calculate DE genes, supposed to be better). But I was also able to work around by loading the VGAM library explicitly. It appears to be a dependency they forgot to include.

No progress on the branch analysis. I believe the concept of branch points is simply not yet implemented for the UMAP trajectories...

jeanettejohnson commented 5 years ago

So I think this may be because the function is looking for a slot called "family" that's now called "expressionFamily." haven't found a way around it yet...

jeanettejohnson commented 5 years ago

So I think this may be because the function is looking for a slot called "family" that's now called "expressionFamily." haven't found a way around it yet...

Nope, that wasn't it. I was able to fix this issue by providing an explicit list of genes and subsetting based on that, as so:

my_genes <- row.names(subset(fData(monocytes), gene_short_name %in% c("Wfdc17", "Ccr2", "Pglyrp1", "F13a1", "Ly6c2", "Lyz2", "Malat1", "Fn1", "Eno3", "Capg", "Cst3", "Vim", "Ms4a4c", "Cd3g", "S100a10", "Xcl1")))

mos_subset <- monocytes[my_genes,]

heatmap <- plot_pseudotime_heatmap(mos_subset, num_clusters = 1, show_rownames = FALSE, return_heatmap = T)

I also plotted the genes in pseudotime before doing this, if that helps.

e1992 commented 4 years ago

Would anyone mind sharing code showing how to use the monocle 2 tool plot_pseudotime_heatmap with a cds generate with monocle 3? I'm completely stumped and would greatly appreciate it!

jeanettejohnson commented 4 years ago

Would anyone mind sharing code showing how to use the monocle 2 tool plot_pseudotime_heatmap with a cds generate with monocle 3? I'm completely stumped and would greatly appreciate it!

Here's my code (monocle 3). Hope this helps!

my_genes <- row.names(subset(fData(cds),
          gene_short_name %in% c("gene1", "gene2", "gene3"))
cds_subset <- cds[my_genes,]
heatmap <- plot_pseudotime_heatmap(cds, cluster_rows = FALSE, show_rownames = TRUE, return_heatmap = T)
heatmap
edavis71 commented 4 years ago

Would anyone mind sharing code showing how to use the monocle 2 tool plot_pseudotime_heatmap with a cds generate with monocle 3? I'm completely stumped and would greatly appreciate it!

Here's my code (monocle 3). Hope this helps!

my_genes <- row.names(subset(fData(cds),
          gene_short_name %in% c("gene1", "gene2", "gene3"))
cds_subset <- cds[my_genes,]
heatmap <- plot_pseudotime_heatmap(cds, cluster_rows = FALSE, show_rownames = TRUE, return_heatmap = T)
heatmap

Is this code for monocle 3alpha or the most current monocle3? The plot_pseudotime_heatmap isn't available in the newest version of monocle and if I try to use the previous version to get around this I get the following error since the cds object has been updated:


  unable to find an inherited method for function ‘pData’ for signature ‘"cell_data_set"’```
jonhsussman commented 3 years ago

After realizing that monocle3 no longer supports this function, I adapted a solution based on the code published by Tambalo et al., (A single cell transcriptome atlas of the developing zebrafish hindbrain). The example below shows clustering by both k means as well as hierarchical clustering. I hope this helps!

library(ComplexHeatmap)
library(ggplot2)
library(dplyr)
library(RColorBrewer)
library(circlize)
library(monocle3)

modulated_genes <- graph_test(cds, neighbor_graph = "principal_graph", cores = 4)
genes <- row.names(subset(modulated_genes, q_value == 0 & morans_I > 0.25))
genes

pt.matrix <- exprs(cds)[match(genes,rownames(rowData(cds))),order(pseudotime(cds))]
#Can also use "normalized_counts" instead of "exprs" to use various normalization methods, for example:
#normalized_counts(cds, norm_method = "log")

pt.matrix <- t(apply(pt.matrix,1,function(x){smooth.spline(x,df=3)$y}))
pt.matrix <- t(apply(pt.matrix,1,function(x){(x-mean(x))/sd(x)}))
rownames(pt.matrix) <- genes;
#K means with 6 groups
htkm <- Heatmap(
  pt.matrix,
  name                         = "z-score",
  col                          = colorRamp2(seq(from=-2,to=2,length=11),rev(brewer.pal(11, "Spectral"))),
  show_row_names               = TRUE,
  show_column_names            = FALSE,
  row_names_gp                 = gpar(fontsize = 6),
  km = 6,
  row_title_rot                = 0,
  cluster_rows                 = TRUE,
  cluster_row_slices           = FALSE,
  cluster_columns              = FALSE)

#Ward.D2 Hierarchical Clustering
hthc <- Heatmap(
  pt.matrix,
  name                         = "z-score",
  col                          = colorRamp2(seq(from=-2,to=2,length=11),rev(brewer.pal(11, "Spectral"))),
  show_row_names               = TRUE,
  show_column_names            = FALSE,
  row_names_gp                 = gpar(fontsize = 6),
  clustering_method_rows = "ward.D2",
  clustering_method_columns = "ward.D2",
  row_title_rot                = 0,
  cluster_rows                 = TRUE,
  cluster_row_slices           = FALSE,
  cluster_columns              = FALSE)

print(htkm)
print(hthc)
areyoukidneyme commented 2 years ago

This code works great! However, I do want to understand the step-by-step process. Not an expert coder, so I'd really appreciate if someone can explain what each line of code does especially these parts:

pt.matrix <- exprs(cds)[match(genes,rownames(rowData(cds))),order(pseudotime(cds))]
#Can also use "normalized_counts" instead of "exprs" to use various normalization methods, for example:
#normalized_counts(cds, norm_method = "log")

pt.matrix <- t(apply(pt.matrix,1,function(x){smooth.spline(x,df=3)$y}))
pt.matrix <- t(apply(pt.matrix,1,function(x){(x-mean(x))/sd(x)}))

Thanks in advance!

jonhsussman commented 2 years ago

I'm glad to hear my code was helpful!

In the first line, it is simply taking the list of expression level of from each cell and placing them in order along the pseudotime trajectory. These are the counts that are normalized by monocle, hence the difference options for normalization methods.

Then, for each gene, it fits a spline curve with 3 degrees of freedom (the main parameter there that can be adjusted). This essentially just smooths out he points in the same way that you see a smooth line going through the pseudotime trajectory plots for a single gene. Then finally, it normalized the values, generating a z-score value for each gene, in order to compare different genes along the same scale.

areyoukidneyme commented 2 years ago

I'm glad to hear my code was helpful!

In the first line, it is simply taking the list of expression level of from each cell and placing them in order along the pseudotime trajectory. These are the counts that are normalized by monocle, hence the difference options for normalization methods.

Then, for each gene, it fits a spline curve with 3 degrees of freedom (the main parameter there that can be adjusted). This essentially just smooths out he points in the same way that you see a smooth line going through the pseudotime trajectory plots for a single gene. Then finally, it normalized the values, generating a z-score value for each gene, in order to compare different genes along the same scale.

That's awesome!

I have another question if I make 2 separate heatmaps for 2 separate datasets (say 1 for control and 1 treatment), will the X-axis be comparable? I guess I should just also ask what the X-axis in the heatmap is.

Thanks!

jonhsussman commented 2 years ago

This is a great question @areyoukidneyme: it's hard to really "compare" the x-axis to any concrete metric in any case, so if there is a global difference in a set of genes between control and treatment, this visualization should be able to represent the difference clearly regardless of whether the units are truly comparable. But more specifically, in this case, the X-axis is really just a cell number count that goes from cell 1 to cell n in your dataset linearly (ordered by pseudotime). If you want to be a little more clever, you can space out the cells by their relationship in pseudotime rather than just cell by cell (as this code does).

In that case, you would replace pt.matrix <- t(apply(pt.matrix,1,function(x){smooth.spline(x,df=3)$y})) with the following:

pseudos <- sort(t(as.matrix(pseudotime(cds))))
pt.matrix <- t(apply(pt.matrix,1,function(x){smooth.spline(pseudos,y=x,df=3)$y}))

However, I find that the initial approach looks a little cleaner (although perhaps you might like the latter approach better). And in either case, the start and end points are the same, it is only the spacing that is different. So overall, this graph is really just a relative representation from start to end.

areyoukidneyme commented 2 years ago

Great! Thanks @jonhsussman ! Appreciate your help!

jonhsussman commented 2 years ago

No problem!

On Thu, Sep 15, 2022 at 9:05 PM NephBro @.***> wrote:

Great! Thanks @jonhsussman https://urldefense.com/v3/__https://github.com/jonhsussman__;!!LIr3w8kk_Xxm!pHPaVqv2eQ_KqMMn7r7L9kL8Rcdy_mpJoT7udFIkv3NTrPWVsO8AvbjMDfzv-xliJqO2GbvbQG97MDRuR18nXAmy$ ! Appreciate your help!

— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/cole-trapnell-lab/monocle-release/issues/295*issuecomment-1248801197__;Iw!!LIr3w8kk_Xxm!pHPaVqv2eQ_KqMMn7r7L9kL8Rcdy_mpJoT7udFIkv3NTrPWVsO8AvbjMDfzv-xliJqO2GbvbQG97MDRuRzbxOd4c$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AKIGYMSH45CMOTCH7I2MPI3V6PBVJANCNFSM4HDVFTIQ__;!!LIr3w8kk_Xxm!pHPaVqv2eQ_KqMMn7r7L9kL8Rcdy_mpJoT7udFIkv3NTrPWVsO8AvbjMDfzv-xliJqO2GbvbQG97MDRuR9SV58S_$ . You are receiving this because you were mentioned.Message ID: @.***>

YangHeeDong commented 1 year ago

@jonhsussman hello I am a student studying bioinformatics. I have a question about the code written by jonhsussman.

genes <- row.names(subset(modulated_genes, q_value == 0 & morans_I > 0.25))

By what criteria did you cut off?

jonhsussman commented 1 year ago

Great to see fellow student (I am studying computational biology)

This is a good question, and it depends heavily on the specific data that you are working with. I picked these values fairly arbitrarily after looking at the results and picked a value that would capture the first 50-100 genes that are most significantly variable across pseudotime. It entirely depends on the dataset that you are working with, so I suggest trying different values such that you get enough genes to visualize distinct clusters (representing different gene modules) but without it becoming excessively dense

On Sun, Sep 18, 2022 at 9:03 PM hdYang @.***> wrote:

@jonhsussman https://urldefense.com/v3/__https://github.com/jonhsussman__;!!LIr3w8kk_Xxm!p-75YZSRYk9Du_Q2sdxmtkcLTje5XuAvOWsdeCf0d3q9ZL82KifjCoFZWz2KulebXPvfGei62uX7VRxlmHD1OQ6A$ hello I am a student studying bioinformatics. I have a question about the code written by jonhsussman.

genes <- row.names(subset(modulated_genes, q_value == 0 & morans_I > 0.25))

By what criteria did you cut off?

— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/cole-trapnell-lab/monocle-release/issues/295*issuecomment-1250435621__;Iw!!LIr3w8kk_Xxm!p-75YZSRYk9Du_Q2sdxmtkcLTje5XuAvOWsdeCf0d3q9ZL82KifjCoFZWz2KulebXPvfGei62uX7VRxlmPLGkCP6$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AKIGYMXGXKPL2KLWUFVLMFTV663U3ANCNFSM4HDVFTIQ__;!!LIr3w8kk_Xxm!p-75YZSRYk9Du_Q2sdxmtkcLTje5XuAvOWsdeCf0d3q9ZL82KifjCoFZWz2KulebXPvfGei62uX7VRxlmNoYVgJQ$ . You are receiving this because you were mentioned.Message ID: @.***>

areyoukidneyme commented 1 year ago

Hello,

I'm trying to create a similar heatmap but using values from AddModuleScore in Seurat instead of a set of genes like what we do here. ` pt.matrix <- exprs(cds)[match(genes,rownames(rowData(cds))),order(pseudotime(cds))] `

Can anyone help me extracting the module score information and use it to create a matrix for this:

` pt.matrix <- t(apply(pt.matrix,1,function(x){smooth.spline(x,df=3)$y})) pt.matrix <- t(apply(pt.matrix,1,function(x){(x-mean(x))/sd(x)})) `

Currently, the Module score is stored in the cds (monocle) file under colData > listData > score1

Thanks in advance!

sevnnnick commented 1 year ago

Would anyone mind sharing code showing how to use the monocle 2 tool plot_pseudotime_heatmap with a cds generate with monocle 3? I'm completely stumped and would greatly appreciate it!

Here's my code (monocle 3). Hope this helps!

my_genes <- row.names(subset(fData(cds),
          gene_short_name %in% c("gene1", "gene2", "gene3"))
cds_subset <- cds[my_genes,]
heatmap <- plot_pseudotime_heatmap(cds, cluster_rows = FALSE, show_rownames = TRUE, return_heatmap = T)
heatmap

Is this code for monocle 3alpha or the most current monocle3? The plot_pseudotime_heatmap isn't available in the newest version of monocle and if I try to use the previous version to get around this I get the following error since the cds object has been updated:

  unable to find an inherited method for function ‘pData’ for signature ‘"cell_data_set"’```

it's a function of monocle2,not monocle 3.I'm confused it too

sevnnnick commented 1 year ago

This is a great question @areyoukidneyme: it's hard to really "compare" the x-axis to any concrete metric in any case, so if there is a global difference in a set of genes between control and treatment, this visualization should be able to represent the difference clearly regardless of whether the units are truly comparable. But more specifically, in this case, the X-axis is really just a cell number count that goes from cell 1 to cell n in your dataset linearly (ordered by pseudotime). If you want to be a little more clever, you can space out the cells by their relationship in pseudotime rather than just cell by cell (as this code does).

In that case, you would replace pt.matrix <- t(apply(pt.matrix,1,function(x){smooth.spline(x,df=3)$y})) with the following:

pseudos <- sort(t(as.matrix(pseudotime(cds))))
pt.matrix <- t(apply(pt.matrix,1,function(x){smooth.spline(pseudos,y=x,df=3)$y}))

However, I find that the initial approach looks a little cleaner (although perhaps you might like the latter approach better). And in either case, the start and end points are the same, it is only the spacing that is different. So overall, this graph is really just a relative representation from start to end.

thank you for your code.But how to show the celltype information at the top of the heatmat?

ruchikabhat commented 1 year ago

Hi all,

I am using pseudotime analysis for the first time for mouse scRNA data and have encountered the following error:

FibroMono <- orderCells(FibroMono) Warning messages: 1: In graph.dfs(dp_mst, root = root_cell, neimode = "all", unreachable = FALSE, : Argument neimode' is deprecated; use mode' instead 2: In graph.dfs(dp_mst, root = root_cell, neimode = "all", unreachable = FALSE, : Argument neimode' is deprecated; use mode' instead

plot_cell_trajectory(FibroMono,

                 color_by = "Clusters",
                 theta = -15,
                 show_branch_points = TRUE,show_backbone = TRUE,
                 backbone_color = "black",
                 show_tree = TRUE,use_color_gradient = FALSE,
                 cell_size = 1.5, values=a) + scale_color_manual(values = a, name = "Only FRCs Pseudotime")+ theme(legend.position = "top")

Error in .standalone_types_check_dot_call(ffi_standalone_check_number_1.0.7, : object 'ffi_standalone_check_number_1.0.7' not found

Please help!

Thank you in advance.

shanshancode commented 1 year ago

Hello and thanks for the code @jonhsussman ,

my question is if replace genes with a mygeneset defined by me, [the genes in the mygeneset are all in the rownames(rowData(cds))], why I get the erro:

> genes <- row.names(subset(modulated_genes, q_value == 0 & morans_I > 0.25)) > genes<-intersect(rownames(rowData(cds.cluster)) ,mygeneset)

pt.matrix looks normal, but when run

htkm <- Heatmap(....

I get:

Error in hclust(get_dist(submat, distance), method = method) : NA/NaN/Inf in foreign function call (arg 10)

Best, Shanshan

jonhsussman commented 1 year ago

Hi @shanshancode, could you see if there are NA values in the pt.matrix, or try removing them with pt.matrix <- na.omit(pt.matrix), I wonder if this may be the problem here.

BhavishyaNel commented 10 months ago

After realizing that monocle3 no longer supports this function, I adapted a solution based on the code published by Tambalo et al., (A single cell transcriptome atlas of the developing zebrafish hindbrain). The example below shows clustering by both k means as well as hierarchical clustering. I hope this helps!

library(ComplexHeatmap)
library(ggplot2)
library(dplyr)
library(RColorBrewer)
library(circlize)
library(monocle3)

modulated_genes <- graph_test(cds, neighbor_graph = "principal_graph", cores = 4)
genes <- row.names(subset(modulated_genes, q_value == 0 & morans_I > 0.25))
genes

pt.matrix <- exprs(cds)[match(genes,rownames(rowData(cds))),order(pseudotime(cds))]
#Can also use "normalized_counts" instead of "exprs" to use various normalization methods, for example:
#normalized_counts(cds, norm_method = "log")

pt.matrix <- t(apply(pt.matrix,1,function(x){smooth.spline(x,df=3)$y}))
pt.matrix <- t(apply(pt.matrix,1,function(x){(x-mean(x))/sd(x)}))
rownames(pt.matrix) <- genes;
#K means with 6 groups
htkm <- Heatmap(
  pt.matrix,
  name                         = "z-score",
  col                          = colorRamp2(seq(from=-2,to=2,length=11),rev(brewer.pal(11, "Spectral"))),
  show_row_names               = TRUE,
  show_column_names            = FALSE,
  row_names_gp                 = gpar(fontsize = 6),
  km = 6,
  row_title_rot                = 0,
  cluster_rows                 = TRUE,
  cluster_row_slices           = FALSE,
  cluster_columns              = FALSE)

#Ward.D2 Hierarchical Clustering
hthc <- Heatmap(
  pt.matrix,
  name                         = "z-score",
  col                          = colorRamp2(seq(from=-2,to=2,length=11),rev(brewer.pal(11, "Spectral"))),
  show_row_names               = TRUE,
  show_column_names            = FALSE,
  row_names_gp                 = gpar(fontsize = 6),
  clustering_method_rows = "ward.D2",
  clustering_method_columns = "ward.D2",
  row_title_rot                = 0,
  cluster_rows                 = TRUE,
  cluster_row_slices           = FALSE,
  cluster_columns              = FALSE)

print(htkm)
print(hthc)

Hi, thank you so much for the code! It is incredibly helpful :) @jonhsussman Is it possible to have the pseudotime intervals on the x-axis of the heatmap? maybe just 0-5, 5-10, 10-15, 15-20? It is just very hard to interpret the points where the gene expression is shifting among different time points if no scale is given. I tried modifying the code myself to somehow make it work but I'm running through a lot of issues. Looking forward to your reply!

PS: I used the scaled values(cds@assays@data$scaledata) instead of exprs(cds) to get more accurate results as the single cell analsis is usually performed on scaled data and not the raw data.

RichardBJ commented 2 months ago

Really helpful script thanks @jonhsussman. I realise we are about a year on, but I worried about the naming of the rows? In my case, where rownames of pt.matrix were geneIDs I renamed them with my dictionary rownames(pt.matrix) <- names(gene_dict)[which(gene_dict %in% rownames(pt.matrix))] rather than rownames(pt.matrix) <- genes

Genes appear in a different order now...