cole-trapnell-lab / monocle3

Other
327 stars 101 forks source link

plot_genes_in_pseudotime plotting expectation values #602

Open kpatel427 opened 2 years ago

kpatel427 commented 2 years ago

Hello,

I performed trajectory analysis on a dataset containing human monocytes and its progenitor cells. When I tried to plot genes that changed expression over pseudotime I get a plot that looks like this:

image

I checked counts for this gene and most cells do not zero counts.

Screen Shot 2022-06-20 at 16 03 14

On further inspection, I noticed that when I run the following block of code:

p <- plot_genes_in_pseudotime(mono_lineage_cds,
                         color_cells_by="redefined_cluster",
                         min_expr=0.5)

p$layers

Output:
> p$layers
[[1]]
mapping: colour = ~redefined_cluster 
geom_point: na.rm = FALSE
stat_identity: na.rm = FALSE
position_jitter 

[[2]]
mapping: x = ~pseudotime, y = ~expectation 
geom_line: na.rm = FALSE, orientation = NA
stat_identity: na.rm = FALSE
position_identity 

[[3]]
mapping: y = ~y 
geom_blank: na.rm = FALSE
stat_identity: na.rm = FALSE
position_identity 

It is actually plotting the expectation values instead of expression because when I plot using the minimum value of expectation for min_expr, I get the following plot.

p <- plot_genes_in_pseudotime(mono_lineage_cds,
                         color_cells_by="redefined_cluster",
                         min_expr=0.5)
p1 <- p$data

min(p1$expectation)
# [1] 4.858196e-05

# re-plotting with min expectation value
plot_genes_in_pseudotime(mono_lineage_cds,
                         color_cells_by="redefined_cluster",
                         min_expr=4.858196e-05)
image

I am not sure if this is a bug or if I have grossly misunderstood something here. Any input/help/suggestions are highly appreciated!

Thank you! Khushbu

Ruismart commented 2 years ago

same issue, have to set min_expr=1e-5, if this is a right level for gene expression

Ruismart commented 2 years ago

in the function plot_genes_in_pseudotime() https://rdrr.io/github/cole-trapnell-lab/monocle3/src/R/plotting.R, seems like each cell divide by its size-factor but then forget to multiply by some coef like 1e4:

cds_exprs <- SingleCellExperiment::counts(cds_subset)

cds_exprs <- Matrix::t(Matrix::t(cds_exprs)/size_factors(cds_subset))

cds_exprs <- reshape2::melt(round(as.matrix(cds_exprs)))

williamwuyulu commented 2 years ago

Have the same question, I don't quite understand the meaning of min_expr. It seems that the values in the graph vary depending on how much I set.. ` p <- plot_genes_in_pseudotime(cds_subset, color_cells_by="seurat_clusters", trend_formula = "~ splines::ns(pseudotime, df=3)", min_expr=0.1) p <- p$data

min(p$expectation) [1] 0.1 p <- plot_genes_in_pseudotime(cds_subset, color_cells_by="seurat_clusters", trend_formula = "~ splines::ns(pseudotime, df=3)") p <- p$data min(p$expectation) [1] 1.388794e-12

1.388794e-12

plot_genes_in_pseudotime(cds_subset, color_cells_by="seurat_clusters", trend_formula = "~ splines::ns(pseudotime, df=3)", min_expr=0.5)

1.388794e-12

plot_genes_in_pseudotime(cds_subset, color_cells_by="seurat_clusters", trend_formula = "~ splines::ns(pseudotime, df=3)", min_expr=1.388794e-12)` image image

Ruismart commented 2 years ago

I still have issues in 'expectation' values from fit_models and model_predictions so that I couldn't figure out how to get a confident trend line for a specified gene along the trajectory.

only to show the gene expression, could do it manually like this:
(according to https://rdrr.io/github/cole-trapnell-lab/monocle3/src/R/plotting.R)

monocle_theme_opts <- function(){
  theme(strip.background = element_rect(colour = 'white', fill = 'white')) +
    theme(panel.border = element_blank()) +
    theme(axis.line.x = element_line(size=0.25, color="black")) +
    theme(axis.line.y = element_line(size=0.25, color="black")) +
    theme(panel.grid.minor.x = element_blank(),
          panel.grid.minor.y = element_blank()) +
    theme(panel.grid.major.x = element_blank(),
          panel.grid.major.y = element_blank()) +
    theme(panel.background = element_rect(fill='white')) +
    theme(legend.key=element_blank())

trend_formula="~ splines::ns(pseudotime, df=3)"
NROW=NULL
NCOL=4
cell_size=0.75
horizontal_jitter=NULL
vertical_jitter=NULL

# my Annotation column
color_cells_by = "Anno"
# four genes to test
test_genes <- c("A","B","C","D")

cds_subset <- cds[c(test_genes ),]
cds_subset <- cds_subset[,is.finite(colData(cds_subset)$pseudotime)]

## manually add coef 1e4 to recover the expression level
# cds_exprs <- SingleCellExperiment::counts(cds_subset)
# cds_exprs <- Matrix::t(Matrix::t(cds_exprs)/size_factors(cds_subset))*1e4
## or just use normalized data
cds_exprs <- SingleCellExperiment::logcounts(cds_subset)

colnames(cds_exprs) <- c("f_id","Cell","expression")
cds_colData <- colData(cds_subset)
cds_rowData <- rowData(cds_subset)

cds_exprs <- merge(cds_exprs, cds_rowData, by.x = "f_id", by.y = "row.names")
#cds_exprs <- merge(cds_exprs, cds_colData, by.x = "Cell", by.y = "row.names")
cds_exprs[,c(color_cells_by,"pseudotime")] <- cds_colData[cds_exprs$Cell,c(color_cells_by,"pseudotime")]

#cds_exprs$adjusted_expression <- cds_exprs$expression
cds_exprs$feature_label <- cds_exprs$f_id
cds_exprs$f_id <- as.character(cds_exprs$f_id)
cds_exprs$feature_label <- factor(cds_exprs$feature_label)

# I ignored the fit_models() and model_predictions() steps
#     so no trend line plotted with 'expectation' then

q <- ggplot(aes(pseudotime, log2(expression+1)), data = as.data.frame(cds_exprs)) +
      geom_point(aes_string(color = color_cells_by),
                          size = I(cell_size),
                          position=position_jitter(horizontal_jitter,
                                                                 vertical_jitter)) +
      facet_wrap(~feature_label, nrow = NROW,
                         ncol = NCOL, scales = "free_y") +
      labs(x="pseudotime", y="Expression")  + monocle_theme_opts()

q
cherriyoush commented 2 years ago

when I convert the Seurat object to cds object with SeuratWrapper's function as.cell_data_set(),I met the same problem too,and SeuratWrappers will now add the nCount metadata field as size factors when converting to a cell_data_set object; cds$Size_Factor is nCount , but in monocle3's function new_cell_data_set(), cds$Size_Factor has been calculated, that's Counts/geometric_mean(total_Counts), so I run the following command cds <- as.cell_data_set(seuratObj, assay = "RNA") cds <- estimate_size_factors(cds) before downstream analysis,then the problem solved.

williamwuyulu commented 2 years ago

cds <- estimate_size_factors(cds)

That's really helps!

Thank you so much!

ChristinaSteyn commented 8 months ago

I'm not really sure what I am doing but I was having the same problem. For what it is worth, I tried adjusting the limits on the y axis to be the minimum and maximum of the expression values and it looked better. See below.

p <-plot_genes_in_pseudotime(cds_new, color_cells_by="epoch")

d <- as.data.frame(p$data)

min(d$adjusted_expression)

plot_genes_in_pseudotime(cds_new, color_cells_by="epoch")+ ylim(min(d$adjusted_expression), max(d$adjusted_expression))