cole-trapnell-lab / monocle3

Different results when using violin plots and dot plots #665

Open Noeperron opened 1 year ago

Noeperron commented 1 year ago

Hello,

I get very confusing results when visualizing gene expression in my integrated scRNA-seq dataset: violin plots and dot plots describe opposite expression patterns.

When looking at the expression profile of a set of genes across different clusters using plot_genes_violin (see violin_plots.png, attached below), the violin plots show higher expression in clusters 1, 3, 6, 10, 15, and 17. The code I used to generate the violin plots is below.

plot_genes_violin(cds_subset, normalize = TRUE, log_scale = TRUE, group_cells_by = "cluster", ncol = 1, pseudocount = 0) + theme(axis.text.x = element_text(angle = 45, hjust = 1))

However, when plotting the same genes using plot_genes_by_group to generate a dot plot, the expression pattern is very different (see attached dot_plot.png below).

The code used was as follows:

plot_genes_by_group(cds, markers, reduction_method="UMAP", group_cells_by="cluster", pseudocount=1, norm_method="log", ordering_type="none", max.size=10)

I suspect that the normalization performed in plot_genes_violin is responsible for this discrepancy, but as there isn't much documentation available for this function, I can't be sure that's the case here.

Curiously, the plot_cells function gives expression profiles similar to those seen in the violin plots for the same set of genes.

plot_cells(cds, genes = geneID, labels_per_group = FALSE, cell_size = 0.8, scale_to_range = FALSE, norm_method = "log")

Closer examination of the data revealed that average gene expression per group was much lower in groups 1, 3, 6, 10, 15, and 17 than in the rest of the groups (see barplot.png below). These are the same groups in which the genes appear to be more highly expressed in the violin plots.

I'm very confused by these contrasting results, and would like to understand why plot_genes_by_group generates different results from those displayed by plot_genes_violin and plot_cells. Because of the parameters chosen, I assumed that the data would be normalized in each of these plots, and yet the results are very different.

How can this be explained? Is the normalization process different between these functions? I'd really appreciate it if you could help me understand this problem, which has given me a lot of trouble so far.

Thank you in advance for your time and help,

Sincerely,

Noé

Attachments: violin_plots.png, dot_plot.png, barplot.png

brgew commented 11 months ago

Hi,

Thank you for the feedback. This is interesting to me.

I found that plot_genes_violin() uses ggplot2 to draw the violin plots. When the data are log-transformed, ggplot2 drops the cells with zero counts, because log(0) is -Inf, and the mean is then calculated from the log-transformed count values of the remaining cells. A cluster in which only a few cells express the gene can therefore look highly expressed, even though its average over all cells is low. I suspect that this explains at least some of the differences that you see. It may make more sense to plot the median in the violin plots...
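The effect described above can be sketched in base R with invented counts for two hypothetical clusters, "A" and "B". This is only an illustration of how dropping zeros before averaging log values can reverse the ranking of two groups. It does not call monocle3 or ggplot2, and it assumes (without having checked the monocle3 source) that a pseudocount of 0 leaves zero-count cells out of the log-scale summary while a pseudocount of 1 keeps them in.

```r
# Toy counts for one gene in two hypothetical clusters (invented numbers).
counts <- list(
  A = c(0, 0, 0, 0, 0, 0, 0, 0, 20, 24),  # mostly zeros, two expressing cells
  B = c(5, 6, 4, 5, 7, 6, 5, 4, 6, 5)     # low expression in every cell
)

# Log scale with pseudocount = 0: log10(0) is -Inf, and non-finite values
# are dropped, so only the expressing cells contribute to the mean.
mean_log_nonzero <- sapply(counts, function(x) {
  lx <- log10(x)
  mean(lx[is.finite(lx)])
})

# Log scale with pseudocount = 1: every cell, including zeros, contributes.
mean_log_all <- sapply(counts, function(x) mean(log10(x + 1)))

# Plain per-cluster average of the raw counts, as in a bar plot of means.
mean_raw <- sapply(counts, mean)

# Fraction of cells with a non-zero count in each cluster.
frac_expressing <- sapply(counts, function(x) mean(x > 0))

print(round(rbind(mean_log_nonzero, mean_log_all, mean_raw, frac_expressing), 3))
```

In this toy example cluster A ranks above cluster B when zeros are dropped, and below it when all cells are included, which is the kind of reversal reported between the violin plots and the dot plot. Comparing the fraction of expressing cells per cluster is a quick way to check whether the same thing is happening in real data.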

Does this address your concerns adequately?

Thank you again.

Best Wishes, Brent