carmonalab / UCell

Gene set scoring for single-cell data
GNU General Public License v3.0

Seurat dotplot using UCell indicates everything is 100% expressed? #32

Closed by jcshuy 1 year ago

jcshuy commented 1 year ago

Hi everyone, I'm fairly new to Seurat and recently tried using UCell to calculate signature scores for groups of marker genes across subclusters, so I'm sorry if this issue is due to my inexperience. I successfully added the module scores to my Seurat object following the format here and wanted to make a dot plot comparing several of them between the subclusters. However, I noticed that for some reason the plot displays all of them as 100% expressed.

DotPlot(seurObj, features = c(names(c(DAM, Aging, Tau_p301s))))

[Screenshot: DotPlot of the UCell scores]

But when I plot each gene of a list individually, the expression appears as follows:

DotPlot(seurObj, features = DAM)

[Screenshot: DotPlot of the individual DAM genes]

Is there something I'm doing wrong? My code looks something like this:

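library(UCell)
library(Seurat)
library(readr)

newmarks <- read_csv("newdata.csv") # A large CSV containing gene markers in each row
newmarks <- as.data.frame(newmarks)

# AddModuleScore_UCell expects a named list of gene vectors
DAM <- list(DAM = as.vector(na.omit(newmarks[, 2])))     # 161 items
Aging <- list(Aging = as.vector(na.omit(newmarks[, 5]))) # 382 items
Tau_p301s <- list(Tau_p301s = as.vector(taup301))        # 209 items; taup301 is defined elsewhere (not shown)

# BPPARAM is a BiocParallel backend set up elsewhere (not shown).
# name = NULL appends no suffix, so each score column takes the list name, e.g. "DAM".
seurObj <- AddModuleScore_UCell(seurObj, features = DAM, BPPARAM = BPPARAM, name = NULL)
seurObj <- AddModuleScore_UCell(seurObj, features = Aging, BPPARAM = BPPARAM, name = NULL)
seurObj <- AddModuleScore_UCell(seurObj, features = Tau_p301s, BPPARAM = BPPARAM, name = NULL)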

DotPlot(seurObj, features = c(names(c(DAM, Aging, Tau_p301s))))

Sorry I can't show more data; I'm not allowed to share much due to workplace regulations. Thank you for your help!

mass-a commented 1 year ago

Hello, dot plots are good for visualizing the expression of individual genes, for which you often have a considerable fraction of zero values, but not for visualizing signature scores. The dot size in DotPlot shows the percentage of cells with a value above zero. Especially if your signatures are composed of many genes, chances are that no UCell score is exactly 0, so all dot sizes are 100%. I would recommend using other kinds of plots for UCell scores, e.g. box plots or violin plots, which show the actual distribution of scores instead of only the average and the fraction of non-zero values.
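To make the point about dot size concrete, here is a minimal base R sketch on simulated data. Everything in it is an assumption for illustration: the matrix `expr`, the helper `pct_above_zero`, and the use of a per-cell mean as a stand-in for a signature score. It does not reproduce UCell's rank-based formula; it only shows why a score aggregated over many genes is almost never exactly 0, even when each gene is mostly 0.

```r
set.seed(1)
n_cells <- 200
n_genes <- 50

# Hypothetical sparse expression matrix (genes x cells); most entries are 0
expr <- matrix(rpois(n_genes * n_cells, lambda = 0.25), nrow = n_genes)

# Percentage of cells with a value above 0: the quantity a dot plot maps to dot size
pct_above_zero <- function(x) 100 * mean(x > 0)

# A single gene is 0 in most cells
pct_gene <- pct_above_zero(expr[1, ])

# Stand-in for a signature score: per-cell mean over all 50 genes
# (NOT UCell's formula, just an aggregate over many genes)
score <- colMeans(expr)
pct_score <- pct_above_zero(score)

cat("Single gene, % cells > 0:     ", pct_gene, "\n")
cat("Aggregate score, % cells > 0: ", pct_score, "\n")

# The distribution of the score is what a box or violin plot would show
print(quantile(score))
```

On the Seurat object from this thread, the analogous plot would be something like `VlnPlot(seurObj, features = c("DAM", "Aging", "Tau_p301s"), pt.size = 0)`, assuming those are the metadata column names produced by `name = NULL`.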

jcshuy commented 1 year ago

Thank you for your feedback! That was what I had assumed, but I wanted to make sure it wasn't an error. I agree with the suggestion to use violin plots and will look into using them to better display the data. Thank you again for your help!