carmonalab / UCell

Gene set scoring for single-cell data
GNU General Public License v3.0

smoothKNN function with sce object #36

Closed: cstrlln closed this issue 6 months ago

cstrlln commented 6 months ago

I'm trying to use the SmoothKNN function with an SCE object that contains the UCell scores in the altExp slot, but I'm not sure how to set sce.assay and sce.expname correctly. Below are the code and the error I get.

c5_bp_gene_sets = msigdbr(species = "human", category = "C5", subcategory = "GO:BP")
head(c5_bp_gene_sets)

c5_bp_gene_sets_list = split(x = c5_bp_gene_sets$gene_symbol, f = c5_bp_gene_sets$gs_name)

set.seed(0101001001)
ranks <- UCell::StoreRankings_UCell(scores, assay = 'logcounts', maxRank = 2000, ncores = 4)

scores2 <- UCell::ScoreSignatures_UCell(scores, features = c5_bp_gene_sets_list, precalc.ranks = ranks, ncores = 5, assay = 'logcounts')

scores2_smooth <- UCell::SmoothKNN(scores2, signature.names = names(c5_bp_gene_sets_list), reduction = "PCA", sce.expname = c("UCell"))

Error in SmoothKNN.SingleCellExperiment(scores2, signature.names = names(c5_bp_gene_sets_list), : Could not find any of the given signatures in this object

mass-a commented 6 months ago

Hello!

Please note that ScoreSignatures_UCell() by default adds a suffix ("_UCell") to the signature names to identify the UCell gene set scores; this is controlled by the name parameter. You can set this parameter to NULL to prevent the suffix from being added. In your code, it should be sufficient to modify the call as follows:

scores2 <- UCell::ScoreSignatures_UCell(scores,
    features=c5_bp_gene_sets_list,
    precalc.ranks = ranks,
    ncores = 5,
    assay = 'logcounts',
    name = NULL)

In the same way, SmoothKNN will add the "_kNN" suffix to the kNN-smoothed signatures to identify the smoothed scores. Keep that in mind when retrieving the signatures from the resulting object. You can refer to "Using UCell with SingleCellExperiment" for some examples.
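If you would rather keep the default suffix, a minimal sketch of how the names compose may help. It uses only base R string handling; the two gene sets are placeholders, and the "_UCell" and "_kNN" suffixes are the ones described above.

```r
# Placeholder gene sets, used here only for their names
gene.sets <- list(Tcell   = c("CD2", "CD3E", "CD3D"),
                  Myeloid = c("SPI1", "FCER1G", "CSF1R"))

# Names as stored after scoring with the default suffix (name = "_UCell")
scored.names <- paste0(names(gene.sets), "_UCell")

# Names to expect once smoothing appends "_kNN" to the scored names
smoothed.names <- paste0(scored.names, "_kNN")

print(scored.names)
print(smoothed.names)

# With the suffix kept, the scored names (not the raw gene set names)
# would be the ones to pass on, e.g.:
# SmoothKNN(scores2, signature.names = scored.names,
#           reduction = "PCA", sce.expname = "UCell")
```

Passing names(c5_bp_gene_sets_list) directly only matches when scoring was run with name = NULL.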

Best -m

cstrlln commented 6 months ago

Ah, the suffix, missed that. Thanks!

Could you still give an example on how to use sce.assay and sce.expname properly?

mass-a commented 6 months ago

Yes, I'll take the example that comes up with ?SmoothKNN:

library(UCell)
library(SingleCellExperiment)
library(scater)

data(sample.matrix)
sce <- SingleCellExperiment(list(counts=sample.matrix))
gene.sets <- list( Tcell = c("CD2","CD3E","CD3D"),
                   Myeloid = c("SPI1","FCER1G","CSF1R"))
# Calculate UCell scores
sce <- ScoreSignatures_UCell(sce, features=gene.sets, name=NULL)
# Run PCA
sce <- logNormCounts(sce)
sce <- runPCA(sce, scale=TRUE, ncomponents=20)
# Smooth signatures
sce <- SmoothKNN(sce, reduction="PCA", signature.names=names(gene.sets), sce.expname = "UCell")
sce
class: SingleCellExperiment 
dim: 20729 600 
metadata(0):
assays(2): counts logcounts
rownames(20729): AL627309.1 AL669831.5 ... AP001468.1 AP001469.2
rowData names(0):
colnames(600): L5_ATTTCTGAGGTCGTGA L4_TCACTATTCATCTCTA ... E2L4_TCGGGCACAAGTCCAT E2L8_CGTCCATCACCACATA
colData names(1): sizeFactor
reducedDimNames(1): PCA
mainExpName: NULL
altExpNames(2): UCell UCell_kNN

There are now two altExps, UCell and UCell_kNN, generated by signature scoring and by the subsequent smoothing of the signatures, respectively. What if, instead, you wanted to directly smooth the gene log-counts from the 'main' experiment? You could do:

sce <- SmoothKNN(sce, reduction="PCA", signature.names=c("CD4","CD8A"),
         sce.expname = "main", sce.assay = "logcounts")
sce
class: SingleCellExperiment 
dim: 20729 600 
metadata(0):
assays(2): counts logcounts
rownames(20729): AL627309.1 AL669831.5 ... AP001468.1 AP001469.2
rowData names(0):
colnames(600): L5_ATTTCTGAGGTCGTGA L4_TCACTATTCATCTCTA ... E2L4_TCGGGCACAAGTCCAT E2L8_CGTCCATCACCACATA
colData names(1): sizeFactor
reducedDimNames(1): PCA
mainExpName: NULL
altExpNames(3): UCell UCell_kNN main_kNN

Note that this command generated a new altExp ('main_kNN') with the smoothed gene log-counts. This can then be accessed by altExp(sce, "main_kNN").
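As a rough sketch of retrieving the smoothed values, the snippet below repeats the ?SmoothKNN example from above and then pulls the scores out of the "UCell_kNN" altExp. It assumes UCell, SingleCellExperiment and scater are installed; the assay inside the altExp is taken by position rather than by name, since its name is not shown in this thread.

```r
library(UCell)
library(SingleCellExperiment)
library(scater)

# Same setup as the ?SmoothKNN example above
data(sample.matrix)
sce <- SingleCellExperiment(list(counts = sample.matrix))
gene.sets <- list(Tcell   = c("CD2", "CD3E", "CD3D"),
                  Myeloid = c("SPI1", "FCER1G", "CSF1R"))
sce <- ScoreSignatures_UCell(sce, features = gene.sets, name = NULL)
sce <- logNormCounts(sce)
sce <- runPCA(sce, scale = TRUE, ncomponents = 20)
sce <- SmoothKNN(sce, reduction = "PCA", signature.names = names(gene.sets),
                 sce.expname = "UCell")

# List the alternative experiments, then extract the smoothed one
altExpNames(sce)
smoothed <- altExp(sce, "UCell_kNN")

# Signature names as stored in the smoothed experiment
rownames(smoothed)

# Signatures x cells matrix; assay() without a name returns the first assay
smoothed.scores <- assay(smoothed)
smoothed.scores[, 1:5]
```

The same pattern should apply to "main_kNN", with the smoothed genes in place of the signatures.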

I hope that's useful.

cstrlln commented 6 months ago

This makes it clear, thanks!