discrepancy between addPerCellQC's sum & colSums #28

Closed HelenaLC closed 4 weeks ago

HelenaLC commented 2 months ago

Hi there - apologies in advance if this is very obvious - my collaborators processed our data using Seurat and noticed a discrepancy between their nCount_RNA/nFeature_RNA and my sum/detected from addPerCellQC (meanwhile, % mitochondrial etc. were identical). Any clue as to what might be going on?

> sce
class: SingleCellExperiment 
dim: 15502 39075 
assays(2): counts logcounts
rownames(15502): SAMD11 NOC2L ... MT-ND6 MT-CYB
rowData names(3): gene_id gene_symbol hv
colnames(39075): c_1_1_1 c_1_1_2 ... c_2_2_10415 c_2_2_10416
colData names(35): library barcode ... detected total
reducedDimNames(2): PCA UMAP
mainExpName: NULL

> class(assay(sce))
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"

> sce <- addPerCellQC(sce)
> sum <- colSums(assay(sce))
> det <- colSums(assay(sce) > 0)

> identical(sce$sum, sum)
> identical(sce$detected, det)

> head(cbind(scuttle=sce$sum, colSums=sum), 10)
         scuttle colSums
c_1_1_1     5755    5755
c_1_1_2      800     800
c_1_1_3     2953    2953
c_1_1_4      668     668
c_1_1_5     8639    8637
c_1_1_6      952     951
c_1_1_7     6460    6453
c_1_1_8      527     527
c_1_1_9      841     840
c_1_1_10    1000    1000

> i <- (abs(sce$sum-sum) > 0)
> table(j <- (abs(sce$detected-det) > 0))
FALSE  TRUE 
29514  9561 

> identical(i, j)
[1] TRUE

session info:

R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 14.2.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Madrid
tzcode source: internal

LTLA commented 2 months ago

Seems fine to me on a few of my datasets, e.g.

sce <- BachMammaryData() # close-ish size
## class: SingleCellExperiment 
## dim: 27998 25806 
## metadata(0):
## assays(1): counts
## rownames(27998): ENSMUSG00000051951 ENSMUSG00000089699 ...
##   ENSMUSG00000096730 ENSMUSG00000095742
## rowData names(1): Symbol
## colnames: NULL
## colData names(3): Barcode Sample Condition
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

class(assay(sce))
## [1] "dgCMatrix"
## attr(,"package")
## [1] "Matrix"

sce <- addPerCellQC(sce)
sum <- colSums(assay(sce))
det <- colSums(assay(sce) > 0)

all.equal(sce$sum, sum)
## [1] TRUE
all.equal(sce$detected, det)
## [1] TRUE

(On BioC 3.19, scuttle 1.14, etc.)
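
A side note on the comparison itself: `identical()` also compares attributes such as names, so it can return FALSE for vectors whose values agree, whereas comparing after `unname()` isolates genuine value differences. Below is a minimal sketch using the first five rows of the table in the original report. Whether `sce$sum` actually carries names is not stated in this thread, so the unnamed/named split is an assumption made purely for illustration.

```r
# Values copied from the first five rows of the table in the original report.
# Assumption for illustration: the scuttle vector is unnamed, the colSums one is named.
scuttle_sum <- c(5755, 800, 2953, 668, 8639)
col_sum <- c(c_1_1_1 = 5755, c_1_1_2 = 800, c_1_1_3 = 2953,
             c_1_1_4 = 668, c_1_1_5 = 8637)

# identical() is strict about attributes, so the names alone make this FALSE.
names_only <- identical(scuttle_sum[1:4], col_sum[1:4])

# Dropping the names compares values only; the first four cells agree.
values_agree <- identical(scuttle_sum[1:4], unname(col_sum[1:4]))

# Locate the cells whose values really differ, and by how much.
delta <- scuttle_sum - unname(col_sum)
mismatch <- data.frame(cell = names(col_sum)[delta != 0],
                       delta = delta[delta != 0])
print(mismatch)
```

Here only `c_1_1_5` shows up in `mismatch`, with a difference of 2, which matches the table: the discrepancy in the report is in the values themselves and not just in the attributes.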

HelenaLC commented 4 weeks ago

Closing as I wasn't able to reproduce this on another day ... maybe some dubious environment/namespace thingy that had me/my collaborators get different results using Bioc/Seurat, I dunno :/ -- Thanks!!
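
For anyone landing here with the same symptom, one way to probe the environment/namespace suspicion is to check which `colSums` a bare call resolves to and to compare explicitly namespaced calls. This is only a sketch on a made-up toy matrix, not a diagnosis of the data in this issue; `dense`, `m` and the simulated counts are hypothetical stand-ins for `assay(sce)`.

```r
library(Matrix)

# Hypothetical toy counts standing in for assay(sce); the values are simulated.
set.seed(1)
dense <- matrix(as.numeric(rpois(50 * 8, lambda = 0.5)), nrow = 50,
                dimnames = list(paste0("g", 1:50), paste0("c_", 1:8)))
m <- Matrix(dense, sparse = TRUE)

# Search-path entries that define colSums, in lookup order; a bare call
# normally picks the first function among them.
print(find("colSums"))

# Explicitly namespaced calls remove any ambiguity caused by masking.
sum_sparse <- Matrix::colSums(m)
sum_dense <- base::colSums(as.matrix(m))
det_sparse <- Matrix::colSums(m > 0)
det_dense <- base::colSums(as.matrix(m) > 0)

# Compare values only, ignoring names.
print(all.equal(unname(sum_sparse), unname(sum_dense)))
print(all.equal(unname(det_sparse), unname(det_dense)))
```

If the namespaced results agree with each other but not with a bare `colSums()` call in the same session, masking or a stale object in the workspace would be a plausible suspect; if they disagree, the inputs themselves are worth a closer look.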