LTLA / scuttle

Clone of the Bioconductor repository for the scuttle package.
https://bioconductor.org/packages/devel/bioc/html/scuttle.html

discrepancy between addPerCellQC's sum & colSums #28

Closed: HelenaLC closed this issue 4 weeks ago

HelenaLC commented 2 months ago

Hi there - apologies in advance if this is very obvious - my collaborators processed our data using Seurat and noticed a discrepancy between their nCount_RNA/nFeature_RNA and my sum/detected from addPerCellQC (meanwhile, % mitochondrial etc. were identical). Any clue as to what might be going on?

> sce
class: SingleCellExperiment 
dim: 15502 39075 
metadata(0):
assays(2): counts logcounts
rownames(15502): SAMD11 NOC2L ... MT-ND6 MT-CYB
rowData names(3): gene_id gene_symbol hv
colnames(39075): c_1_1_1 c_1_1_2 ... c_2_2_10415 c_2_2_10416
colData names(35): library barcode ... detected total
reducedDimNames(2): PCA UMAP
mainExpName: NULL
altExpNames(0):

> class(assay(sce))
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"

> sce <- addPerCellQC(sce)
> sum <- colSums(assay(sce))
> det <- colSums(assay(sce) > 0)

> identical(sce$sum, sum)
[1] FALSE
> identical(sce$detected, det)
[1] FALSE

> head(cbind(scuttle=sce$sum, colSums=sum), 10)
         scuttle colSums
c_1_1_1     5755    5755
c_1_1_2      800     800
c_1_1_3     2953    2953
c_1_1_4      668     668
c_1_1_5     8639    8637
c_1_1_6      952     951
c_1_1_7     6460    6453
c_1_1_8      527     527
c_1_1_9      841     840
c_1_1_10    1000    1000

> i <- (abs(sce$sum-sum) > 0)
> table(j <- (abs(sce$detected-det) > 0))
FALSE  TRUE 
29514  9561 

> identical(i, j)
[1] TRUE
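
To see how the mismatch is distributed, the ten rows printed above can be compared directly. This is only a sketch on the values copied from the head() output; the names `scuttle_sum` and `col_sum` are made up for illustration, and the full 39075-cell data is not available here.

```r
# Values copied from the head() output above (first ten cells only).
scuttle_sum <- c(5755, 800, 2953, 668, 8639, 952, 6460, 527, 841, 1000)
col_sum     <- c(5755, 800, 2953, 668, 8637, 951, 6453, 527, 840, 1000)

# Per-cell difference: positive means the addPerCellQC value is larger.
d <- scuttle_sum - col_sum

# Tabulate the direction of the differences (-1, 0 or 1).
direction <- table(sign(d))
print(direction)

# Size of the non-zero gaps.
print(d[d != 0])
```

In these ten rows the stored sum is never smaller than the colSums value, so it may be worth running the same tabulation on the full data, e.g. `table(sign(sce$sum - sum))`. A one-sided gap could mean the stored column was computed on a different set of rows, but nothing in this thread confirms that.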

session info:

R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 14.2.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Madrid
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] igraph_2.0.3                DropletUtils_1.22.0         ape_5.7-1                  
 [4] BayesSpace_1.12.0           nnSVG_1.6.4                 ggspavis_1.8.0             
 [7] STexampleData_1.10.1        SpatialExperiment_1.12.0    ExperimentHub_2.10.0       
[10] AnnotationHub_3.10.0        BiocFileCache_2.10.1        dbplyr_2.5.0               
[13] RColorBrewer_1.1-3          ggrastr_1.0.2               scater_1.30.1              
[16] RSpectra_0.16-1             patchwork_1.2.0             scran_1.30.2               
[19] InSituType_1.0.0            ggplot2_3.5.0               BiocParallel_1.36.0        
[22] tidyr_1.3.1                 dplyr_1.1.4                 scuttle_1.12.0             
[25] HDF5Array_1.30.1            rhdf5_2.46.1                DelayedArray_0.28.0        
[28] SparseArray_1.2.4           S4Arrays_1.2.1              abind_1.4-5                
[31] Matrix_1.6-5                SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
[34] Biobase_2.62.0              GenomicRanges_1.54.1        GenomeInfoDb_1.38.8        
[37] IRanges_2.36.0              S4Vectors_0.40.2            BiocGenerics_0.48.1        
[40] MatrixGenerics_1.15.0       matrixStats_1.2.0          

loaded via a namespace (and not attached):
  [1] bitops_1.0-7                  httr_1.4.7                    doParallel_1.0.17            
  [4] tools_4.3.0                   backports_1.4.1               utf8_1.2.4                   
  [7] R6_2.5.1                      DirichletReg_0.7-1            mgcv_1.9-1                   
 [10] uwot_0.1.16                   rhdf5filters_1.14.1           GetoptLong_1.0.5             
 [13] withr_3.0.0                   gridExtra_2.3                 cli_3.6.2                    
 [16] Cairo_1.6-2                   sandwich_3.1-0                labeling_0.4.3               
 [19] nnls_1.5                      mvtnorm_1.2-4                 pbapply_1.7-2                
 [22] ggridges_0.5.6                askpass_1.2.0                 R.utils_2.12.3               
 [25] colorRamps_2.3.4              plotrix_3.8-4                 limma_3.58.1                 
 [28] flowCore_2.14.2               rstudioapi_0.15.0             RSQLite_2.3.5                
 [31] generics_0.1.3                shape_1.4.6.1                 gtools_3.9.5                 
 [34] car_3.1-2                     RProtoBufLib_2.14.1           ggbeeswarm_0.7.2             
 [37] fansi_1.0.6                   R.methodsS3_1.8.2             lifecycle_1.0.4              
 [40] multcomp_1.4-25               yaml_2.3.8                    edgeR_4.0.16                 
 [43] carData_3.0-5                 Rtsne_0.17                    grid_4.3.0                   
 [46] blob_1.2.4                    promises_1.2.1                dqrng_0.3.2                  
 [49] crayon_1.5.2                  lattice_0.22-6                beachmat_2.18.1              
 [52] cowplot_1.1.3                 KEGGREST_1.42.0               magick_2.8.3                 
 [55] pillar_1.9.0                  knitr_1.45                    ComplexHeatmap_2.18.0        
 [58] metapod_1.10.1                rjson_0.2.21                  xgboost_1.7.7.1              
 [61] codetools_0.2-19              glue_1.7.0                    data.table_1.15.2            
 [64] vctrs_0.6.5                   png_0.1-8                     gtable_0.3.4                 
 [67] assertthat_0.2.1              cachem_1.0.8                  xfun_0.42                    
 [70] mime_0.12                     ggside_0.3.1                  ConsensusClusterPlus_1.66.0  
 [73] coda_0.19-4.1                 survival_3.5-8                iterators_1.0.14             
 [76] cytolib_2.14.1                maxLik_1.5-2.1                statmod_1.5.0                
 [79] bluster_1.12.0                interactiveDisplayBase_1.40.0 ellipsis_0.3.2               
 [82] TH.data_1.1-2                 nlme_3.1-164                  lsa_0.73.3                   
 [85] bit64_4.0.5                   filelock_1.0.3                RcppAnnoy_0.0.22             
 [88] SnowballC_0.7.1               irlba_2.3.5.1                 vipor_0.4.7                  
 [91] colorspace_2.1-0              DBI_1.2.2                     tidyselect_1.2.1             
 [94] BRISC_1.0.5                   bit_4.0.5                     compiler_4.3.0               
 [97] curl_5.2.1                    BiocNeighbors_1.20.2          scales_1.3.0                 
[100] rappdirs_0.3.3                stringr_1.5.1                 digest_0.6.35                
[103] rmarkdown_2.26                XVector_0.42.0                CATALYST_1.26.0              
[106] htmltools_0.5.7               pkgconfig_2.0.3               umap_0.2.10.0                
[109] sparseMatrixStats_1.14.0      fastmap_1.1.1                 rlang_1.1.3                  
[112] GlobalOptions_0.1.2           shiny_1.8.0                   DelayedMatrixStats_1.24.0    
[115] farver_2.1.1                  zoo_1.8-12                    jsonlite_1.8.8               
[118] mclust_6.1                    R.oo_1.26.0                   BiocSingular_1.18.0          
[121] RCurl_1.98-1.14               magrittr_2.0.3                Formula_1.2-5                
[124] GenomeInfoDbData_1.2.11       Rhdf5lib_1.24.2               munsell_0.5.0                
[127] Rcpp_1.0.12                   ggnewscale_0.4.10             viridis_0.6.5                
[130] reticulate_1.35.0             stringi_1.8.3                 zlibbioc_1.48.2              
[133] MASS_7.3-60.0.1               plyr_1.8.9                    parallel_4.3.0               
[136] ggrepel_0.9.5                 Biostrings_2.70.3             splines_4.3.0                
[139] circlize_0.4.16               rdist_0.0.5                   locfit_1.5-9.9               
[142] ggpubr_0.6.0                  ggsignif_0.6.4                reshape2_1.4.4               
[145] ScaledMatrix_1.10.0           pkgload_1.3.4                 BiocVersion_3.18.1           
[148] XML_3.99-0.16.1               drc_3.0-1                     evaluate_0.23                
[151] BiocManager_1.30.22           foreach_1.5.2                 tweenr_2.0.3                 
[154] httpuv_1.6.14                 miscTools_0.6-28              RANN_2.6.1                   
[157] openssl_2.1.1                 purrr_1.0.2                   polyclip_1.10-6              
[160] clue_0.3-65                   ggforce_0.4.2                 rsvd_1.0.5                   
[163] broom_1.0.5                   xtable_1.8-4                  rstatix_0.7.2                
[166] later_1.3.2                   viridisLite_0.4.2             tibble_3.2.1                 
[169] memoise_2.0.1                 FlowSOM_2.10.0                beeswarm_0.4.0               
[172] AnnotationDbi_1.64.1          cluster_2.1.6

LTLA commented 2 months ago

Seems fine to me on a few of my datasets, e.g.

library(scRNAseq)
sce <- BachMammaryData() # close-ish size
sce
## class: SingleCellExperiment 
## dim: 27998 25806 
## metadata(0):
## assays(1): counts
## rownames(27998): ENSMUSG00000051951 ENSMUSG00000089699 ...
##   ENSMUSG00000096730 ENSMUSG00000095742
## rowData names(1): Symbol
## colnames: NULL
## colData names(3): Barcode Sample Condition
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):

class(assay(sce))
## [1] "dgCMatrix"
## attr(,"package")
## [1] "Matrix"

library(scuttle)
sce <- addPerCellQC(sce)
sum <- colSums(assay(sce))
det <- colSums(assay(sce) > 0)

all.equal(sce$sum, sum)
## [1] TRUE
all.equal(sce$detected, det)
## [1] TRUE

(On BioC 3.19, scuttle 1.14, etc.)
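
The two posts use different checks: identical() in the original report and all.equal() here. As a base-R aside (a minimal sketch with toy vectors, not the data from this issue), identical() also returns FALSE when only the names or the storage type differ. all.equal() tolerates type and tiny numeric differences, but still reports mismatched names unless told otherwise. The objects `x` and `y` are invented for illustration.

```r
x <- c(a = 1, b = 2)   # named double vector
y <- c(1, 2)           # same values, no names

identical(x, y)                                    # FALSE: names differ
isTRUE(all.equal(x, y))                            # FALSE: all.equal reports the names mismatch
isTRUE(all.equal(x, y, check.attributes = FALSE))  # TRUE: values agree
identical(unname(x), y)                            # TRUE once names are dropped

identical(1:2, c(1, 2))          # FALSE: integer vs double
isTRUE(all.equal(1:2, c(1, 2)))  # TRUE: all.equal ignores the storage type
```

The head() table earlier in the thread shows genuine value differences, so strictness alone does not explain that report. The sketch only shows why all.equal() is the more forgiving way to compare the two vectors.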

HelenaLC commented 4 weeks ago

Closing as I wasn't able to reproduce this on another day ... maybe some dubious environment/namespace thingy that had me/my collaborators get different results using Bioc/Seurat, I dunno :/ -- Thanks!!