constantAmateur / SoupX

R package to quantify and remove cell free mRNAs from droplet based scRNA-seq data

Cell level Rho values are the same in the whole dataset #61

Closed yunguan-wang closed 3 years ago

yunguan-wang commented 3 years ago

I followed the vignette and processed the sample dataset. At the end, I found that all the cells have the same rho, although I believed the rho of each cell is estimated independently. Am I missing something here?

The sessionInfo() output is pasted below.

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.6 (Maipo)

Matrix products: default
BLAS/LAPACK: /cm/shared/apps/intel/compilers_and_libraries/2017.6.256/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] argparse_2.0.1 SoupX_1.4.5 Seurat_3.2.1 BiocParallel_1.22.0
[5] ggplot2_3.3.2 DropletUtils_1.8.0 SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.2
[9] DelayedArray_0.14.1 matrixStats_0.56.0 Biobase_2.48.0 GenomicRanges_1.40.0
[13] GenomeInfoDb_1.24.2 IRanges_2.22.2 S4Vectors_0.26.1 BiocGenerics_0.34.0

loaded via a namespace (and not attached):
[1] Rtsne_0.15 colorspace_1.4-1 deldir_0.1-29 ellipsis_0.3.1 ggridges_0.5.2
[6] XVector_0.28.0 spatstat.data_1.4-3 rstudioapi_0.11 leiden_0.3.3 listenv_0.8.0
[11] ggrepel_0.8.2 codetools_0.2-16 splines_4.0.2 R.methodsS3_1.8.1 polyclip_1.10-0
[16] jsonlite_1.7.1 ica_1.0-2 cluster_2.1.0 png_0.1-7 R.oo_1.24.0
[21] uwot_0.1.8 shiny_1.5.0 HDF5Array_1.16.1 sctransform_0.3 BiocManager_1.30.10
[26] compiler_4.0.2 httr_1.4.2 dqrng_0.2.1 Matrix_1.2-18 fastmap_1.0.1
[31] lazyeval_0.2.2 limma_3.44.3 later_1.1.0.1 htmltools_0.5.0 tools_4.0.2
[36] rsvd_1.0.3 igraph_1.2.5 gtable_0.3.0 glue_1.4.2 GenomeInfoDbData_1.2.3
[41] RANN_2.6.1 reshape2_1.4.4 dplyr_1.0.2 rappdirs_0.3.1 spatstat_1.64-1
[46] Rcpp_1.0.5 vctrs_0.3.4 nlme_3.1-149 lmtest_0.9-38 stringr_1.4.0
[51] globals_0.13.0 mime_0.9 miniUI_0.1.1.1 lifecycle_0.2.0 irlba_2.3.3
[56] goftest_1.2-2 future_1.19.1 edgeR_3.30.3 zlibbioc_1.34.0 MASS_7.3-53
[61] zoo_1.8-8 scales_1.1.1 spatstat.utils_1.17-0 promises_1.1.1 rhdf5_2.32.2
[66] RColorBrewer_1.1-2 gridExtra_2.3 reticulate_1.16 pbapply_1.4-3 rpart_4.1-15
[71] stringi_1.5.3 rlang_0.4.7 pkgconfig_2.0.3 bitops_1.0-6 lattice_0.20-41
[76] tensor_1.5 ROCR_1.0-11 purrr_0.3.4 Rhdf5lib_1.10.1 patchwork_1.0.1
[81] htmlwidgets_1.5.1 cowplot_1.1.0 tidyselect_1.1.0 RcppAnnoy_0.0.16 plyr_1.8.6
[86] magrittr_1.5 R6_2.4.1 generics_0.0.2 mgcv_1.8-33 pillar_1.4.6
[91] findpython_1.0.5 withr_2.3.0 fitdistrplus_1.1-1 abind_1.4-5 survival_3.1-12
[96] RCurl_1.98-1.2 tibble_3.0.3 future.apply_1.6.0 crayon_1.3.4 KernSmooth_2.23-17
[101] plotly_4.9.2.1 locfit_1.5-9.4 grid_4.0.2 data.table_1.13.0 digest_0.6.25
[106] xtable_1.8-4 tidyr_1.1.2 httpuv_1.5.4 R.utils_2.10.1 munsell_0.5.0
[111] viridisLite_0.3.0

constantAmateur commented 3 years ago

The correct and intended behaviour is that rho is calculated globally, i.e. one value shared by every cell; see the paper. If you need a per-cell estimate of the contamination, I suggest calculating an "effective rho" by comparing the number of UMIs in each cell in the post-correction matrix to the number in the raw matrix.
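For anyone landing here later, a minimal sketch of that "effective rho" calculation might look like the following. It assumes the effective rho is defined as the fraction of a cell's raw UMIs removed by the correction, which is one reasonable reading of the suggestion above rather than an official SoupX definition. The helper name `effective_rho` and the toy matrices are hypothetical. With real data, the raw counts would typically be `sc$toc` and the corrected counts the result of `adjustCounts(sc)`; both are usually sparse matrices, so `Matrix::colSums` may be needed in place of the base `colSums`.

```r
# Toy raw counts (genes x cells), standing in for the uncorrected matrix.
raw <- matrix(c(10, 5, 0,
                 4, 6, 2,
                 6, 9, 8),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))

# Toy corrected counts, standing in for the post-correction matrix.
corrected <- matrix(c(9, 4, 0,
                      3, 6, 1,
                      6, 9, 7),
                    nrow = 3, byrow = TRUE,
                    dimnames = dimnames(raw))

# Hypothetical helper: per-cell fraction of UMIs removed by the correction.
# Assumption: effective rho = 1 - (corrected UMIs / raw UMIs) for each cell.
effective_rho <- function(raw, corrected) {
  stopifnot(identical(dim(raw), dim(corrected)))
  raw_umis <- colSums(raw)
  corrected_umis <- colSums(corrected)
  rho <- 1 - corrected_umis / raw_umis
  rho[raw_umis == 0] <- NA_real_  # undefined for cells with no raw UMIs
  rho
}

rho_eff <- effective_rho(raw, corrected)
print(rho_eff)
```

Unlike the global rho, this value varies from cell to cell, because the number of counts removed depends on each cell's expression profile.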