Closed sokratiag closed 2 years ago
Hi @sokratiag,
Thanks for reaching out and pointing out this discrepancy. I ran the commands myself and the results are reproducible on my end (see the code commands and outputs below).
One possibility is that I see you also loaded the MERINGUE
and MUDAN
libraries. I believe I may have borrowed the cleanCounts()
function from these, and it's possible that some other parameter values for this function may differ depending on its implementation. Try explicitly running the cleanCounts()
exported from STdeconvolve by doing: STdeconvolve::cleanCounts()
and maybe that might fix the issue?
For reference, all of the default parameters for the cleanCounts()
exported by STdeconvolve
are:
cleanCounts(counts = counts,
min.lib.size = 1,
max.lib.size = Inf,
min.reads = 1,
min.detected = 1)
but in this example min.lib.size = 100
, and min.reads = 10
.
To answer your other question:
Yes, you can run STdeconvolve
on any matrix of pixels x genes, however, I will say that the feature selection is important to make sure that STdeconvolve
is using informative genes to detect cell-types. There are two general reasons for this. The first is that if the entire transcriptome is used (assuming the mRNA capture technology is genome-wide), STdeconvolve may struggle to identify coherent cell-types, and second, with so many genes to consider, LDA model fitting will take a long time. In terms of the pixels, however, you can definitely run STdeconvolve
on all of the datasets pixels. Really poor pixels, though, with not many mRNAs captured, probably won't contain enough information to recover accurate cell-type proportions. But because cell-types are deconvolved in each pixel independent of the others, it shouldn't affect the cell-type deconvolution in the other pixels.
Let me know if this makes sense and helps at all or if you have other questions. Brendan
My code and outputs below:
library(STdeconvolve)
f <- "visiumTutorial/"
if(!file.exists(f)){
dir.create(f)
}
if(!file.exists(paste0(f, "V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"))){
tar_gz_file <- "http://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"
download.file(tar_gz_file,
destfile = paste0(f, "V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"),
method = "auto")
}
untar(tarfile = paste0(f, "V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"),
exdir = f)
if(!file.exists(paste0(f, "V1_Adult_Mouse_Brain_spatial.tar.gz"))){
spatial_imaging_data <- "http://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_spatial.tar.gz"
download.file(spatial_imaging_data,
destfile = paste0(f, "V1_Adult_Mouse_Brain_spatial.tar.gz"),
method = "auto")
}
untar(tarfile = paste0(f, "V1_Adult_Mouse_Brain_spatial.tar.gz"),
exdir = f)
trying URL 'http://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz'
Content type 'application/x-tar' length 58400030 bytes (55.7 MB)
downloaded 55.7 MB
trying URL 'http://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_spatial.tar.gz'
Content type 'application/x-tar' length 9039745 bytes (8.6 MB)
downloaded 8.6 MB
se <- SpatialExperiment::read10xVisium(samples = f,
type = "sparse",
data = "filtered")
se
lass: SpatialExperiment
dim: 32285 2702
metadata(0):
assays(1): counts
rownames(32285): ENSMUSG00000051951 ENSMUSG00000089699 ... ENSMUSG00000095019 ENSMUSG00000095041
rowData names(1): symbol
colnames(2702): AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 ... TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
colData names(1): sample_id
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialData names(3) : in_tissue array_row array_col
spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
imgData names(4): sample_id image_id data scaleFactor
cd <- se@assays@data@listData$counts
dim(cd)
cd[1:5,1:5]
[1] 32285 2702
5 x 5 sparse Matrix of class "dgCMatrix"
AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 AAACACCAATAACTGC-1 AAACAGAGCGACTCCT-1 AAACCGGGTAGGTACC-1
ENSMUSG00000051951 . . . . .
ENSMUSG00000089699 . . . . .
ENSMUSG00000102331 . 1 . . .
ENSMUSG00000102343 . . . . .
ENSMUSG00000025900 . . . . .
pos <- SpatialExperiment::spatialCoords(se)
## change column names to x and y
## for this dataset, we will visualize barcodes using
## "pxl_col_in_fullres" = "y" coordinates,
## and "pxl_row_in_fullres" = "x" coordinates
colnames(pos) <- c("y", "x")
counts <- cleanCounts(cd, min.lib.size = 100, min.reads = 10,
verbose = TRUE)
dim(counts)
counts[1:5,1:5]
Filtering matrix with 2702 cells and 32285 genes ...
Resulting matrix has 2702 cells and 17450 genes
[1] 17450 2702
5 x 5 sparse Matrix of class "dgCMatrix"
AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 AAACACCAATAACTGC-1 AAACAGAGCGACTCCT-1 AAACCGGGTAGGTACC-1
ENSMUSG00000051951 . . . . .
ENSMUSG00000102331 . 1 . . .
ENSMUSG00000025902 . . . . .
ENSMUSG00000033845 1 1 . . 2
ENSMUSG00000025903 1 1 . . .
corpus <- restrictCorpus(counts,
removeAbove=1.0,
removeBelow = 0.05,
nTopOD = 1000,
verbose = TRUE)
dim(corpus)
corpus[1:5,1:5]
Removing 21 genes present in 100% or more of pixels...
17429 genes remaining...
Removing 4472 genes present in 5% or less of pixels...
12957 genes remaining...
Restricting to overdispersed genes with alpha = 0.05...
Calculating variance fit ...
Using gam with k=5...
2348 overdispersed genes ...
Using top 1000 overdispersed genes.
[1] 1000 2702
5 x 5 sparse Matrix of class "dgCMatrix"
AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 AAACACCAATAACTGC-1 AAACAGAGCGACTCCT-1 AAACCGGGTAGGTACC-1
ENSMUSG00000045471 . 1 1 . .
ENSMUSG00000061808 5 4 6 8 9
ENSMUSG00000035383 . . 2 1 .
ENSMUSG00000025400 . . 2 . .
ENSMUSG00000037727 . . . . .
sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin20.4.0 (64-bit)
Running under: macOS Monterey 12.3
Matrix products: default
LAPACK: /usr/local/Cellar/r/4.1.1/lib/R/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] TENxVisiumData_1.2.0 ExperimentHub_2.2.1 AnnotationHub_3.2.2 BiocFileCache_2.2.1
[5] dbplyr_2.1.1 STdeconvolve_0.99.9 Giotto_2.0.0.953 NMF_0.23.0
[9] cluster_2.1.2 rngtools_1.5.2 pkgmaker_0.32.2 registry_0.5-1
[13] RCTD_1.2.0 SPOTlight_0.1.7 dplyr_1.0.8 slam_0.1-50
[17] plotgardener_1.0.17 liger_2.0.1 testthat_3.1.2 SpatialExperiment_1.4.0
[21] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0 Biobase_2.54.0 GenomicRanges_1.46.1
[25] GenomeInfoDb_1.30.1 IRanges_2.28.0 S4Vectors_0.32.3 BiocGenerics_0.40.0
[29] MatrixGenerics_1.6.0 matrixStats_0.61.0 topicmodels_0.2-12
loaded via a namespace (and not attached):
[1] plyr_1.8.6 splines_4.1.1 BiocParallel_1.28.3 ggplot2_3.3.5
[5] gridBase_0.4-7 digest_0.6.29 foreach_1.5.2 yulab.utils_0.0.4
[9] htmltools_0.5.2 magick_2.7.3 fansi_1.0.3 magrittr_2.0.2
[13] memoise_2.0.1 tm_0.7-8 doParallel_1.0.17 limma_3.50.1
[17] Biostrings_2.62.0 R.utils_2.11.0 colorspace_2.0-3 rappdirs_0.3.3
[21] blob_1.2.2 xfun_0.30 crayon_1.5.0 RCurl_1.98-1.6
[25] iterators_1.0.14 glue_1.6.2 gtable_0.3.0 zlibbioc_1.40.0
[29] XVector_0.34.0 strawr_0.0.9 DelayedArray_0.20.0 plyranges_1.14.0
[33] DropletUtils_1.14.2 Rhdf5lib_1.16.0 HDF5Array_1.22.1 scales_1.1.1.9000
[37] DBI_1.1.2 edgeR_3.36.0 Rcpp_1.0.8.3 xtable_1.8-4
[41] gridGraphics_0.5-1 dqrng_0.3.0 bit_4.0.4 htmlwidgets_1.5.4
[45] httr_1.4.2 RColorBrewer_1.1-2 modeltools_0.2-23 ellipsis_0.3.2
[49] pkgconfig_2.0.3 XML_3.99-0.9 R.methodsS3_1.8.1 scuttle_1.4.0
[53] locfit_1.5-9.5 utf8_1.2.2 AnnotationDbi_1.56.2 later_1.3.0
[57] ggplotify_0.1.0 tidyselect_1.1.2 rlang_1.0.2 reshape2_1.4.4
[61] BiocVersion_3.14.0 munsell_0.5.0 tools_4.1.1 cachem_1.0.6
[65] cli_3.2.0 generics_0.1.2 RSQLite_2.2.11 evaluate_0.15
[69] stringr_1.4.0 fastmap_1.1.0 yaml_2.3.5 knitr_1.37
[73] bit64_4.0.5 purrr_0.3.4 KEGGREST_1.34.0 nlme_3.1-155
[77] sparseMatrixStats_1.6.0 mime_0.12 R.oo_1.24.0 xml2_1.3.3
[81] brio_1.1.3 compiler_4.1.1 rstudioapi_0.13 png_0.1-7
[85] interactiveDisplayBase_1.32.0 filelock_1.0.2 curl_4.3.2 tibble_3.1.6
[89] stringi_1.7.6 desc_1.4.1 lattice_0.20-45 Matrix_1.4-1
[93] vctrs_0.3.8 pillar_1.7.0 lifecycle_1.0.1 rhdf5filters_1.6.0
[97] BiocManager_1.30.16 data.table_1.14.2 bitops_1.0-7 httpuv_1.6.5
[101] rtracklayer_1.54.0 R6_2.5.1 BiocIO_1.4.0 promises_1.2.0.1
[105] codetools_0.2-18 assertthat_0.2.1 pkgload_1.2.4 rhdf5_2.38.1
[109] rprojroot_2.0.2 rjson_0.2.21 withr_2.5.0 GenomicAlignments_1.30.0
[113] Rsamtools_2.10.0 GenomeInfoDbData_1.2.7 mgcv_1.8-39 parallel_4.1.1
[117] grid_4.1.1 beachmat_2.10.0 rmarkdown_2.13 DelayedMatrixStats_1.16.0
[121] shiny_1.7.1 NLP_0.2-1 restfulr_0.0.13
Hi bmill3r,
Thanks for the response - this indeed solved my problem!
Hi,
I am trying to reproduce the coronal mouse brain section tutorial https://jef.works/STdeconvolve/visium_10x.html.
If I use exactly the same code as the one shown in that tutorial, I get some different results mainly because of the filtering step:
library(STdeconvolve) library(MERINGUE) library(MUDAN)
f <- "visiumTutorial/"
if(!file.exists(f)){ dir.create(f) }
if(!file.exists(paste0(f, "V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"))){ tar_gz_file <- "http://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz" download.file(tar_gz_file, destfile = paste0(f, "V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"), method = "auto") } untar(tarfile = paste0(f, "V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"), exdir = f)
if(!file.exists(paste0(f, "V1_Adult_Mouse_Brain_spatial.tar.gz"))){ spatial_imaging_data <- "http://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_spatial.tar.gz" download.file(spatial_imaging_data, destfile = paste0(f, "V1_Adult_Mouse_Brain_spatial.tar.gz"), method = "auto") } untar(tarfile = paste0(f, "V1_Adult_Mouse_Brain_spatial.tar.gz"), exdir = f)
se <- SpatialExperiment::read10xVisium(samples = f, type = "sparse", data = "filtered") se
cd <- se@assays@data@listData$counts
pos <- SpatialExperiment::spatialCoords(se)
change column names to x and y
for this dataset, we will visualize barcodes using "pxl_col_in_fullres" = "y" coordinates, and "pxl_row_in_fullres" = "x" coordinates
colnames(pos) <- c("y", "x")
counts <- cleanCounts(cd, min.lib.size = 100, min.reads = 10)
The above line outputs: > colnames(pos) <- c("y", "x")
Further to this,
corpus <- restrictCorpus(counts, removeAbove=1.0, removeBelow = 0.05, nTopOD = 1000)
outputs:
Removing 23 genes present in 100% or more of pixels... 1573 genes remaining... Removing 0 genes present in 5% or less of pixels... 1573 genes remaining... Restricting to overdispersed genes with alpha = 0.05... Calculating variance fit ... Using gam with k=5... 20 overdispersed genes ... Using top 1000 overdispersed genes.
number of top overdispersed genes available: 20
However, in your tutorial the same lines of code produce:
Removing 21 genes present in 100% or more of pixels...
17429 genes remaining...
Removing 4472 genes present in 5% or less of pixels...
12957 genes remaining...
Restricting to overdispersed genes with alpha = 0.05...
Calculating variance fit ...
Using gam with k=5...
2348 overdispersed genes ...
Using top 1000 overdispersed genes
Can you please check this and help me understand why there is mismatching? Is it just a typo in your tutorial? If yes can you please let me know the correct filter you applied?
One more question!
Is it possible to run stDeconvolve on the whole dataset - i.e. apply no filtering at all ?
Thanks!!