JEFworks-Lab / STdeconvolve

Reference-free cell-type deconvolution of multi-cellular spatially resolved transcriptomics data
http://jef.works/STdeconvolve/
98 stars 12 forks source link

Mismatching when trying to reproduce 10x Visium Coronal Mouse Brain tutorial #11

Closed sokratiag closed 2 years ago

sokratiag commented 2 years ago

Hi,

I am trying to reproduce the coronal mouse brain section tutorial https://jef.works/STdeconvolve/visium_10x.html.

If I use exactly the same code as the one shown in that tutorial, I get some different results mainly because of the filtering step:

library(STdeconvolve) library(MERINGUE) library(MUDAN)

f <- "visiumTutorial/"

if(!file.exists(f)){ dir.create(f) }

if(!file.exists(paste0(f, "V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"))){ tar_gz_file <- "http://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz" download.file(tar_gz_file, destfile = paste0(f, "V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"), method = "auto") } untar(tarfile = paste0(f, "V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"), exdir = f)

if(!file.exists(paste0(f, "V1_Adult_Mouse_Brain_spatial.tar.gz"))){ spatial_imaging_data <- "http://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_spatial.tar.gz" download.file(spatial_imaging_data, destfile = paste0(f, "V1_Adult_Mouse_Brain_spatial.tar.gz"), method = "auto") } untar(tarfile = paste0(f, "V1_Adult_Mouse_Brain_spatial.tar.gz"), exdir = f)

se <- SpatialExperiment::read10xVisium(samples = f, type = "sparse", data = "filtered") se

cd <- se@assays@data@listData$counts

pos <- SpatialExperiment::spatialCoords(se)

change column names to x and y

for this dataset, we will visualize barcodes using "pxl_col_in_fullres" = "y" coordinates, and "pxl_row_in_fullres" = "x" coordinates

colnames(pos) <- c("y", "x")

counts <- cleanCounts(cd, min.lib.size = 100, min.reads = 10)

The above line outputs: > colnames(pos) <- c("y", "x")

counts <- cleanCounts(cd, min.lib.size = 100, min.reads = 10) Filtering matrix with 2702 cells and 32285 genes ... Resulting matrix has 26 cells and 1596 genes

Further to this,

corpus <- restrictCorpus(counts, removeAbove=1.0, removeBelow = 0.05, nTopOD = 1000)

outputs:

Removing 23 genes present in 100% or more of pixels... 1573 genes remaining... Removing 0 genes present in 5% or less of pixels... 1573 genes remaining... Restricting to overdispersed genes with alpha = 0.05... Calculating variance fit ... Using gam with k=5... 20 overdispersed genes ... Using top 1000 overdispersed genes.

number of top overdispersed genes available: 20

However, in your tutorial the same lines of code produce:

Removing 21 genes present in 100% or more of pixels...

17429 genes remaining...

Removing 4472 genes present in 5% or less of pixels...

12957 genes remaining...

Restricting to overdispersed genes with alpha = 0.05...

Calculating variance fit ...

Using gam with k=5...

2348 overdispersed genes ...

Using top 1000 overdispersed genes

Can you please check this and help me understand why there is mismatching? Is it just a typo in your tutorial? If yes can you please let me know the correct filter you applied?

One more question!

Is it possible to run stDeconvolve on the whole dataset - i.e. apply no filtering at all ?

Thanks!!

bmill3r commented 2 years ago

Hi @sokratiag,

Thanks for reaching out and pointing out this discrepancy. I ran the commands myself and the results are reproducible on my end (see the code commands and outputs below).

One possibility is that I see you also loaded the MERINGUE and MUDAN libraries. I believe I may have borrowed the cleanCounts() function from these, and it's possible that some other parameter values for this function may differ depending on its implementation. Try explicitly running the cleanCounts() exported from STdeconvolve by doing: STdeconvolve::cleanCounts() and maybe that might fix the issue?

For reference, all of the default parameters for the cleanCounts() exported by STdeconvolve are:

cleanCounts(counts = counts,
  min.lib.size = 1,
  max.lib.size = Inf,
  min.reads = 1,
  min.detected = 1)

but in this example min.lib.size = 100, and min.reads = 10.

To answer your other question: Yes, you can run STdeconvolve on any matrix of pixels x genes, however, I will say that the feature selection is important to make sure that STdeconvolve is using informative genes to detect cell-types. There are two general reasons for this. The first is that if the entire transcriptome is used (assuming the mRNA capture technology is genome-wide), STdeconvolve may struggle to identify coherent cell-types, and second, with so many genes to consider, LDA model fitting will take a long time. In terms of the pixels, however, you can definitely run STdeconvolve on all of the datasets pixels. Really poor pixels, though, with not many mRNAs captured, probably won't contain enough information to recover accurate cell-type proportions. But because cell-types are deconvolved in each pixel independent of the others, it shouldn't affect the cell-type deconvolution in the other pixels.

Let me know if this makes sense and helps at all or if you have other questions. Brendan

My code and outputs below:

library(STdeconvolve)
f <- "visiumTutorial/"

if(!file.exists(f)){
      dir.create(f)
  }
if(!file.exists(paste0(f, "V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"))){
  tar_gz_file <- "http://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"
  download.file(tar_gz_file, 
                destfile = paste0(f, "V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"), 
                method = "auto")
}
untar(tarfile = paste0(f, "V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz"), 
      exdir = f)

if(!file.exists(paste0(f, "V1_Adult_Mouse_Brain_spatial.tar.gz"))){
spatial_imaging_data <- "http://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_spatial.tar.gz"
  download.file(spatial_imaging_data, 
                destfile = paste0(f, "V1_Adult_Mouse_Brain_spatial.tar.gz"), 
                method = "auto")
}
untar(tarfile = paste0(f, "V1_Adult_Mouse_Brain_spatial.tar.gz"), 
      exdir = f)
trying URL 'http://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_filtered_feature_bc_matrix.tar.gz'
Content type 'application/x-tar' length 58400030 bytes (55.7 MB)
downloaded 55.7 MB

trying URL 'http://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_spatial.tar.gz'
Content type 'application/x-tar' length 9039745 bytes (8.6 MB)
downloaded 8.6 MB
se <- SpatialExperiment::read10xVisium(samples = f,
     type = "sparse",
     data = "filtered")
se
lass: SpatialExperiment 
dim: 32285 2702 
metadata(0):
assays(1): counts
rownames(32285): ENSMUSG00000051951 ENSMUSG00000089699 ... ENSMUSG00000095019 ENSMUSG00000095041
rowData names(1): symbol
colnames(2702): AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 ... TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
colData names(1): sample_id
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialData names(3) : in_tissue array_row array_col
spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
imgData names(4): sample_id image_id data scaleFactor
cd <- se@assays@data@listData$counts
dim(cd)
cd[1:5,1:5]
[1] 32285  2702
5 x 5 sparse Matrix of class "dgCMatrix"
                   AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 AAACACCAATAACTGC-1 AAACAGAGCGACTCCT-1 AAACCGGGTAGGTACC-1
ENSMUSG00000051951                  .                  .                  .                  .                  .
ENSMUSG00000089699                  .                  .                  .                  .                  .
ENSMUSG00000102331                  .                  1                  .                  .                  .
ENSMUSG00000102343                  .                  .                  .                  .                  .
ENSMUSG00000025900                  .                  .                  .                  .                  .
pos <- SpatialExperiment::spatialCoords(se)

## change column names to x and y
## for this dataset, we will visualize barcodes using
## "pxl_col_in_fullres" = "y" coordinates,
## and "pxl_row_in_fullres" = "x" coordinates
colnames(pos) <- c("y", "x")
counts <- cleanCounts(cd, min.lib.size = 100, min.reads = 10,
                      verbose = TRUE)
dim(counts)
counts[1:5,1:5]
Filtering matrix with 2702 cells and 32285 genes ...
Resulting matrix has 2702 cells and 17450 genes
[1] 17450  2702
5 x 5 sparse Matrix of class "dgCMatrix"
                   AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 AAACACCAATAACTGC-1 AAACAGAGCGACTCCT-1 AAACCGGGTAGGTACC-1
ENSMUSG00000051951                  .                  .                  .                  .                  .
ENSMUSG00000102331                  .                  1                  .                  .                  .
ENSMUSG00000025902                  .                  .                  .                  .                  .
ENSMUSG00000033845                  1                  1                  .                  .                  2
ENSMUSG00000025903                  1                  1                  .                  .                  .
corpus <- restrictCorpus(counts,
                         removeAbove=1.0,
                         removeBelow = 0.05,
                         nTopOD = 1000,
                         verbose = TRUE)
dim(corpus)
corpus[1:5,1:5]
Removing 21 genes present in 100% or more of pixels...
17429 genes remaining...
Removing 4472 genes present in 5% or less of pixels...
12957 genes remaining...
Restricting to overdispersed genes with alpha = 0.05...
Calculating variance fit ...
Using gam with k=5...
2348 overdispersed genes ... 
 Using top 1000 overdispersed genes.

[1] 1000 2702
5 x 5 sparse Matrix of class "dgCMatrix"
                   AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 AAACACCAATAACTGC-1 AAACAGAGCGACTCCT-1 AAACCGGGTAGGTACC-1
ENSMUSG00000045471                  .                  1                  1                  .                  .
ENSMUSG00000061808                  5                  4                  6                  8                  9
ENSMUSG00000035383                  .                  .                  2                  1                  .
ENSMUSG00000025400                  .                  .                  2                  .                  .
ENSMUSG00000037727                  .                  .                  .                  .                  .
sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin20.4.0 (64-bit)
Running under: macOS Monterey 12.3

Matrix products: default
LAPACK: /usr/local/Cellar/r/4.1.1/lib/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] TENxVisiumData_1.2.0        ExperimentHub_2.2.1         AnnotationHub_3.2.2         BiocFileCache_2.2.1        
 [5] dbplyr_2.1.1                STdeconvolve_0.99.9         Giotto_2.0.0.953            NMF_0.23.0                 
 [9] cluster_2.1.2               rngtools_1.5.2              pkgmaker_0.32.2             registry_0.5-1             
[13] RCTD_1.2.0                  SPOTlight_0.1.7             dplyr_1.0.8                 slam_0.1-50                
[17] plotgardener_1.0.17         liger_2.0.1                 testthat_3.1.2              SpatialExperiment_1.4.0    
[21] SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0 Biobase_2.54.0              GenomicRanges_1.46.1       
[25] GenomeInfoDb_1.30.1         IRanges_2.28.0              S4Vectors_0.32.3            BiocGenerics_0.40.0        
[29] MatrixGenerics_1.6.0        matrixStats_0.61.0          topicmodels_0.2-12         
loaded via a namespace (and not attached):
  [1] plyr_1.8.6                    splines_4.1.1                 BiocParallel_1.28.3           ggplot2_3.3.5                
  [5] gridBase_0.4-7                digest_0.6.29                 foreach_1.5.2                 yulab.utils_0.0.4            
  [9] htmltools_0.5.2               magick_2.7.3                  fansi_1.0.3                   magrittr_2.0.2               
 [13] memoise_2.0.1                 tm_0.7-8                      doParallel_1.0.17             limma_3.50.1                 
 [17] Biostrings_2.62.0             R.utils_2.11.0                colorspace_2.0-3              rappdirs_0.3.3               
 [21] blob_1.2.2                    xfun_0.30                     crayon_1.5.0                  RCurl_1.98-1.6               
 [25] iterators_1.0.14              glue_1.6.2                    gtable_0.3.0                  zlibbioc_1.40.0              
 [29] XVector_0.34.0                strawr_0.0.9                  DelayedArray_0.20.0           plyranges_1.14.0             
 [33] DropletUtils_1.14.2           Rhdf5lib_1.16.0               HDF5Array_1.22.1              scales_1.1.1.9000            
 [37] DBI_1.1.2                     edgeR_3.36.0                  Rcpp_1.0.8.3                  xtable_1.8-4                 
 [41] gridGraphics_0.5-1            dqrng_0.3.0                   bit_4.0.4                     htmlwidgets_1.5.4            
 [45] httr_1.4.2                    RColorBrewer_1.1-2            modeltools_0.2-23             ellipsis_0.3.2               
 [49] pkgconfig_2.0.3               XML_3.99-0.9                  R.methodsS3_1.8.1             scuttle_1.4.0                
 [53] locfit_1.5-9.5                utf8_1.2.2                    AnnotationDbi_1.56.2          later_1.3.0                  
 [57] ggplotify_0.1.0               tidyselect_1.1.2              rlang_1.0.2                   reshape2_1.4.4               
 [61] BiocVersion_3.14.0            munsell_0.5.0                 tools_4.1.1                   cachem_1.0.6                 
 [65] cli_3.2.0                     generics_0.1.2                RSQLite_2.2.11                evaluate_0.15                
 [69] stringr_1.4.0                 fastmap_1.1.0                 yaml_2.3.5                    knitr_1.37                   
 [73] bit64_4.0.5                   purrr_0.3.4                   KEGGREST_1.34.0               nlme_3.1-155                 
 [77] sparseMatrixStats_1.6.0       mime_0.12                     R.oo_1.24.0                   xml2_1.3.3                   
 [81] brio_1.1.3                    compiler_4.1.1                rstudioapi_0.13               png_0.1-7                    
 [85] interactiveDisplayBase_1.32.0 filelock_1.0.2                curl_4.3.2                    tibble_3.1.6                 
 [89] stringi_1.7.6                 desc_1.4.1                    lattice_0.20-45               Matrix_1.4-1                 
 [93] vctrs_0.3.8                   pillar_1.7.0                  lifecycle_1.0.1               rhdf5filters_1.6.0           
 [97] BiocManager_1.30.16           data.table_1.14.2             bitops_1.0-7                  httpuv_1.6.5                 
[101] rtracklayer_1.54.0            R6_2.5.1                      BiocIO_1.4.0                  promises_1.2.0.1             
[105] codetools_0.2-18              assertthat_0.2.1              pkgload_1.2.4                 rhdf5_2.38.1                 
[109] rprojroot_2.0.2               rjson_0.2.21                  withr_2.5.0                   GenomicAlignments_1.30.0     
[113] Rsamtools_2.10.0              GenomeInfoDbData_1.2.7        mgcv_1.8-39                   parallel_4.1.1               
[117] grid_4.1.1                    beachmat_2.10.0               rmarkdown_2.13                DelayedMatrixStats_1.16.0    
[121] shiny_1.7.1                   NLP_0.2-1                     restfulr_0.0.13              
sokratiag commented 2 years ago

Hi bmill3r,

Thanks for the response - this indeed solved my problem!