JEFworks-Lab / STdeconvolve

Reference-free cell-type deconvolution of multi-cellular spatially resolved transcriptomics data
http://jef.works/STdeconvolve/

From STdeconvolve/docs/visium_10x.md. Error when calling function cleanCounts() #33

Closed. sondvo closed this issue 1 year ago.

sondvo commented 1 year ago

This is my SpatialExperiment object:

library(SpatialExperiment)
spatial_data <- read10xVisium(
    samples='data/sample',
    sample_id='10X_Mouse_Brain',
    type='HDF5',
    data = 'filtered',
    images='hires',
    load=FALSE
)
spatial_data

class: SpatialExperiment
dim: 32285 2702
metadata(0):
assays(1): counts
rownames(32285): ENSMUSG00000051951 ENSMUSG00000089699 ... ENSMUSG00000095019 ENSMUSG00000095041
rowData names(1): symbol
colnames(2702): AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 ... TTGTTTCCATACAACT-1 TTGTTTGTGTAAATTC-1
colData names(4): in_tissue array_row array_col sample_id
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
imgData names(4): sample_id image_id data scaleFactor

Calling cleanCounts():

library(STdeconvolve)
x <- spatial_data@assays@data@listData$counts
counts <- cleanCounts(x, min.lib.size = 100, min.reads = 10)

Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...): 'x' must be an array of at least two dimensions

Traceback:

  1. cleanCounts(x, min.lib.size = 100, min.reads = 10)
  2. Matrix::rowSums(counts)
  3. Matrix::rowSums(counts)
  4. base::rowSums(x, na.rm = na.rm, dims = dims, ...)
  5. stop("'x' must be an array of at least two dimensions")

I looked at the source code of cleanCounts() and found that the problem comes from the line counts <- Matrix::Matrix(counts, sparse = TRUE), which converts the count matrix of the SpatialExperiment into a sparse matrix with shape (87234070 x 1). When the filters are then applied to that matrix, 0 cells and 0 genes remain.
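The reported shape can be checked against the dimensions of the counts (32285 genes x 2702 spots): the row count of the converted matrix is exactly the number of entries in the original matrix, which is consistent with all of its values having been stacked into a single column. This is an inference from the numbers, not something the traceback itself shows.

$$32285 \times 2702 = 87\,234\,070$$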

sondvo commented 1 year ago

I solved the problem. Reading the SpatialExperiment from the .h5 file stores the count matrix as a TENxMatrix (a class from the HDF5Array package that keeps the data on disk as a DelayedArray), and for some reason this format does not work with cleanCounts().

spatial_data2 <- read10xVisium(
    samples='data/sample',                   # Path to sample dir
    sample_id='10X_Mouse_Brain',  # Name the sample. When you combine more than 1 sample, this will be used to identify batches in imgData, colData, ...
    type='HDF5',
    data = 'filtered',
    images='hires',
    load=FALSE
)
str(spatial_data2@assays@data@listData$counts)

Formal class 'TENxMatrix' [package "HDF5Array"] with 1 slot
  ..@ seed:Formal class 'TENxMatrixSeed' [package "HDF5Array"] with 6 slots
  .. .. ..@ filepath : chr "/home/ub-sonvo-25d094476064960/spatial/STdeconvolve/data/sample/outs/filtered_feature_bc_matrix.h5"
  .. .. ..@ group : chr "/matrix"
  .. .. ..@ subdata : NULL
  .. .. ..@ dim : int [1:2] 32285 2702
  .. .. ..@ indptr_ranges:'data.frame': 2702 obs. of 2 variables:
  .. .. .. ..$ start: num [1:2702] 1 5231 8877 15149 19655 ...
  .. .. .. ..$ width: int [1:2702] 5230 3646 6272 4506 5971 6097 8545 7210 6943 6315 ...
  .. .. ..@ dimnames :List of 2
  .. .. .. ..$ : chr [1:32285] "ENSMUSG00000051951" "ENSMUSG00000089699" "ENSMUSG00000102331" "ENSMUSG00000102343" ...
  .. .. .. ..$ : chr [1:2702] "AAACAAGTATCTCCCA-1" "AAACAATCTACTAGCA-1" "AAACACCAATAACTGC-1" "AAACAGAGCGACTCCT-1" ...
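If you would rather keep type='HDF5', one possible workaround (a sketch under assumptions, not something taken from the STdeconvolve docs) is to realize the TENxMatrix in memory as a dgCMatrix before calling cleanCounts(). TENxMatrix extends DelayedMatrix from the DelayedArray package, which supports coercion with as(x, "dgCMatrix"), assuming a reasonably recent DelayedArray. So that it runs without the .h5 file, the example uses a small made-up matrix wrapped in DelayedArray() as a stand-in for the Visium counts; the names dense, delayed and counts_sparse are illustrative only.

```r
library(Matrix)
library(DelayedArray)

# Made-up 6 x 4 count matrix standing in for the 32285 x 2702 Visium counts
set.seed(42)
dense <- matrix(as.numeric(rpois(24, lambda = 2)), nrow = 6, ncol = 4,
                dimnames = list(paste0("gene", 1:6), paste0("spot", 1:4)))

# Wrap it as a DelayedMatrix; HDF5Array's TENxMatrix is a subclass of DelayedMatrix
delayed <- DelayedArray(dense)
print(class(delayed))

# Realize the delayed (for real data: on-disk) counts as an in-memory dgCMatrix,
# the same class that read10xVisium(type = 'sparse') returns in this thread
counts_sparse <- as(delayed, "dgCMatrix")
str(counts_sparse)
```

For the object in this thread, the equivalent would be x <- as(assay(spatial_data2, "counts"), "dgCMatrix") followed by cleanCounts(x, min.lib.size = 100, min.reads = 10). This loads all nonzero counts into memory, which is what type='sparse' does anyway, so the result should behave like the sparse-folder input.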

Reading the SpatialExperiment from the sparse folder (barcodes, features, matrix) works fine. Please warn users about this in the instructions. Thank you.

spatial_data <- read10xVisium(
    samples='data/sample',                   # Path to sample dir
    sample_id='10X_Mouse_Brain',  # Name the sample. When you combine more than 1 sample, this will be used to identify batches in imgData, colData, ...
    type='sparse',
    data = 'filtered',
    images='hires',
    load=FALSE
)
str(spatial_data@assays@data@listData$counts)

Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i : int [1:16031101] 8 9 10 11 13 16 20 24 25 34 ...
  ..@ p : int [1:2703] 0 5230 8876 15148 19654 25625 31722 40267 47477 54420 ...
  ..@ Dim : int [1:2] 32285 2702
  ..@ Dimnames:List of 2
  .. ..$ : chr [1:32285] "ENSMUSG00000051951" "ENSMUSG00000089699" "ENSMUSG00000102331" "ENSMUSG00000102343" ...
  .. ..$ : chr [1:2702] "AAACAAGTATCTCCCA-1" "AAACAATCTACTAGCA-1" "AAACACCAATAACTGC-1" "AAACAGAGCGACTCCT-1" ...
  ..@ x : num [1:16031101] 1 1 2 1 7 1 3 1 2 1 ...
  ..@ factors : list()

bmill3r commented 1 year ago

Hi @duyson1999,

Thanks so much for pointing this out. It looks like, in read10xVisium(), type="HDF5" reads the counts from the .h5 file, whereas type="sparse" reads the sparse matrix files (barcodes, features, matrix). The Details section of the read10xVisium() documentation confirms this: "The '.h5' files are used if 'type = "HDF5"'." Our example "Analysis of 10X Visium data" reads in the SpatialExperiment object with read10xVisium() using type=sparse. I will add a line warning users to use this parameter setting specifically.

Thanks again for your help and patience! Brendan