LieberInstitute / spatialLIBD

Code for the spatialLIBD R/Bioconductor package and shiny app
http://LieberInstitute.github.io/spatialLIBD/

Duplicated colnames in count matrix #18

Closed: patrickCNMartin closed this issue 2 years ago

patrickCNMartin commented 2 years ago

Hi,

I have been trying to access the raw data provided in the spatialLIBD package. So far, I have used:

# as specified in the vignette 
sce <- fetch_data(type = 'sce')

counts <- counts(sce)
barcodes <- as.data.frame(sce@colData[,c("barcode","sample_name","imagerow","imagecol")])

I wish to extract all counts associated with one sample (as defined by sample name). However, there seems to be a mismatch between counts colnames and barcodes.

dim(counts) # dim = 33538 47681
length(colnames(counts)) != length(unique(colnames(counts)))

Because of this, it seems that the following does not work:

# as an example, I will use sample name 151507
barcodes <- barcodes[which(barcodes$sample_name == 151507),c("imagecol","imagerow")]

counts <- counts[, colnames(counts) %in% rownames(barcodes)]

I end up having many duplicated barcodes and not knowing which barcode belongs to which sample.

My question is how to access raw count values for each sample?

Thank you for your help.

lcolladotor commented 2 years ago

Hi,

You are using the @ accessor, which is strongly discouraged in Bioconductor packages. I'm aware that Seurat does use @ in their documentation here and there (or at least used to).

This particular question is more appropriate for SpatialExperiment, the infrastructure package for storing the data we use in spatialLIBD. In particular, you are running into a situation where data.frame() and matrix() apply different checks: a data.frame() does not allow repeated rownames, whereas a matrix() does.

> m <- matrix(1:4, nrow = 2)
> rownames(m) <- rep("a", 2)
> 
> d <- as.data.frame(m)
> rownames(d) <- rep("a", 2)
Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘a’ 

Having said that, you can use the assay() accessor function from SpatialExperiment (which is inherited from SingleCellExperiment, etc).

Below I used the example from SpatialExperiment::read10xVisium() to create an example spe object.

> library("SpatialExperiment")
> ?read10xVisium
> dir <- system.file(
+   file.path("extdata", "10xVisium"), 
+   package = "SpatialExperiment")
>   
> sample_ids <- c("section1", "section2")
> samples <- file.path(dir, sample_ids)
>   
> list.files(samples[1])
[1] "raw_feature_bc_matrix" "spatial"              
> list.files(file.path(samples[1], "spatial"))
[1] "scalefactors_json.json"    "tissue_lowres_image.png"   "tissue_positions_list.csv"
> file.path(samples[1], "raw_feature_bc_matrix")
[1] "C:/R/R-4.1.2bioc3.14/library/SpatialExperiment/extdata/10xVisium/section1/raw_feature_bc_matrix"
> 
> (spe <- read10xVisium(samples, sample_ids, 
+   type = "sparse", data = "raw", 
+   images = "lowres", load = FALSE))
class: SpatialExperiment 
dim: 50 99 
metadata(0):
assays(1): counts
rownames(50): ENSMUSG00000051951 ENSMUSG00000089699 ... ENSMUSG00000005886
  ENSMUSG00000101476
rowData names(1): symbol
colnames(99): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ... AAAGTCGACCCTCAGT-1
  AAAGTGCCATCAATTA-1
colData names(1): sample_id
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialData names(3) : in_tissue array_row array_col
spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
imgData names(4): sample_id image_id data scaleFactor

Then I accessed the counts for the first 10 genes and the first 3 spots using the assay() accessor. Note that counts(x) is equivalent to assay(x, "counts"); the shortcut exists because this is such a common scenario.

> assay(spe, "counts")[1:10, 1:3]
10 x 3 sparse Matrix of class "dgCMatrix"
                   AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1
ENSMUSG00000051951                  .                  .                  .
ENSMUSG00000089699                  .                  .                  .
ENSMUSG00000102343                  .                  .                  .
ENSMUSG00000025900                  .                  .                  .
ENSMUSG00000025902                  .                  .                  .
ENSMUSG00000104328                  .                  .                  .
ENSMUSG00000033845                  .                  1                  1
ENSMUSG00000025903                  .                  1                  1
ENSMUSG00000104217                  .                  .                  .
ENSMUSG00000033813                  .                  2                  1
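
Coming back to the original question of pulling out the counts for one sample: since barcodes repeat across samples, matching columns by name is ambiguous, but selecting them by position with the per-spot sample label is not. Below is a minimal base R sketch of that idea. The toy matrix, the spot_info table and all values in it are made up for illustration, and it assumes the metadata rows are in the same order as the count matrix columns.

```r
# Toy count matrix (hypothetical values): 2 genes x 6 spots, where the two
# samples reuse the same three barcodes, so the colnames are duplicated.
counts <- matrix(
    1:12,
    nrow = 2,
    dimnames = list(
        c("gene1", "gene2"),
        rep(c("AAAC-1", "AAAG-1", "AAAT-1"), times = 2)
    )
)

# Per-spot metadata, one row per column of `counts`, in the same order.
spot_info <- data.frame(
    barcode = colnames(counts),
    sample_name = rep(c("151507", "151508"), each = 3),
    stringsAsFactors = FALSE
)

# Barcodes alone cannot identify a column: this is TRUE for the toy data.
has_duplicates <- anyDuplicated(colnames(counts)) > 0

# Select columns by position with a logical vector instead of by name.
keep <- spot_info$sample_name == "151507"
counts_151507 <- counts[, keep, drop = FALSE]

# If unique identifiers are needed, combine sample name and barcode.
unique_key <- paste(spot_info$sample_name, spot_info$barcode, sep = "_")
```

With the real object, the same logic should presumably look like counts(sce)[, sce$sample_name == "151507"], assuming sample_name is a colData column as in your code above; the $ operator on the object reads colData columns, so no @ is needed.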

Best, Leo