drighelli / SpatialExperiment


Store spatialData as DataFrame #40

Closed lmweber closed 3 years ago

lmweber commented 3 years ago

Should we store spatialData as a DataFrame instead of data.frame?

Then this would be consistent with rowData and colData.

drighelli commented 3 years ago

It should be a DataFrame already!

Maybe you're using the spatialData getter with as_df=TRUE.

lmweber commented 3 years ago

I think it depends on how the user provides it. If they use read10xVisium(), everything is correct. But if they use the basic SpatialExperiment() constructor and provide spatialData as a data.frame, it stays a data.frame in the object.

E.g. see the example below, which is a shortened version of the mouse coronal script from STexampleData.

In this example, if you check spatialData(spe) at the end, it is a data.frame.

One solution would be for the user to simply provide it as a DataFrame, but I think it would also be good to have an internal check and conversion to DataFrame if the user provides a data.frame.

library(SpatialExperiment)
library(Matrix)
library(rjson)

# -------------
# Download data
# -------------

dir.create("tmp")

url <- "https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_raw_feature_bc_matrix.tar.gz"
fn <- basename(url)
download.file(url, file.path("tmp", fn))
system(paste0("tar -C tmp -xvzf ", file.path("tmp", fn)))

url <- "https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_spatial.tar.gz"
fn <- basename(url)
download.file(url, file.path("tmp", fn))
system(paste0("tar -C tmp -xvzf ", file.path("tmp", fn)))

# ---------
# Load data
# ---------

file_barcodes <- file.path("tmp", "raw_feature_bc_matrix", "barcodes.tsv.gz")
df_barcodes <- read.csv(file_barcodes, sep = "\t", header = FALSE, 
                        col.names = c("barcode_id"))

file_features <- file.path("tmp", "raw_feature_bc_matrix", "features.tsv.gz")
df_features <- read.csv(file_features, sep = "\t", header = FALSE, 
                        col.names = c("gene_id", "gene_name", "feature_type"))

file_counts <- file.path("tmp", "raw_feature_bc_matrix", "matrix.mtx.gz")
counts <- readMM(file = file_counts)

file_tisspos <- file.path("tmp", "spatial", "tissue_positions_list.csv")
df_tisspos <- read.csv(file_tisspos, header = FALSE, 
                       col.names=c("barcode_id", "in_tissue", "array_row", "array_col", 
                                   "pxl_col_in_fullres", "pxl_row_in_fullres"))

# --------------
# Match barcodes
# --------------

ord <- match(df_barcodes$barcode_id, df_tisspos$barcode_id)
df_tisspos_ord <- df_tisspos[ord, ]
rownames(df_tisspos_ord) <- NULL
stopifnot(all(df_barcodes$barcode_id == df_tisspos_ord$barcode_id))

# ------------------------
# Create SpatialExperiment
# ------------------------

row_data <- df_features
rownames(row_data) <- df_features$gene_id

col_data <- df_barcodes
col_data$sample_id <- "sample_01"
rownames(col_data) <- df_barcodes$barcode_id

spatial_data <- df_tisspos_ord[, c("barcode_id", "in_tissue")]
spatial_data$x <- df_tisspos_ord$pxl_row_in_fullres
y_coord_tmp <- df_tisspos_ord$pxl_col_in_fullres
y_coord_tmp <- (-1 * y_coord_tmp) + min(y_coord_tmp) + max(y_coord_tmp)
spatial_data$y <- y_coord_tmp
rownames(spatial_data) <- df_tisspos_ord$barcode_id

col_data_additional <- df_tisspos_ord[, c("array_row", "array_col", "pxl_col_in_fullres", "pxl_row_in_fullres")]
rownames(col_data_additional) <- df_tisspos_ord$barcode_id
col_data <- cbind(col_data, col_data_additional)

spe <- SpatialExperiment(
  assays = list(counts = counts), 
  rowData = row_data, 
  colData = col_data, 
  spatialData = spatial_data
)
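
A minimal sketch of the internal check suggested above, assuming the constructor could pass its spatialData argument through a small helper before storing it. The name coerce_spatial_data is hypothetical and is not part of SpatialExperiment; the only library call used is the data.frame to DataFrame coercion from S4Vectors.

```r
library(S4Vectors)

# Hypothetical helper (not part of SpatialExperiment): make sure spatialData
# is stored as an S4Vectors DataFrame, whatever the user supplied.
coerce_spatial_data <- function(spatialData) {
  if (is.null(spatialData)) {
    return(NULL)
  }
  if (is(spatialData, "DataFrame")) {
    return(spatialData)  # already the right class, nothing to do
  }
  if (is.data.frame(spatialData)) {
    return(as(spatialData, "DataFrame"))  # convert a base R data.frame
  }
  stop("spatialData must be a DataFrame or a data.frame")
}

# Usage with a toy data.frame standing in for spatial_data
toy <- data.frame(
  in_tissue = c(1L, 0L),
  x = c(10.5, 20.5),
  y = c(3.0, 4.0),
  row.names = c("barcode_1", "barcode_2")
)
toy_DF <- coerce_spatial_data(toy)
class(toy_DF)
```

With a check like this, the object would hold a DataFrame regardless of whether the user passed a data.frame or a DataFrame.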

drighelli commented 3 years ago

The conversion is already there: https://github.com/drighelli/SpatialExperiment/blob/master/R/SpatialExperiment.R#L159 https://github.com/drighelli/SpatialExperiment/blob/master/R/SpatialExperiment-methods.R#L176

So, it's weird...

lmweber commented 3 years ago

Hmm, strange. Will look into it some more.

lmweber commented 3 years ago

Looks like:

I'm not sure yet how / why this is the case.

drighelli commented 3 years ago

See my first comment... Anyway, there is a bug: when as_df is FALSE, the getter returns a matrix.

lmweber commented 3 years ago

Ah, I see. Yes, you are right, the getter has as_df = TRUE as default. I thought the default was FALSE.

So it was the getter, not the internal structure.

Should we change the getter to as_df = FALSE by default? Otherwise people will always get a huge multi-page printout when they use the getter with default settings, since the getter converts the DataFrame to a data.frame.
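
To illustrate the printout difference, here is a small sketch on toy data (not the Visium object from the example above). It counts the lines produced when printing a base R data.frame and the equivalent S4Vectors DataFrame; the variable names and the row count are made up for illustration.

```r
library(S4Vectors)

# Toy table with a few thousand rows, roughly the size of a Visium sample
n <- 5000
df <- data.frame(x = seq_len(n), y = rev(seq_len(n)))
DF <- DataFrame(df)

# Count how many lines each object prints to the console
n_lines_df <- length(capture.output(print(df)))
n_lines_DF <- length(capture.output(print(DF)))

# The data.frame prints its rows in full; the DataFrame prints a short summary
c(data.frame = n_lines_df, DataFrame = n_lines_DF)
```

The DataFrame show method abbreviates long tables to the first and last few rows, which is why returning a DataFrame from the getter avoids the multi-page output.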

lmweber commented 3 years ago

Or we could change the line https://github.com/drighelli/SpatialExperiment/blob/master/R/SpatialExperiment-methods.R#L162

to return(DataFrame(spd)) instead of return(as.data.frame(spd))

lmweber commented 3 years ago

E.g. see this example - using spatialData(spe) to display the object prints thousands of lines to screen, since it is a data.frame instead of DataFrame (edit: note I have changed this example to include head(), so it is smaller now).

So I think changing it to return(DataFrame(spd)) in the getter would solve this.
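
A rough sketch of what the proposed change amounts to, using a stand-alone function rather than the package's actual S4 getter. The function name get_spatial_data is hypothetical, and the matrix branch for as_df = FALSE is only an assumption based on the behaviour described earlier in this thread.

```r
library(S4Vectors)

# Hypothetical stand-in for the spatialData getter (not the package code).
# spd is the internally stored DataFrame.
get_spatial_data <- function(spd, as_df = TRUE) {
  if (as_df) {
    # Proposed: return a DataFrame instead of as.data.frame(spd),
    # so that printing the result stays compact
    return(DataFrame(spd))
  }
  # Assumed behaviour for as_df = FALSE, as described in the thread
  as.matrix(spd)
}

# Usage on a toy DataFrame
spd <- DataFrame(x = c(1.5, 2.5, 3.5), y = c(4, 5, 6))
res_DF <- get_spatial_data(spd)
res_mat <- get_spatial_data(spd, as_df = FALSE)
class(res_DF)
class(res_mat)
```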

lmweber commented 3 years ago

Addressed in pull request #41 above