Closed lmweber closed 3 years ago
It should be a DataFrame already! Maybe you're using the spatialData getter with as_df=TRUE.
I think it depends on how the user provides it. If they are using read10xVisium(), then it will all be correct. But if they are using the basic SpatialExperiment() constructor and they provide spatialData as a data.frame, then it will stay that way in the object.
E.g. see the example below, which is a shortened version of the mouse coronal script from STexampleData. In this example, if you check spatialData(spe) at the end, it is a data.frame.
One solution would be for the user to simply provide it as a DataFrame, but I think it would also be good to have an internal check and conversion to DataFrame if the user provides a data.frame.
library(SpatialExperiment)
library(Matrix)
library(rjson)
# -------------
# Download data
# -------------
dir.create("tmp")
url <- "https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_raw_feature_bc_matrix.tar.gz"
fn <- basename(url)
download.file(url, file.path("tmp", fn))
system(paste0("tar -C tmp -xvzf ", file.path("tmp", fn)))
url <- "https://cf.10xgenomics.com/samples/spatial-exp/1.1.0/V1_Adult_Mouse_Brain/V1_Adult_Mouse_Brain_spatial.tar.gz"
fn <- basename(url)
download.file(url, file.path("tmp", fn))
system(paste0("tar -C tmp -xvzf ", file.path("tmp", fn)))
# ---------
# Load data
# ---------
file_barcodes <- file.path("tmp", "raw_feature_bc_matrix", "barcodes.tsv.gz")
df_barcodes <- read.csv(file_barcodes, sep = "\t", header = FALSE,
                        col.names = c("barcode_id"))
file_features <- file.path("tmp", "raw_feature_bc_matrix", "features.tsv.gz")
df_features <- read.csv(file_features, sep = "\t", header = FALSE,
                        col.names = c("gene_id", "gene_name", "feature_type"))
file_counts <- file.path("tmp", "raw_feature_bc_matrix", "matrix.mtx.gz")
counts <- readMM(file = file_counts)
file_tisspos <- file.path("tmp", "spatial", "tissue_positions_list.csv")
df_tisspos <- read.csv(file_tisspos, header = FALSE,
                       col.names = c("barcode_id", "in_tissue", "array_row", "array_col",
                                     "pxl_col_in_fullres", "pxl_row_in_fullres"))
# --------------
# Match barcodes
# --------------
ord <- match(df_barcodes$barcode_id, df_tisspos$barcode_id)
df_tisspos_ord <- df_tisspos[ord, ]
rownames(df_tisspos_ord) <- NULL
stopifnot(all(df_barcodes$barcode_id == df_tisspos_ord$barcode_id))
# ------------------------
# Create SpatialExperiment
# ------------------------
row_data <- df_features
rownames(row_data) <- df_features$gene_id
col_data <- df_barcodes
col_data$sample_id <- "sample_01"
rownames(col_data) <- df_barcodes$barcode_id
spatial_data <- df_tisspos_ord[, c("barcode_id", "in_tissue")]
spatial_data$x <- df_tisspos_ord$pxl_row_in_fullres
y_coord_tmp <- df_tisspos_ord$pxl_col_in_fullres
y_coord_tmp <- (-1 * y_coord_tmp) + min(y_coord_tmp) + max(y_coord_tmp)
spatial_data$y <- y_coord_tmp
rownames(spatial_data) <- df_tisspos_ord$barcode_id
col_data_additional <- df_tisspos_ord[, c("array_row", "array_col", "pxl_col_in_fullres", "pxl_row_in_fullres")]
rownames(col_data_additional) <- df_tisspos_ord$barcode_id
col_data <- cbind(col_data, col_data_additional)
spe <- SpatialExperiment(
    assays = list(counts = counts),
    rowData = row_data,
    colData = col_data,
    spatialData = spatial_data
)
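The internal check and conversion suggested above could look roughly like the following. This is only a sketch: as_spatial_DataFrame() is a hypothetical helper name, not a function in SpatialExperiment, and the toy input stands in for the real spatial_data. It relies only on DataFrame() from S4Vectors.

```r
library(S4Vectors)

# Hypothetical helper (not part of SpatialExperiment): convert a base
# data.frame to an S4Vectors DataFrame, and pass a DataFrame through as-is.
as_spatial_DataFrame <- function(spatialData) {
    if (is.data.frame(spatialData)) {
        # check.names = FALSE keeps the column names exactly as supplied
        spatialData <- DataFrame(spatialData, check.names = FALSE)
    }
    # Anything that is neither a data.frame nor a DataFrame is rejected here
    stopifnot(is(spatialData, "DataFrame"))
    spatialData
}

# Toy input standing in for the spatial_data object built in the script above
sd_df <- data.frame(
    in_tissue = c(1L, 0L),
    x = c(10, 20),
    y = c(5, 15)
)
sd_DF <- as_spatial_DataFrame(sd_df)
is(sd_DF, "DataFrame")
```

Calling something like this inside the constructor would mean the stored object is a DataFrame regardless of what the user passes in.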
Hmm, strange. Will look into it some more.
Looks like:
spe@spatialData returns a DataFrame
spatialData(spe) returns a data.frame
I'm not sure yet how / why this is the case.
See my first comment... Anyway, there is an error: when as_df is FALSE, the getter returns a matrix.
Ah, I see. Yes, you are right, the getter has as_df = TRUE as default. I thought the default was FALSE.
So it was the getter, not the internal structure.
Should we change the getter to as_df = FALSE by default? Otherwise people will always get a huge multi-page printout when they use the getter with default settings, since the getter converts the DataFrame to a data.frame.
Or we could change the line https://github.com/drighelli/SpatialExperiment/blob/master/R/SpatialExperiment-methods.R#L162 to return(DataFrame(spd)) instead of return(as.data.frame(spd)).
E.g. see this example: using spatialData(spe) to display the object prints thousands of lines to screen, since it is a data.frame instead of a DataFrame (edit: note I have changed this example to include head(), so it is smaller now).
So I think changing it to return(DataFrame(spd)) in the getter would solve this.
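To illustrate the difference between the two return values, here is a minimal sketch. It assumes only S4Vectors; the toy spd object is made up for illustration and is not the real contents of the spatialData slot.

```r
library(S4Vectors)

# Hypothetical stand-in for the stored spatial data (5000 made-up spots)
n <- 5000
spd <- DataFrame(
    barcode_id = paste0("barcode_", seq_len(n)),
    in_tissue = rep(c(0L, 1L), length.out = n),
    x = runif(n),
    y = runif(n)
)

as_base <- as.data.frame(spd)  # what return(as.data.frame(spd)) hands back
as_s4 <- DataFrame(spd)        # what return(DataFrame(spd)) would hand back

is.data.frame(as_base)   # base data.frame: printing it shows every row
is(as_s4, "DataFrame")   # S4Vectors DataFrame

# Printing a DataFrame shows a short, truncated summary instead of all rows
as_s4
```

With the DataFrame return value, typing spatialData(spe) at the console would behave like rowData(spe) and colData(spe).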
Addressed in pull request #41 above
Should we store spatialData as a DataFrame instead of data.frame?
Then this would be consistent with rowData and colData.