QuKunLab / SpatialBenchmarking

BSD 2-Clause "Simplified" License

It seems the 32 simulated spatial datasets are not so "spatial" #16

Open hurazh opened 1 year ago

Hi, thank you for this benchmarking work, which brings most of the existing methods together and provides a quantitative comparison of their performance. When I took a closer look at the 32 simulated spatial datasets (both the paper and the data you provided), I found that there seems to be no spatial structure in them: the description in 'Methods' appears to treat each spot independently, and no spatial coordinates for the spots are provided in the data (obs_names are just 0-999, and no Locations file is provided for the simulated datasets). The way you ran RCTD confirmed my suspicion, as in:

coords <- data.frame(colnames(spatial_obj))
colnames(coords) <- 'barcodes'
coords$xcoord <- seq_along(colnames(spatial_obj))
coords$ycoord <- seq_along(colnames(spatial_obj))
rownames(coords) <- coords$barcodes; coords$barcodes <- NULL # Move barcodes to rownames

where you explicitly set the coordinates to (1,1), (2,2), (3,3), ... via seq_along, so that all the spots lie on a straight diagonal line instead of in a 2D region as in real tissue samples. As I understand it, spatial structure is crucial for spatial transcriptomics data. It is not only an annotation of each spot's location; it also carries meaningful information that many existing methods, like RCTD, build upon. I'm curious whether this is intentional. If so, what is its purpose, and in what sense are these simulated datasets "spatial"?
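
To make the point concrete, here is a minimal sketch of what that construction produces. It is not taken from the repository: `spot_names` is a hypothetical stand-in for `colnames(spatial_obj)`, and the 40 x 25 grid is an arbitrary layout I chose only for contrast, not a claim about how the data should have been simulated.

```r
# Hypothetical stand-in for colnames(spatial_obj): 1000 spots named "0".."999"
spot_names <- as.character(0:999)

# Same construction as in the quoted RCTD snippet: x and y are both the spot index
coords <- data.frame(
  xcoord = seq_along(spot_names),
  ycoord = seq_along(spot_names),
  row.names = spot_names
)

# Every spot sits on the diagonal y = x ...
all_on_diagonal <- all(coords$xcoord == coords$ycoord)

# ... so the two coordinates are perfectly correlated (no 2D extent)
xy_correlation <- cor(coords$xcoord, coords$ycoord)

# For contrast (illustrative assumption only): the same 1000 spots on a 40 x 25 grid
grid_coords <- expand.grid(xcoord = 1:40, ycoord = 1:25)
rownames(grid_coords) <- spot_names
grid_correlation <- cor(grid_coords$xcoord, grid_coords$ycoord)

print(all_on_diagonal)
print(xy_correlation)
print(grid_correlation)
```

With the seq_along construction, the y coordinate is fully determined by the x coordinate, so the layout is effectively one-dimensional. On a full grid the two coordinates are uncorrelated.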