drighelli / SpatialExperiment

Possibility of adding `in_filtered` to read10xVisium(data = "raw") #145

Closed by lcolladotor 2 weeks ago

lcolladotor commented 1 year ago

Hi,

We noticed an issue between reading the raw and filtered outputs from SpaceRanger that I'm not sure whether read10xVisium() could address or not.

I incorrectly thought that the only difference between raw and filtered was that filtered is the subset of raw where in_tissue is TRUE. If that were the case, you could read in the raw data and always recover the filtered data by using the in_tissue variable.

However, according to https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/output/matrices, raw also includes other background spots that could have in_tissue = TRUE, for example spots from small holes in the tissue that are not close to the edge.

One solution would be to read in the filtered barcodes when reading in the raw ones, then add a column like in_filtered, specifying if the barcode is in the filtered version or not.

filtered_feature_bc_matrix
├── barcodes.tsv.gz

Doing so, though, would mean that users need to have both the filtered and the raw barcode files. I know that most people don't share both sets of barcode files, and likely don't even keep both sets. We do keep both ourselves, but maybe we are in the minority. That's why I'm not sure whether this issue can be addressed by SpatialExperiment::read10xVisium() or not.
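For illustration, here is a minimal base R sketch of the in_filtered idea, assuming both barcode files are available. The helpers read_barcodes(), flag_in_filtered(), and write_barcodes() are hypothetical names made up for this example; they are not part of SpatialExperiment or DropletUtils. The directory names follow the usual SpaceRanger output layout, and the barcodes are toy values.

```r
# Hypothetical helper: read one barcode per line from a gzipped
# barcodes.tsv.gz file inside a SpaceRanger matrix directory.
read_barcodes <- function(dir) {
    con <- gzfile(file.path(dir, "barcodes.tsv.gz"), open = "rt")
    on.exit(close(con))
    readLines(con)
}

# Hypothetical helper: flag which raw barcodes also appear in the
# filtered set. Returns one row per raw barcode.
flag_in_filtered <- function(raw_dir, filtered_dir) {
    raw <- read_barcodes(raw_dir)
    filtered <- read_barcodes(filtered_dir)
    data.frame(barcode = raw, in_filtered = raw %in% filtered)
}

# Toy demo: write two small barcode files to a temporary directory.
write_barcodes <- function(dir, barcodes) {
    dir.create(dir, recursive = TRUE, showWarnings = FALSE)
    con <- gzfile(file.path(dir, "barcodes.tsv.gz"), open = "wt")
    on.exit(close(con))
    writeLines(barcodes, con)
}

tmp <- tempfile("visium_")
raw_dir <- file.path(tmp, "raw_feature_bc_matrix")
filtered_dir <- file.path(tmp, "filtered_feature_bc_matrix")
write_barcodes(raw_dir, c("AAAC-1", "AAAG-1", "AAAT-1"))
write_barcodes(filtered_dir, c("AAAC-1", "AAAT-1"))

res <- flag_in_filtered(raw_dir, filtered_dir)
print(res)
```

If only one of the two directories exists, gzfile() fails when the missing file is opened, which matches the concern above that this only works for users who kept both sets of files.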

Also maybe this belongs in DropletUtils given https://github.com/drighelli/SpatialExperiment/blob/bb81804decd9cdbe93e436588ab8c8792b5a3b8d/R/read10xVisium.R#L167?

Best, Leo

With info reported by @Nick-Eagles and @prashanthi-ravichandran

LiNk-NY commented 11 months ago

Hi Leo, @lcolladotor

This sounds like it could be handled by a helper function. I imagine that most users have access to, or read, either the raw or the filtered data and not both. I take care of this with the processing = c("filtered", "raw") argument in VisiumIO::TENxVisiumList. The helper function could attempt to look at the barcodes in the other 'processing' type and add those annotations to some part of the object (I'm not sure where yet).

Best, Marcel

LiNk-NY commented 10 months ago

@lcolladotor I made a first attempt at this in the in_filtered branch ( https://github.com/waldronlab/VisiumIO/tree/in_filtered ). It currently just returns a logical vector of raw %in% filtered.

Update: it now provides a data.frame with the barcodes and a logical vector denoting whether each barcode is in the other dataset, e.g., cbind.data.frame(raw_barcodes, in_filtered = 'raw %in% filtered')

lcolladotor commented 3 months ago

Thanks Marcel!

LiNk-NY commented 3 months ago

Hi Leo! @lcolladotor

Have you taken a look at the in_filtered branch? Does the output work for you? I am considering moving it to the devel branch soon. Let me know if it satisfies the use case. Thank you!

Best, Marcel

lcolladotor commented 1 month ago

Hi @Nick-Eagles,

Can you take a look at this branch?

Thanks, Leo

Nick-Eagles commented 3 weeks ago

Thanks Marcel! I think adding this functionality as you've done under VisiumIO makes a lot of sense. For our use case, we'd likely invoke spe = read10xVisium(data = "raw") followed by compareBarcodes(), and merge the latter's result into spe$in_filtered, which I think works well. Thanks again for your work here; this could definitely be helpful in the devel branch.
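A rough sketch of that merge step in base R, under some loud assumptions: col_data is a plain data.frame standing in for colData(spe), with barcodes as row names, and cmp is a made-up comparison table with columns barcode and in_filtered. The actual column names returned by compareBarcodes() are not shown in this thread, so treat these as placeholders. Matching by barcode rather than by position guards against the two tables being in a different order.

```r
# Stand-in for colData(spe): one row per spot, keyed by barcode.
# (Assumption: with a real object, barcodes would come from colnames(spe).)
col_data <- data.frame(
    in_tissue = c(TRUE, TRUE, FALSE),
    row.names = c("AAAC-1", "AAAG-1", "AAAT-1")
)

# Hypothetical comparison result, deliberately in a different row order.
cmp <- data.frame(
    barcode = c("AAAT-1", "AAAC-1", "AAAG-1"),
    in_filtered = c(FALSE, TRUE, FALSE)
)

# Align the comparison rows to the spot order by barcode.
idx <- match(rownames(col_data), cmp$barcode)
stopifnot(!anyNA(idx))  # every spot must have a comparison row

col_data$in_filtered <- cmp$in_filtered[idx]

# Cross-tabulate to find spots that are in_tissue but not in filtered.
print(table(in_tissue = col_data$in_tissue,
            in_filtered = col_data$in_filtered))
```

With a real SpatialExperiment the last assignment would presumably become spe$in_filtered <- cmp$in_filtered[idx], assuming the column names of spe are the barcodes. With several samples in one object, barcodes can repeat across samples, so the match would need to be done per sample.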

lcolladotor commented 3 weeks ago

Thanks for checking this @Nick-Eagles! And thanks again @LiNk-NY!