drighelli / SpatialExperiment

55 stars 20 forks source link

move `read10xVisium` to `TENxIO` #126

Open HelenaLC opened 1 year ago

HelenaLC commented 1 year ago

relates to #114 #116 #119 #123

lmweber commented 1 year ago

Hi @HelenaLC @drighelli , I'm following up here regarding our discussions last week about moving read10xVisium() and the new molecule-based platform loading functions into a new package.

I think my view is this is a good idea, since it will help streamline long-term maintenance and avoids making the SpatialExperiment package too cluttered.

Do you have any further views?

If we proceed with this I would be happy to volunteer to set up the new package and split things out. We could call it SpatialIO or SpatialUtils, or similar.

drighelli commented 1 year ago

Hi @lmweber,

thanks for following up on this, I totally agree with moving these functions to a dedicated package.

I think that @LiNk-NY could also be interested in following this thread, because he already implemented a version of read10xVisium into the TENxIO package.

I also tested the readCosMx function from the https://github.com/drighelli/SpatialExperiment/pull/144 and it works well, but if I remember correctly, it misses the image loading into it. Which is a very useful improvement for this.

For the name of the package, I'd be more inclined to the SpatialIO, because it will be only dedicated to the loading of the data from disk. We can then imagine a set of functions to export such data into a SpatialExperiment or any other class. Otherwise, if we want to stick with the SpatialExperiment class, another idea could be to name the package SpatialExperimentIO.

estellad commented 1 year ago

Hi @drighelli ,

All of my functions in #144 - readXenium, readCosMx, and readMerscope - do not include image loading.

Because these technologies do not really support automated H&E generation and is very different from Visium:

In addition, if you look at Seurat, they also did not include loading images for these technologies in their reader functions, simply because the images are not available.

For the naming of the new package, I think SpatialExperimentReader sounds straightforward 😆 . I am happy with your suggestions on SpatialExperimentIO too 😄 .

Sincerely, Estella

drisso commented 1 year ago

My two cents: I think it's very valuable to have a dedicated IO package specific to spatial data and each different technology (current and future). I would not bind it to SpatialExperiment, as this could be a place to include all IO functions that return a SpatialData, MoleculeExperiment, SpatialFeatureExperiment, etc depending on the needs of the user.

Hence, I think SpatialReader or SpatialIO would be good names, but I would not call it SpatialExperimentIO unless you want it to be specific to SpatialExperiment.

With regards to the images, I think that having a low-res image in the object for plotting purposes could be valuable, but perhaps not vital. And if one needs to be working with images, probably SpatialData is the way to go?

OTOH having access to the cell boundaries and/or to the transcript-level data could be valuable and achieved easily by simply stitching a terra shapefile / geoparquet object in the colData or metadata, similar to what @lgeistlinger has done with the MerfishData package. Could this be an interim solution until we have a release of SpatialData?

estellad commented 1 year ago

Following @drisso 's discussion on access to the cell boundaries and transcript-level data like in the MerfishData package, another idea from Seurat is they used package [sp](https://cran.r-project.org/web/packages/sp/index.html) to store boundaries as sp::Polygons object and molecules as sp::SpatialPoints object.

These additional information would make the SpatialExperiment object very bulky to work with, if we are only interested in cell-level data. However, I would say it is still necessary to implement them, as it might be interesting for the future?

Agree that if images are involved, it's better to go with SpatialData.

drisso commented 1 year ago

But isn't this what SpatialFeatureExperiment is doing?

estellad commented 1 year ago

Seurat, MerfishData, and SpatialFeatureExperiment are all trying to include more information than cell-level SPE, but the boundaries and molecules are stored as different class and at different places in these three.

Seurat::LoadXXX() (stores molecules and boundaries with sp::Polygons and sp::SpatialPoints in their image slot), MerfishData (stores molecules as SplitDataFrameList objects, store polygons in metadata as a long format dataset rather than in list), and SpatialFeatureExperiment (stores boundaries as S3: sfc and sfg POLYGON classes, does not store molecules).

Please correct me if I am wrong. Thank you!

I think Seurat and MerfishData's way are the most straightforward.

LiNk-NY commented 1 year ago

Just to add my two cents, the SpatialIO package should aim to conditionally load dependent packages based on the desired class, e.g,. as Davide mentioned, SpatialData, MoleculeExperiment, SpatialFeatureExperiment, etc. This means adding those packages to the Suggests. The IO package should aim to do minimal read operations. Otherwise, the package would be bloated with all the possible compatible technologies. I considered moving TENxIO::TENxVisium function into it's own package due to the size of the codebase. For now, it is in the devel version of the package. The other option is to create individual packages per class, SpatialExperimentIO in the case that the codebase becomes significantly large especially if there are pre-processing steps done on the data before class generation.

lmweber commented 1 year ago

Hi all, I had a discussion with @estellad today and we would like to propose the following plan to move this forward.

(i) @estellad has set up a new package called SpatialExperimentReader, which contains functions for reading in data from Xenium, MERSCOPE, and CosMx platforms as SpatialExperiment objects. This package will be specific to SpatialExperiment and these platforms (and will not include read10xVisium()) to keep maintenance manageable. We can then call these functions in other packages using Suggests: to avoid duplicating code.

(ii) We will also set up a second package called SpatialIO, which will use Suggests: to provide access to the reader functions from SpatialExperimentReader, as well as read10xVisium() (from either SpatialExperiment or TENxIO), and can also be extended to provide access to reader functions from MoleculeExperiment and/or SpatialFeatureExperiment. The reason for having this package is because we now have several package names (SpatialExperiment, TENxIO, SpatialExperimentReader, ME, SPFE) so it gets confusing to remember which reader function is where, so it is useful to have a single package name to use in vignettes and examples. We will use Suggests: to avoid duplicating code.

Any feedback on this plan? If this sounds good, then @estellad will aim to submit the SpatialExperimentReader package to Bioc soon, and once this is in Bioc-devel I will follow it up with SpatialIO.

A couple of specific questions also:

lgeistlinger commented 1 year ago

Any thoughts on the name SpatialExperimentReader?

Why not SpatialExperimentIO to keep it consistent with some of the other package names (SpatialIO, SpatialDataIO, TENxIO)?

and will not include read10xVisium()

Why not? Doesn't the function read visium data into a SpatialExperiment and would thus fit very well into the SpatialExperimentIO package which has the scope of reading data from different technologies into SpatialExperiments?

another idea from Seurat is they used package sp to store boundaries as sp::Polygons object and molecules as sp::SpatialPoints object.

I can only wonder why Seurat would use sp which is in the process of being replaced by sf (used by SpatialFeatureExperiment), and terra (used by Giotto) is the most performant option among these three. If you consider adding point and polygon representations, I'd recommend using terra::vect(..., type = "points") and terra::vect(..., type = "polygons").

MerfishData stores polygons in metadata

Yes but only for rare cases where we have no 1:1 correspondence between cells and polygons. If there is a 1:1 correspondence, which should pretty much always be the case, it will be better to have this in the colData or spatialCoords to do integrated subsetting of cells and cell metadata that also subsets the polygon boundaries by the selected cells.

These additional information would make the SpatialExperiment object very bulky to work with

You could make reading the images and/or polygons optional via logical arguments to the function to provide for the different use cases (see eg the use.images and use.polygons arguments to MerfishData::MouseIleumPetukhov2021)

lmweber commented 1 year ago

and will not include read10xVisium()

Why not? Doesn't the function read visium data into a SpatialExperiment and would thus fit very well into the SpatialExperimentIO package which has the scope of reading data from different technologies into SpatialExperiments?

This is mainly for ease of maintenance. @estellad has independently developed the reader functions for Xenium, MERSCOPE, and CosMx, and would like to contribute these to Bioconductor, so it would be great for her to keep ownership over these within her own package and take care of the long-term maintenance, without also adding additional maintenance burden by including read10xVisium() within the same package. Hence our proposed solution of SpatialIO as a wrapper package providing access to all of these together via Suggests: to make everything easily accessible for users.

estellad commented 1 year ago

Hi @lgeistlinger,

Yes, I am open to rename the package to SpatialExperimentIO. Another small thing I would add is also to directly return SingleCellExperiment class and store the spatial coordinates in colData. It will be helpful in many workflows, especially nowadays for single-cell resolution spatial data.

Thank you for your suggestions on storing polygon with terra and in colData or spatialCoords. And yes, we can use logical arguments to decide what level of details to load. I think for the first version of SpatialExperimentIO, I will just keep it simple, loading only cell-level counts without molecules or polygons. This simplification will speed up other dependencies. Gradually, I will look more into MerfishData's implementation to make SPE more complete.

Another ExperimentHub package I am working on is the Xenium preprint breast cancer data (contains scRNAseq, Visium, two Xenium replicates) from consecutive slices of the same tumour. Here could be a good chance to standardize Xenium SPE storage as Merfish. (However, in the first version I would also just stay at the cell-level count matrix without molecules or polygons)

And hopefully, we can work on these together? :)

drisso commented 1 year ago

Thanks @estellad and @lmweber !

I think this is a good plan. I second @lgeistlinger 's suggestion for the package name and for storing (optionally) polygons as terra objects in the colData.

This is a bit off-topic here, but we're also using the Xenium preprint breast cancer data for benchmarking purposes and I thought that having an ExperimentHub package with those data would simplify data reuse (probably one package with four separate objects). This could be eventually updated to use SpatialData, but we could start simple as you mentioned.

Definitely happy to could work together on such a data package if you're interested.

drighelli commented 1 year ago

Thanks @lmweber and @estellad,

I also agree with the plan, I also share @lgeistlinger thoughts on the name of the package and on terra's implementation.

another small thing I would add is also to directly return SingleCellExperiment class and store the spatial coordinates in colData. It will be helpful in many workflows, especially nowadays for single-cell resolution spatial data.

About this, I don't agree with that, I'd be more inclined to modify the SpatialExperiment by storing the spatialCoords inside the colData and retrieving them with the spatialCoords accessor. Also because several technologies have more than one reference as spatial coordinates (global/local), it would be nice to properly handle this.

Another ExperimentHub package I am working on is the Xenium preprint breast cancer data (contains scRNAseq, Visium, two Xenium replicates) from consecutive slices of the same tumour. Here could be a good chance to standardize Xenium SPE storage as Merfish. (However, in the first version I would also just stay at the cell-level count matrix without molecules or polygons) And hopefully, we can work on these together? :)

As @drisso mentioned, we were doing the same, so I think it would be nice to work together to provide such a package. But it would be better to move the discussion somewhere else. Maybe, we can get in touch on the Bioc slack and discuss it further.

Dario

estellad commented 1 year ago

Thanks @drisso and @drighelli . I have messaged you on Bioconductor slack for the Xenium preprint ExperimentHub package. We had a call with 10x Genomics yesterday, and they agreed with using the Zenodo link (I shared with you in the Slack channel) as an alternative link to their company website.

@drighelli, I agree with you about "flattening" SPE to a structure similar to SCE, by having spatialCoords() inside colData. How long do you think you can finish the implementation of the change? If such a change is made, some other packages like ggspavis and the new SpatialExperimentIO will also need to be altered.

So meanwhile waiting for the update, I would proceed with having the user choose either to return SPE or SCE in SpatialExperimentIO and putting it on Bioc first. Also for ggspavis, I have made some major updates to be reviewed by @lmweber . I will make ggspavis take both SCE and SPE for now. These "SPE-or-SCE" choice statements can be easily modified after your structural update on SpatialExperiment. What do you think?

Estella

LiNk-NY commented 11 months ago

Update: I've moved the import functionality to VisiumIO and will submit to Bioconductor soon. Feel free to test and see the examples in the README.md.

LiNk-NY commented 9 months ago

Update: VisiumIO is currently in Bioconductor devel : https://bioconductor.org/packages/VisiumIO

drighelli commented 8 months ago

Thanks Marcel and sorry for the late response!

lmweber commented 8 months ago

Thank you!

HelenaLC commented 4 months ago

@lmweber @drighelli picking up on this now that we passed another release cycle - shall we at long last consider dropping read10xVisium from the SPE package? Perhaps deprecating it for a cycle with appropriate warnings, and removing it for Bioc 3.20 or .21? I think we'd still depend on magick for image handling, but at least the code base/maintenance would slim down a bit, and remove redundancy/confusion between packages in the future...

drighelli commented 4 months ago

I totally agree with it @HelenaLC !

lmweber commented 4 months ago

Yes, I think this makes sense, thanks for following up @HelenaLC !

HelenaLC commented 4 months ago

Related - we have deprecated spatialData for some time now; so I would suggest removing it for good (again, this cuts down on various pieces on documentation and unit tests). If all are in favor, I would work on a separate PR for this & we can push both (deprecation of read10xVisium & removal of spatialData-related code); perhaps with a single new version as to avoid too many bumps?

lmweber commented 4 months ago

Yes, spatialData has been removed for several cycles now, so this sounds good to me, i.e. removing the code and warning messages completely.

I think the version bumps is ok either way, either one or multiple, whichever is more convenient for you while you are writing the code. (I usually add the version bump in a separate commit at the end that affects the DESCRIPTION file only, since this makes it easier to avoid merge conflicts in DESCRIPTION later if we need to extend the commits or cherry-pick across branches.)

kevinrue commented 2 months ago

Just my two cents that running a course today, we ran into the same old issue of 'read10xVisiumI()addingouts/` to file paths which doesn't play well with the way we organised data for the course. Looking forward to the deprecation-etc steps described above. Is the plan to deprecate after the coming release?

HelenaLC commented 2 months ago

It's already been marked as deprecated in devel (i.e., giving a warning); I guess we'll keep it around for another cycle or so as to not break too many people's code. I'm confused with the outs/ issue though. I thought that this had taken care of it a while ago?

kevinrue commented 2 months ago

I just realised that I haven't checked package versions. It's on our teaching cluster, so definitely not using devel, but for stability we also don't update package versions for long periods of time. Could be that we're simply behind the curve. Thanks for the clarification!

On Fri, 13 Sept 2024 at 08:28, Helena L. Crowell @.***> wrote:

It's already been marked as deprecated in devel (i.e., giving a warning); I guess we'll keep it around for another cycle or so as to not break too many people's code. I'm confused with the outs/ issue, though. Thought this https://github.com/drighelli/SpatialExperiment/blob/0e6fcadab3f235eab9615c976429cbb4ea653e53/R/read10xVisium.R#L119 had taken care of it a while ago?

— Reply to this email directly, view it on GitHub https://github.com/drighelli/SpatialExperiment/issues/126#issuecomment-2348229597, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABMAPFEHCIB2DXSKYY36HZTZWKH25AVCNFSM6AAAAAASDL5IU2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNBYGIZDSNJZG4 . You are receiving this because you commented.Message ID: @.***>