Open HelenaLC opened 1 year ago
Hi @HelenaLC @drighelli , I'm following up here regarding our discussions last week about moving read10xVisium()
and the new molecule-based platform loading functions into a new package.
I think my view is this is a good idea, since it will help streamline long-term maintenance and avoids making the SpatialExperiment
package too cluttered.
Do you have any further views?
If we proceed with this I would be happy to volunteer to set up the new package and split things out. We could call it SpatialIO
or SpatialUtils
, or similar.
Hi @lmweber,
thanks for following up on this, I totally agree with moving these functions to a dedicated package.
I think that @LiNk-NY could also be interested in following this thread, because he already implemented a version of read10xVisium
into the TENxIO
package.
I also tested the readCosMx
function from the https://github.com/drighelli/SpatialExperiment/pull/144 and it works well, but if I remember correctly, it misses the image loading into it. Which is a very useful improvement for this.
For the name of the package, I'd be more inclined to the SpatialIO
, because it will be only dedicated to the loading of the data from disk. We can then imagine a set of functions to export such data into a SpatialExperiment
or any other class.
Otherwise, if we want to stick with the SpatialExperiment
class, another idea could be to name the package SpatialExperimentIO
.
Hi @drighelli ,
All of my functions in #144 - readXenium
, readCosMx
, and readMerscope
- do not include image loading.
Because these technologies do not really support automated H&E generation and is very different from Visium:
In addition, if you look at Seurat, they also did not include loading images for these technologies in their reader functions, simply because the images are not available.
For the naming of the new package, I think SpatialExperimentReader sounds straightforward 😆 . I am happy with your suggestions on SpatialExperimentIO too 😄 .
Sincerely, Estella
My two cents: I think it's very valuable to have a dedicated IO package specific to spatial data and each different technology (current and future). I would not bind it to SpatialExperiment, as this could be a place to include all IO functions that return a SpatialData, MoleculeExperiment, SpatialFeatureExperiment, etc depending on the needs of the user.
Hence, I think SpatialReader or SpatialIO would be good names, but I would not call it SpatialExperimentIO unless you want it to be specific to SpatialExperiment.
With regards to the images, I think that having a low-res image in the object for plotting purposes could be valuable, but perhaps not vital. And if one needs to be working with images, probably SpatialData is the way to go?
OTOH having access to the cell boundaries and/or to the transcript-level data could be valuable and achieved easily by simply stitching a terra shapefile / geoparquet object in the colData or metadata, similar to what @lgeistlinger has done with the MerfishData package. Could this be an interim solution until we have a release of SpatialData?
Following @drisso 's discussion on access to the cell boundaries and transcript-level data like in the MerfishData
package, another idea from Seurat is they used package [sp](https://cran.r-project.org/web/packages/sp/index.html)
to store boundaries as sp::Polygons object and molecules as sp::SpatialPoints object.
These additional information would make the SpatialExperiment object very bulky to work with, if we are only interested in cell-level data. However, I would say it is still necessary to implement them, as it might be interesting for the future?
Agree that if images are involved, it's better to go with SpatialData.
But isn't this what SpatialFeatureExperiment is doing?
Seurat, MerfishData, and SpatialFeatureExperiment are all trying to include more information than cell-level SPE, but the boundaries and molecules are stored as different class and at different places in these three.
Seurat::LoadXXX() (stores molecules and boundaries with sp::Polygons and sp::SpatialPoints in their image slot), MerfishData (stores molecules as SplitDataFrameList objects, store polygons in metadata as a long format dataset rather than in list), and SpatialFeatureExperiment (stores boundaries as S3: sfc and sfg POLYGON classes, does not store molecules).
Please correct me if I am wrong. Thank you!
I think Seurat and MerfishData's way are the most straightforward.
Just to add my two cents, the SpatialIO
package should aim to conditionally load dependent packages based on the desired class, e.g,. as Davide mentioned, SpatialData
, MoleculeExperiment
, SpatialFeatureExperiment
, etc. This means adding those packages to the Suggests
. The IO package should aim to do minimal read operations.
Otherwise, the package would be bloated with all the possible compatible technologies.
I considered moving TENxIO::TENxVisium
function into it's own package due to the size of the codebase. For now, it is in the devel
version of the package.
The other option is to create individual packages per class, SpatialExperimentIO
in the case that the codebase becomes significantly large especially if there are pre-processing steps done on the data before class generation.
Hi all, I had a discussion with @estellad today and we would like to propose the following plan to move this forward.
(i) @estellad has set up a new package called SpatialExperimentReader
, which contains functions for reading in data from Xenium, MERSCOPE, and CosMx platforms as SpatialExperiment
objects. This package will be specific to SpatialExperiment
and these platforms (and will not include read10xVisium()
) to keep maintenance manageable. We can then call these functions in other packages using Suggests:
to avoid duplicating code.
(ii) We will also set up a second package called SpatialIO
, which will use Suggests:
to provide access to the reader functions from SpatialExperimentReader
, as well as read10xVisium()
(from either SpatialExperiment
or TENxIO
), and can also be extended to provide access to reader functions from MoleculeExperiment
and/or SpatialFeatureExperiment
. The reason for having this package is because we now have several package names (SpatialExperiment
, TENxIO
, SpatialExperimentReader
, ME
, SPFE
) so it gets confusing to remember which reader function is where, so it is useful to have a single package name to use in vignettes and examples. We will use Suggests:
to avoid duplicating code.
Any feedback on this plan? If this sounds good, then @estellad will aim to submit the SpatialExperimentReader
package to Bioc soon, and once this is in Bioc-devel I will follow it up with SpatialIO
.
A couple of specific questions also:
Any thoughts on the name SpatialExperimentReader
? This package contains functions for Xenium, MERSCOPE, and CosMx, so maybe something with "molecule-based" would also work and be more specific.
@LiNk-NY is the version of read10xVisium()
currently in TENxIO
the same as the one in SpatialExperiment
? If so we could remove it from SpatialExperiment
now. Alternatively would you prefer we just split it out as a separate small package (e.g. VisiumIO
) and put this in Suggests:
in SpatialExperiment
and TENxIO
(as you mentioned above)?
Any thoughts on the name
SpatialExperimentReader
?
Why not SpatialExperimentIO
to keep it consistent with some of the other package names (SpatialIO
, SpatialDataIO
, TENxIO
)?
and will not include
read10xVisium()
Why not? Doesn't the function read visium data into a SpatialExperiment
and would thus fit very well into the SpatialExperimentIO
package which has the scope of reading data from different technologies into SpatialExperiment
s?
another idea from Seurat is they used package sp to store boundaries as
sp::Polygons
object and molecules assp::SpatialPoints
object.
I can only wonder why Seurat would use sp
which is in the process of being replaced by sf
(used by SpatialFeatureExperiment), and terra
(used by Giotto) is the most performant option among these three. If you consider adding point and polygon representations, I'd recommend using terra::vect(..., type = "points")
and terra::vect(..., type = "polygons")
.
MerfishData stores polygons in metadata
Yes but only for rare cases where we have no 1:1 correspondence between cells and polygons. If there is a 1:1 correspondence, which should pretty much always be the case, it will be better to have this in the colData
or spatialCoords
to do integrated subsetting of cells and cell metadata that also subsets the polygon boundaries by the selected cells.
These additional information would make the SpatialExperiment object very bulky to work with
You could make reading the images and/or polygons optional via logical arguments to the function to provide for the different use cases (see eg the use.images
and use.polygons
arguments to MerfishData::MouseIleumPetukhov2021
)
and will not include read10xVisium()
Why not? Doesn't the function read visium data into a SpatialExperiment and would thus fit very well into the SpatialExperimentIO package which has the scope of reading data from different technologies into SpatialExperiments?
This is mainly for ease of maintenance. @estellad has independently developed the reader functions for Xenium, MERSCOPE, and CosMx, and would like to contribute these to Bioconductor, so it would be great for her to keep ownership over these within her own package and take care of the long-term maintenance, without also adding additional maintenance burden by including read10xVisium()
within the same package. Hence our proposed solution of SpatialIO
as a wrapper package providing access to all of these together via Suggests:
to make everything easily accessible for users.
Hi @lgeistlinger,
Yes, I am open to rename the package to SpatialExperimentIO
. Another small thing I would add is also to directly return SingleCellExperiment
class and store the spatial coordinates in colData
. It will be helpful in many workflows, especially nowadays for single-cell resolution spatial data.
Thank you for your suggestions on storing polygon with terra
and in colData
or spatialCoords
. And yes, we can use logical arguments to decide what level of details to load. I think for the first version of SpatialExperimentIO
, I will just keep it simple, loading only cell-level counts without molecules or polygons. This simplification will speed up other dependencies. Gradually, I will look more into MerfishData
's implementation to make SPE more complete.
Another ExperimentHub package I am working on is the Xenium preprint breast cancer data (contains scRNAseq, Visium, two Xenium replicates) from consecutive slices of the same tumour. Here could be a good chance to standardize Xenium SPE storage as Merfish. (However, in the first version I would also just stay at the cell-level count matrix without molecules or polygons)
And hopefully, we can work on these together? :)
Thanks @estellad and @lmweber !
I think this is a good plan. I second @lgeistlinger 's suggestion for the package name and for storing (optionally) polygons as terra objects in the colData.
This is a bit off-topic here, but we're also using the Xenium preprint breast cancer data for benchmarking purposes and I thought that having an ExperimentHub package with those data would simplify data reuse (probably one package with four separate objects). This could be eventually updated to use SpatialData, but we could start simple as you mentioned.
Definitely happy to could work together on such a data package if you're interested.
Thanks @lmweber and @estellad,
I also agree with the plan, I also share @lgeistlinger thoughts on the name of the package and on terra's implementation.
another small thing I would add is also to directly return SingleCellExperiment class and store the spatial coordinates in colData. It will be helpful in many workflows, especially nowadays for single-cell resolution spatial data.
About this, I don't agree with that, I'd be more inclined to modify the SpatialExperiment
by storing the spatialCoords
inside the colData
and retrieving them with the spatialCoords
accessor.
Also because several technologies have more than one reference as spatial coordinates (global/local), it would be nice to properly handle this.
Another ExperimentHub package I am working on is the Xenium preprint breast cancer data (contains scRNAseq, Visium, two Xenium replicates) from consecutive slices of the same tumour. Here could be a good chance to standardize Xenium SPE storage as Merfish. (However, in the first version I would also just stay at the cell-level count matrix without molecules or polygons) And hopefully, we can work on these together? :)
As @drisso mentioned, we were doing the same, so I think it would be nice to work together to provide such a package. But it would be better to move the discussion somewhere else. Maybe, we can get in touch on the Bioc slack and discuss it further.
Dario
Thanks @drisso and @drighelli . I have messaged you on Bioconductor slack for the Xenium preprint ExperimentHub package. We had a call with 10x Genomics yesterday, and they agreed with using the Zenodo link (I shared with you in the Slack channel) as an alternative link to their company website.
@drighelli, I agree with you about "flattening" SPE to a structure similar to SCE, by having spatialCoords()
inside colData
. How long do you think you can finish the implementation of the change? If such a change is made, some other packages like ggspavis
and the new SpatialExperimentIO
will also need to be altered.
So meanwhile waiting for the update, I would proceed with having the user choose either to return SPE or SCE in SpatialExperimentIO
and putting it on Bioc first. Also for ggspavis
, I have made some major updates to be reviewed by @lmweber . I will make ggspavis
take both SCE and SPE for now. These "SPE-or-SCE" choice statements can be easily modified after your structural update on SpatialExperiment
. What do you think?
Estella
Update: I've moved the import functionality to VisiumIO and will submit to Bioconductor soon. Feel free to test and see the examples in the README.md
.
Update: VisiumIO
is currently in Bioconductor devel : https://bioconductor.org/packages/VisiumIO
Thanks Marcel and sorry for the late response!
Thank you!
@lmweber @drighelli picking up on this now that we passed another release cycle - shall we at long last consider dropping read10xVisium
from the SPE package? Perhaps deprecating it for a cycle with appropriate warnings, and removing it for Bioc 3.20 or .21? I think we'd still depend on magick
for image handling, but at least the code base/maintenance would slim down a bit, and remove redundancy/confusion between packages in the future...
I totally agree with it @HelenaLC !
Yes, I think this makes sense, thanks for following up @HelenaLC !
Related - we have deprecated spatialData
for some time now; so I would suggest removing it for good (again, this cuts down on various pieces on documentation and unit tests). If all are in favor, I would work on a separate PR for this & we can push both (deprecation of read10xVisium
& removal of spatialData
-related code); perhaps with a single new version as to avoid too many bumps?
Yes, spatialData
has been removed for several cycles now, so this sounds good to me, i.e. removing the code and warning messages completely.
I think the version bumps is ok either way, either one or multiple, whichever is more convenient for you while you are writing the code. (I usually add the version bump in a separate commit at the end that affects the DESCRIPTION
file only, since this makes it easier to avoid merge conflicts in DESCRIPTION
later if we need to extend the commits or cherry-pick across branches.)
Just my two cents that running a course today, we ran into the same old issue of 'read10xVisiumI()adding
outs/` to file paths which doesn't play well with the way we organised data for the course.
Looking forward to the deprecation-etc steps described above. Is the plan to deprecate after the coming release?
It's already been marked as deprecated in devel (i.e., giving a warning); I guess we'll keep it around for another cycle or so as to not break too many people's code. I'm confused with the outs/
issue though. I thought that this had taken care of it a while ago?
I just realised that I haven't checked package versions. It's on our teaching cluster, so definitely not using devel, but for stability we also don't update package versions for long periods of time. Could be that we're simply behind the curve. Thanks for the clarification!
On Fri, 13 Sept 2024 at 08:28, Helena L. Crowell @.***> wrote:
It's already been marked as deprecated in devel (i.e., giving a warning); I guess we'll keep it around for another cycle or so as to not break too many people's code. I'm confused with the outs/ issue, though. Thought this https://github.com/drighelli/SpatialExperiment/blob/0e6fcadab3f235eab9615c976429cbb4ea653e53/R/read10xVisium.R#L119 had taken care of it a while ago?
— Reply to this email directly, view it on GitHub https://github.com/drighelli/SpatialExperiment/issues/126#issuecomment-2348229597, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABMAPFEHCIB2DXSKYY36HZTZWKH25AVCNFSM6AAAAAASDL5IU2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNBYGIZDSNJZG4 . You are receiving this because you commented.Message ID: @.***>
relates to #114 #116 #119 #123