segger is a cutting-edge cell segmentation model designed specifically for single-molecule resolved spatial omics datasets. It addresses the challenge of accurately segmenting individual cells in complex imaging datasets, using an approach based on graph neural networks (GNNs).
This note describes use cases and a possible strategy to enable spatialdata support in segger.
Use cases
These use cases can be considered as incremental goals, to accomplish in this order:
1. flexibility to work with processed Xenium and MERSCOPE data, not just raw data (stored as a SpatialData Zarr object)
2. extend segger to new transcripts-based data types (e.g. seqFISH)
3. extend segger to bins-based data types (e.g. Visium HD, Stereo-seq, Open-ST)
4. enable napari-spatialdata visualization when Xenium Explorer is not available (i.e. for non-Xenium data)
5. enable spatialdata-based tools like bento-tools and sopa
Method
Numbers correspond to the above list and they depend on each other as follows: 1 -> 4 and 1 -> 2 -> 3 -> 5.
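As a small sanity check, the dependency chains above can be written down as a graph and sorted. This is only a sketch using Python's standard `graphlib`; the names `DEPENDS_ON`, `implementation_order` and `is_valid_order` are made up for illustration and are not part of segger.

```python
from graphlib import TopologicalSorter

# Map each use case number to the use cases it depends on,
# encoding "1 -> 4" and "1 -> 2 -> 3 -> 5" from the text.
DEPENDS_ON = {
    1: set(),
    2: {1},
    3: {2},
    4: {1},
    5: {3},
}


def implementation_order(depends_on):
    """Return one valid order in which the use cases can be tackled."""
    return list(TopologicalSorter(depends_on).static_order())


def is_valid_order(order, depends_on):
    """Check that every use case appears after all of its prerequisites."""
    position = {step: i for i, step in enumerate(order)}
    return all(
        position[dep] < position[step]
        for step, deps in depends_on.items()
        for dep in deps
    )


order = implementation_order(DEPENDS_ON)
print(order)
```

Several orders are valid (use case 4 can be done any time after 1), so the printed order is just one of them.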
add a new subclass of SpatialTranscriptomicsSample that accepts SpatialData objects and reproduces the Xenium + MERSCOPE support. The subclass will reimplement some methods of the base class but keep the API compatible with the segger pipeline
do the analogous work for the STSampleParquet class
test segger on a new transcripts-based technology
test segger on Visium HD data; this will require some modification of the Visium HD data to make it look like transcript-based data. Doing it once will make it work for every bins-based technology thanks to the SpatialData abstraction.
create a parser that writes the produced results into a new SpatialData object (or puts the predictions into the original one)
test napari-spatialdata on the newly created object.
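One possible way to make bins-based data "look like" transcript-based data is to expand each bin's counts into pseudo-transcripts placed at the bin centre. The sketch below is plain Python and only illustrates the idea; the function name, the input layout, and the output column names (chosen to resemble Xenium-style transcript tables) are assumptions, not segger or spatialdata API.

```python
def bins_to_pseudo_transcripts(bins):
    """Expand binned counts into one transcript-like row per counted molecule.

    `bins` is an iterable of dicts holding the bin centre (`x`, `y`) and a
    `counts` mapping of gene name -> integer count (hypothetical layout).
    """
    rows = []
    for b in bins:
        # Sort genes so the output order is deterministic.
        for gene, count in sorted(b["counts"].items()):
            for _ in range(count):
                rows.append(
                    {
                        "transcript_id": len(rows),  # running index
                        "feature_name": gene,
                        "x_location": b["x"],  # every molecule sits at the bin centre
                        "y_location": b["y"],
                    }
                )
    return rows


example_bins = [
    {"x": 1.0, "y": 1.0, "counts": {"Actb": 2, "Gapdh": 1}},
    {"x": 3.0, "y": 1.0, "counts": {"Actb": 1}},
]
rows = bins_to_pseudo_transcripts(example_bins)
print(len(rows))
```

Placing all molecules at the bin centre is the simplest choice; whether that is good enough for the graph construction in segger would need to be tested.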
Detailed list of tasks
segger.data
[ ] data/parquet/_settings/xenium.yaml: not sure about this file.
[ ] data/parquet/_utils.py: a quick way to enable spatialdata support is to keep using these functions (even though most of them are already implemented in spatialdata) by making the class SpatialTranscriptomicsSample (or STSampleParquet?) return the right paths of the .parquet files inside the SpatialData .zarr store. This means that in-memory SpatialData objects would not be supported, at least initially. In this case, this file does not require modifications.
[ ] sample.py: similar comment to the above. I would create a helper function that takes a SpatialData object (stored on disk) and creates an STSampleParquet by passing the right paths. This way STSampleParquet doesn't need to know that the data comes from a SpatialData object. Some of the functions in sample.py are already available in the spatialdata package.
[ ] segger.data.__init__: include new classes and functions developed for the files above.
[ ] segger.data.constants: no need to add SpatialData-specific files
[ ] segger.data.io: we can subclass SpatialTranscriptomicsSample to support generic SpatialData objects, so that we can reuse SpatialTranscriptomicsSample without having to adapt all the code. Note that some of the functions implemented in SpatialTranscriptomicsSample are available in the spatialdata package.
[ ] README.md: update to include info on the support of SpatialData
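A rough sketch of the path-resolving helper mentioned for _utils.py and sample.py could look like the following. It assumes the on-disk layout that spatialdata uses for its Zarr stores as far as I know (`points/<name>/points.parquet` and `shapes/<name>/shapes.parquet`), and the default element names `transcripts` and `cell_boundaries` are assumptions too. The function name `resolve_parquet_paths` is hypothetical, and how the returned paths are handed to STSampleParquet is left open.

```python
import tempfile
from pathlib import Path


def resolve_parquet_paths(store, points_key="transcripts", shapes_key="cell_boundaries"):
    """Locate the Parquet data of one points and one shapes element in a store.

    Assumes the layout <store>/points/<name>/points.parquet and
    <store>/shapes/<name>/shapes.parquet; raises if either path is missing.
    """
    store = Path(store)
    paths = {
        "transcripts": store / "points" / points_key / "points.parquet",
        "boundaries": store / "shapes" / shapes_key / "shapes.parquet",
    }
    missing = [str(p) for p in paths.values() if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Not found in SpatialData store: {missing}")
    return paths


# Demo on a fake store that only mimics the directory layout (no real data).
with tempfile.TemporaryDirectory() as tmp:
    demo_store = Path(tmp) / "sample.zarr"
    (demo_store / "points" / "transcripts" / "points.parquet").mkdir(parents=True)
    (demo_store / "shapes" / "cell_boundaries").mkdir(parents=True)
    (demo_store / "shapes" / "cell_boundaries" / "shapes.parquet").touch()
    found = resolve_parquet_paths(demo_store)
    demo_paths = {k: p.relative_to(demo_store).as_posix() for k, p in found.items()}

print(demo_paths)
```

Because the helper only deals with paths, in-memory SpatialData objects remain unsupported, which matches the limitation noted in the _utils.py item.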
segger.cli
[ ] add a new .yaml configuration in segger.cli.configs.create_dataset based on a generic SpatialData .zarr file
[ ] create_dataset.py: add a new dataset_type spatialdata-zarr, which uses a new SpatialDataSample class
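To illustrate how a new dataset_type could be wired in, here is a minimal sketch of a lookup from dataset_type strings to sample classes. All classes below are empty stand-ins defined only for this example (the real segger classes have a richer API), and `SAMPLE_CLASSES`, `make_sample` and the existing type names `xenium` and `merscope` are assumptions about how create_dataset.py might be organized.

```python
class SpatialTranscriptomicsSample:
    """Stand-in for segger's base class, reduced to what the sketch needs."""

    def __init__(self, source):
        self.source = source


class XeniumSample(SpatialTranscriptomicsSample):
    """Stand-in for the existing Xenium support."""


class MerscopeSample(SpatialTranscriptomicsSample):
    """Stand-in for the existing MERSCOPE support."""


class SpatialDataSample(SpatialTranscriptomicsSample):
    """Hypothetical new class backed by a SpatialData .zarr store."""


# Hypothetical registry keyed by the dataset_type given on the command line.
SAMPLE_CLASSES = {
    "xenium": XeniumSample,
    "merscope": MerscopeSample,
    "spatialdata-zarr": SpatialDataSample,
}


def make_sample(dataset_type, source):
    """Instantiate the sample class registered for `dataset_type`."""
    try:
        sample_cls = SAMPLE_CLASSES[dataset_type]
    except KeyError:
        known = ", ".join(sorted(SAMPLE_CLASSES))
        raise ValueError(
            f"Unknown dataset_type {dataset_type!r}; expected one of: {known}"
        ) from None
    return sample_cls(source)


sample = make_sample("spatialdata-zarr", "sample.zarr")
print(type(sample).__name__)
```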
segger.models
no changes needed
segger.prediction
no changes needed
segger.training
no changes needed
segger.validation
[ ] add a function that exports the results to a new SpatialData object (or make it possible to extend the original one if the data came from a SpatialData object; it's easier to just create a new one)
[ ] test an example with napari-spatialdata
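Whichever object the export ends up writing to, the core step is probably a join of the predictions onto the transcripts. A plain-Python sketch of that step follows; the column name `segger_cell_id`, the key `transcript_id` and the function name are assumptions for illustration, and the actual write into a SpatialData object is not shown.

```python
def attach_predictions(transcripts, predictions, column="segger_cell_id"):
    """Return copies of the transcript rows with a predicted cell id added.

    `predictions` maps transcript_id -> cell id; transcripts without a
    prediction get None, and the input rows are left untouched.
    """
    return [
        {**row, column: predictions.get(row["transcript_id"])}
        for row in transcripts
    ]


transcripts = [
    {"transcript_id": 0, "feature_name": "Actb"},
    {"transcript_id": 1, "feature_name": "Gapdh"},
    {"transcript_id": 2, "feature_name": "Actb"},
]
predictions = {0: "cell_a", 2: "cell_b"}  # transcript 1 stays unassigned

annotated = attach_predictions(transcripts, predictions)
print([row["segger_cell_id"] for row in annotated])
```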
Additional comments
[ ] for simplicity, one can start with data that does not require considering the coordinate transformation.