EliHei2 / segger_dev

a cutting-edge cell segmentation model specifically designed for single-molecule resolved spatial omics datasets. It addresses the challenge of accurately segmenting individual cells in complex imaging datasets, leveraging a unique approach based on graph neural networks (GNNs).
https://elihei2.github.io/segger_dev/
MIT License
38 stars 3 forks

Connect Sopa and Segger #8

Closed quentinblampey closed 1 month ago

quentinblampey commented 1 month ago

Here are the tasks to connect segger with sopa. FYI @LucaMarconato

1. Raw data to SpatialData

This is handled by spatialdata-io and does not directly concern sopa or segger.

2. SpatialData to segger input

Three options:

  1. add a function inside sopa to create inputs for segger
  2. add a function in segger that takes as input a dask dataframe (points) and a geopandas dataframe (cells). This way, SpatialData is not a dependency of segger, and Sopa makes the link between the two.
  3. add a function in segger to read from a zarr store representing a SpatialData object

I think solution 2 is better. For instance, with Sopa, we store a SpatialData object on disk, and then we could call the function below from segger, and we provide the objects that are needed for segger (in the right coordinate system).

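# inside segger
import dask.dataframe as dd
import geopandas as gpd

def from_spatialdata(transcripts: dd.DataFrame, cells: gpd.GeoDataFrame):
    ...  # the transcripts and cells provided by Sopa should live in the same coordinate system

# inside Sopa
import segger
from spatialdata import SpatialData

def segger_segmentation(sdata: SpatialData, points_key: str, shapes_key: str):
    # pass only the two elements segger needs, not the SpatialData object itself
    segger.from_spatialdata(sdata[points_key], sdata[shapes_key])
    ...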
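
As a rough illustration of why the shared coordinate system matters, here is a dependency-free sketch (not segger's or Sopa's actual code). It maps transcript coordinates into the cells' coordinate system with a 2D affine transform, then assigns each transcript to the polygon that contains it. The names `apply_affine`, `point_in_polygon`, and `assign_transcripts`, and the 0.5 micron-per-pixel scale, are hypothetical; with real data this step would more likely rely on geopandas and on the transformations stored in the SpatialData object.

```python
from typing import Mapping, Optional, Sequence

Point = tuple[float, float]
Polygon = Sequence[Point]  # vertices in order; the closing edge is implicit


def apply_affine(point: Point, matrix: Sequence[Sequence[float]]) -> Point:
    """Apply a 2x3 affine matrix [[a, b, tx], [c, d, ty]] to an (x, y) point."""
    x, y = point
    (a, b, tx), (c, d, ty) = matrix
    return (a * x + b * y + tx, c * x + d * y + ty)


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray-casting test: count crossings of a horizontal ray starting at the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_transcripts(
    transcripts: Sequence[Point],
    cells: Mapping[str, Polygon],
    to_cell_space: Sequence[Sequence[float]],
) -> list[Optional[str]]:
    """Return, for each transcript, the id of the containing cell or None."""
    assignments: list[Optional[str]] = []
    for pt in transcripts:
        pt_cell_space = apply_affine(pt, to_cell_space)
        hit = None
        for cell_id, polygon in cells.items():
            if point_in_polygon(pt_cell_space, polygon):
                hit = cell_id
                break
        assignments.append(hit)
    return assignments


# Toy example: transcripts in pixels, cells in microns (assumed 0.5 micron per pixel).
pixels_to_microns = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
cells = {
    "cell_a": [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)],
    "cell_b": [(6.0, 0.0), (10.0, 0.0), (10.0, 4.0), (6.0, 4.0)],
}
transcripts_px = [(2.0, 2.0), (14.0, 4.0), (10.0, 2.0)]

assignments = assign_transcripts(transcripts_px, cells, pixels_to_microns)
print(assignments)
```

If the transform were skipped, the same points would be tested against polygons expressed in a different unit, which is the kind of silent mismatch the "same coordinate system" requirement is meant to prevent.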

The toy dataset from Sopa might be useful to work on this:

import sopa

sdata = sopa.io.uniform()
sdata.write("toy_data.zarr") # store the SpatialData object on disk

sdata["transcripts"] # dask dataframe with columns "x", "y", "z", "genes"
sdata["cells"] # geopandas dataframe, with column "geometry"

# shapes can be read with geopandas, without using spatialdata
import geopandas as gpd
gpd.read_parquet(sdata.path / "shapes" / "cells" / "shapes.parquet")
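
Before handing these two objects to segger, a small sanity check on their layout can catch mismatches early. The sketch below is a hypothetical helper (`check_segger_inputs` is not part of segger or Sopa) that relies only on the `.columns` attribute, which pandas, dask, and geopandas dataframes all expose. The expected column names are taken from the toy dataset above and may differ for other datasets.

```python
from types import SimpleNamespace

# Column names as in Sopa's toy dataset above; other datasets may use different names.
TRANSCRIPT_COLUMNS = ("x", "y", "genes")
CELL_COLUMNS = ("geometry",)


def missing_columns(table, required):
    """Return the required column names that `table.columns` lacks."""
    present = set(table.columns)
    return [name for name in required if name not in present]


def check_segger_inputs(transcripts, cells):
    """Raise ValueError if either table lacks a required column (hypothetical helper)."""
    problems = {
        "transcripts": missing_columns(transcripts, TRANSCRIPT_COLUMNS),
        "cells": missing_columns(cells, CELL_COLUMNS),
    }
    problems = {name: cols for name, cols in problems.items() if cols}
    if problems:
        raise ValueError(f"missing columns: {problems}")


# Stand-ins exposing only `.columns`, so the sketch runs without dask or geopandas.
toy_transcripts = SimpleNamespace(columns=["x", "y", "z", "genes"])
toy_cells = SimpleNamespace(columns=["geometry"])
check_segger_inputs(toy_transcripts, toy_cells)  # passes silently

# A table whose gene column has a different name is reported as incomplete.
bad_transcripts = SimpleNamespace(columns=["x", "y", "feature_name"])
```

Because the check is duck-typed, the real `sdata["transcripts"]` and `sdata["cells"]` objects could be passed in place of the stand-ins.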

3. Segger output to Sopa

We will need two things:

LucaMarconato commented 1 month ago

Thanks @quentinblampey for the explanation, I agree that plan 2 is the shortest path forward and should be pretty quick to implement.

EliHei2 commented 1 month ago

#20 is gonna be a continuation of this: once there's a bridge to SpatialData, we have the bridge to sopa. @LucaMarconato @rukhovich @andrewmoorman and @quentinblampey, let's continue the discussion there. I'm closing this one.