EliHei2 / segger_dev

a cutting-edge cell segmentation model specifically designed for single-molecule resolved spatial omics datasets. It addresses the challenge of accurately segmenting individual cells in complex imaging datasets, leveraging a unique approach based on graph neural networks (GNNs).
https://elihei2.github.io/segger_dev/
MIT License

seg2explorer arguments #45

Open quail768 opened 2 days ago

quail768 commented 2 days ago

I'd like to view my data in Xenium Explorer, since you've already written the function for it (very cool!). Is the function working, or is it still in development?

Couple of questions:

What is the seg_df argument for the seg2explorer function? Is it just the anndata file? And is the source_path argument for seg2explorer the initial Xenium experiment directory, before segmentation with segger?

Thanks!

EliHei2 commented 2 days ago

Hi @quail768, thanks for taking a deep dive into segger :) seg2explorer currently works with the previous file format, .csv.gz. I'm going to update it for parquet files and will notify you ASAP. In the meantime, you might want to look into https://github.com/quentinblampey/spatialdata_xenium_explorer from @quentinblampey.

quail768 commented 2 days ago

Gotcha thanks @EliHei2 !

Silly question probably, but the function from spatialdata_xenium_explorer requires a spatialdata object, while the segmentation returns csv, anndata, or parquet. Any tips on getting any of them ready for running spatialdata_xenium_explorer.write()?

Unrelated, but you might also want to take a look at this error from find_markers():

 ValueError                                Traceback (most recent call last)
Cell In[59], line 1
----> 1 annotated=find_markers(annotated, "annotation", pos_percentile=5, neg_percentile=10, percentage=50)

File /omics/odcf/analysis/OE0211_projects/ndmm/Aaron/Binaries/ubuntu/segger_dev/src/segger/validation/utils.py:61, in find_markers(adata, cell_type_column, pos_percentile, neg_percentile, percentage)
     59 expr_frac = np.asarray((subset.X[:, pos_indices] > 0).mean(axis=0))[0]
     60 valid_pos_indices = pos_indices[expr_frac >= (percentage / 100)]
---> 61 positive_markers = genes[valid_pos_indices]
     62 negative_markers = genes[neg_indices]
     63 markers[cell_type] = {"positive": list(positive_markers), "negative": list(negative_markers)}

File /omics/odcf/analysis/OE0211_projects/ndmm/Aaron/CondaEnvironments/segger/lib/python3.10/site-packages/pandas/core/indexes/base.py:5419, in Index.__getitem__(self, key)
   5417 # Because we ruled out integer above, we always get an arraylike here
   5418 if result.ndim > 1:
-> 5419     disallow_ndim_indexing(result)
   5421 # NB: Using _constructor._simple_new would break if MultiIndex
   5422 #  didn't override __getitem__
   5423 return self._constructor._simple_new(result, name=self._name)

File /omics/odcf/analysis/OE0211_projects/ndmm/Aaron/CondaEnvironments/segger/lib/python3.10/site-packages/pandas/core/indexers/utils.py:341, in disallow_ndim_indexing(result)
    333 """
    334 Helper function to disallow multi-dimensional indexing on 1D Series/Index.
    335 
   (...)
    338 in GH#30588.
    339 """
    340 if np.ndim(result) > 1:
--> 341     raise ValueError(
    342         "Multi-dimensional indexing (e.g. `obj[:, None]`) is no longer "
    343         "supported. Convert to a numpy array before indexing instead."
    344     )

ValueError: Multi-dimensional indexing (e.g. `obj[:, None]`) is no longer supported. Convert to a numpy array before indexing instead.

I replaced

expr_frac = np.asarray((subset.X[:, pos_indices] > 0).mean(axis=0))[0]

with

expr_frac = np.asarray((subset.X[:, pos_indices] > 0).mean(axis=0)).flatten()

and it seems to work.
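Here is a minimal sketch of why the change might help, assuming subset.X was a dense NumPy array in my case (I haven't confirmed that this is the cause). With dense input, mean(axis=0) is already 1-D, so the trailing [0] picks out a single number. The comparison then yields a single boolean, and indexing pos_indices with it returns a 2-D array, which pandas refuses to use as an index. The gene names and counts below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the variables in find_markers (all values are made up).
genes = pd.Index(["g0", "g1", "g2", "g3"])
X = np.array([[1, 0, 2, 0],
              [3, 0, 0, 0],
              [0, 0, 1, 0]])       # dense cells x genes matrix (assumption)
pos_indices = np.array([0, 2, 3])  # candidate positive-marker columns
percentage = 50

# Fraction of cells expressing each candidate gene; 1-D for dense input.
frac = np.asarray((X[:, pos_indices] > 0).mean(axis=0))

# Original line: [0] keeps only the first fraction, so the mask is one boolean.
scalar_mask = frac[0] >= (percentage / 100)
bad_indices = pos_indices[scalar_mask]  # boolean-scalar indexing adds an axis
print(bad_indices.ndim)                 # 2, which pandas Index rejects

# Patched line: flatten() keeps one fraction per candidate gene.
expr_frac = frac.flatten()
valid_pos_indices = pos_indices[expr_frac >= (percentage / 100)]
positive_markers = list(genes[valid_pos_indices])
print(positive_markers)                 # ['g0', 'g2']
```

If subset.X is a SciPy sparse matrix instead, mean(axis=0) returns a 1 x n matrix, and flatten() reduces that to the same 1-D shape, so the patched line should cover both cases.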

EliHei2 commented 2 days ago

@quail768 you're right, the next versions will include spatialdata output with help from @LucaMarconato. Give us a couple of days to write the right function for the Xenium Explorer. Re find_markers: great catch! Would you like to make a PR for the fix?

quail768 commented 2 days ago

Yep no worries. Just wanted to check to see if it was possible with the output from the segmentation.

Made the change :)