EliHei2 / segger_dev

A cutting-edge cell segmentation model specifically designed for single-molecule resolved spatial omics datasets. It addresses the challenge of accurately segmenting individual cells in complex imaging datasets, leveraging a unique approach based on graph neural networks (GNNs).
https://elihei2.github.io/segger_dev/
MIT License

[BUG] TypeError in build_pyg_data_from_tile #10

Closed — pakiessling closed this 1 month ago

pakiessling commented 1 month ago

Hi, I am trying to follow https://elihei2.github.io/segger_dev/user_guide/data_creation/#xenium-data with my own data, but I am facing some issues. Below is the code I am running, with comments noting my modifications to the tutorial:


```python
from segger.data import XeniumSample
from pathlib import Path
import scanpy as sc
from segger.data.utils import calculate_gene_celltype_abundance_embedding
# changed from: from segger.utils import calculate_gene_celltype_abundance_embedding

raw_data_dir = Path("/hpcwork/p0020567/data/xenium/heart/5000_panel/raw/13_14_mouse/")
processed_data_dir = Path("./output")
sample_tag = "output-XETG00229__0041691__Region_1__20240829__104828"

scRNAseq_path = "/hpcwork/rwth1209/data/scRNA/reference_datasets/kuppe_2022_infarct_global_clean_costalab.h5ad"
scRNAseq = sc.read(scRNAseq_path)
sc.pp.subsample(scRNAseq, fraction=0.1)

celltype_column = "cell_type"
gene_celltype_abundance_embedding = calculate_gene_celltype_abundance_embedding(
    scRNAseq, celltype_column
)

xenium_sample = XeniumSample()

# AttributeError: 'XeniumSample' object has no attribute 'x_min' unless these are set
xenium_sample.x_min = 0
xenium_sample.x_max = 16777216
xenium_sample.y_min = 0
xenium_sample.y_max = 16777216

xenium_sample.load_transcripts(
    base_path=raw_data_dir,
    sample=sample_tag,
    transcripts_filename="transcripts.parquet",
    file_format="parquet",
    # additional_embeddings={"cell_type_abundance": gene_celltype_abundance_embedding},
    # TypeError: SpatialTranscriptomicsSample.load_transcripts() got an unexpected keyword argument 'additional_embeddings'
)

# xenium_sample.set_embedding("cell_type_abundance") raises:
# AttributeError: 'XeniumSample' object has no attribute 'embeddings_dict'. Did you mean: 'embedding_df'?
xenium_sample.cell_type_abundance = gene_celltype_abundance_embedding

nuclei_path = raw_data_dir / sample_tag / "nucleus_boundaries.parquet"
xenium_sample.boundaries_df = xenium_sample.load_boundaries(
    path=nuclei_path, file_format="parquet"
)

tile_pyg_data = xenium_sample.build_pyg_data_from_tile(
    boundaries_df=xenium_sample.boundaries_df,
    transcripts_df=xenium_sample.transcripts_df,
    r_tx=20,
    k_tx=20,
    workers=1,
)
```
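For context, `calculate_gene_celltype_abundance_embedding` above conceptually builds a gene-by-cell-type matrix from the scRNA-seq reference. A toy pandas sketch of that idea — the fraction-of-expressing-cells semantics is an assumption for illustration, not segger's actual implementation:

```python
# Conceptual sketch (NOT segger's code): a gene x cell-type "abundance"
# embedding as the fraction of cells of each type expressing each gene,
# computed from a toy counts matrix with pandas.
import pandas as pd

counts = pd.DataFrame(
    [[1, 0, 3], [0, 2, 0], [4, 0, 0], [0, 1, 1]],
    index=["cell1", "cell2", "cell3", "cell4"],
    columns=["geneA", "geneB", "geneC"],
)
cell_types = pd.Series(["T", "B", "T", "B"], index=counts.index, name="cell_type")

# Fraction of cells in each type with nonzero expression of each gene.
expressed = (counts > 0).astype(float)
embedding = expressed.groupby(cell_types).mean().T  # genes x cell types

print(embedding)
```

The resulting frame has one row per gene and one column per cell type, which is the shape the downstream code expects for a per-gene embedding.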

In the building step I get the error:

```
Loaded boundaries from '/hpcwork/p0020567/data/xenium/heart/5000_panel/raw/13_14_mouse/output-XETG00229__0041691__Region_1__20240829__104828/nucleus_boundaries.parquet' within bounding box (0, 16777216, 0, 16777216).
Computing boundaries geometries...
No precomputed polygons provided. Computing polygons from boundaries with a scale factor of 1.0.
/rwthfs/rz/cluster/work/rwth1209/projects/merfish_segmentation/segger/segger_dev/src/segger/data/io.py:433: UserWarning: `meta` is not specified, inferred from partial data. Please provide `meta` if the result is unexpected.
  Before: .apply(func)
  After:  .apply(func, meta={'x': 'f8', 'y': 'f8'}) for dataframe result
  or:     .apply(func, meta=('x', 'f8'))            for series result
  polygons_ddf = boundaries_df.groupby(cell_id_column).apply(
Adding centroids to the polygons...
Traceback (most recent call last):
  File "/rwthfs/rz/cluster/work/rwth1209/projects/merfish_segmentation/segger/xenium_test/test.py", line 51, in <module>
    tile_pyg_data = xenium_sample.build_pyg_data_from_tile(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/rwthfs/rz/cluster/work/rwth1209/projects/merfish_segmentation/segger/segger_dev/src/segger/data/io.py", line 868, in build_pyg_data_from_tile
    bd_gdf = self.compute_boundaries_geometries(boundaries_df, scale_factor=scale_boundaries)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/rwthfs/rz/cluster/work/rwth1209/projects/merfish_segmentation/segger/segger_dev/src/segger/data/io.py", line 560, in compute_boundaries_geometries
    if polygons_gdf.shape[0] == 0:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work/rwth1209/enviroments/segger3/lib/python3.11/site-packages/dask_expr/_collection.py", line 4773, in __bool__
    raise TypeError(
TypeError: Trying to convert <dask_expr.expr.Scalar: expr=(DropDuplicates(frame=Assign(frame=Assign(frame=GroupByApply(frame=Assign(frame=ReadParquetFSSpec(3e95fb4)), observed=False, func=<function SpatialTranscriptomicsSample.generate_and_scale_polygons.<locals>.<lambda> at 0x14ebd427d260>, meta=<no_default>, args=(), kwargs={}))))).size() // 4 == 0, dtype=bool> to a boolean value. Because Dask objects are lazily evaluated, they cannot be converted to a boolean value or used in boolean conditions like if statements. Try calling .compute() to force computation prior to converting to a boolean value or using in a conditional statement.
```
EliHei2 commented 1 month ago

Hey @pakiessling, thanks for reaching out. We recently made some structural changes, which is likely the source of the confusion about the workflow. Please follow this tutorial to create data, train, and predict: https://github.com/EliHei2/segger_dev/blob/main/docs/notebooks/segger_tutorial.ipynb

pakiessling commented 1 month ago

Ty!

I am now getting `TypeError: SpatialTranscriptomicsSample.save_dataset_for_segger() got an unexpected keyword argument 'receptive_field'`

in this step:

```python
xs.save_dataset_for_segger(
    processed_dir=segger_data_dir,
    r_tx=5,
    k_tx=15,
    receptive_field=receptive_field,
    x_size=120,
    y_size=120,
    d_x=100,
    d_y=100,
    margin_x=10,
    margin_y=10,
    num_workers=4,  # change to your number of CPUs
)
```
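For intuition about the tiling parameters in that call, here is a hedged guess at their roles (tile size, step, and overlap margin) as a stdlib-only sketch — this is an assumption for illustration, not segger's actual tiling code:

```python
# Hypothetical sketch of how parameters like x_size, d_x, and margin_x
# COULD define overlapping tiles along one axis (assumed semantics,
# not segger's implementation).
def tile_bounds(x_min, x_max, x_size, d_x, margin_x):
    """Yield (start, end) windows of width up to x_size + 2*margin_x, stepped by d_x."""
    x = x_min
    while x < x_max:
        yield (max(x_min, x - margin_x), min(x_max, x + x_size + margin_x))
        x += d_x


tiles = list(tile_bounds(0, 300, x_size=120, d_x=100, margin_x=10))
print(tiles)  # [(0, 130), (90, 230), (190, 300)]
```

With `d_x` smaller than `x_size`, consecutive tiles overlap, which is the usual way to avoid cutting cells at tile borders.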

EliHei2 commented 1 month ago

You could define the receptive field like this: `receptive_field = {'k_bd': 3, 'dist_bd': 20, 'k_tx': 15, 'dist_tx': 3}`.

pakiessling commented 1 month ago

Yeah, I did, but `save_dataset_for_segger()` no longer seems to accept `receptive_field` as an argument. I omitted it and the dataset saved successfully.

EliHei2 commented 1 month ago

That's true; in the new version we don't compute the receptive field in the data-creation step but rather in the prediction step.