YeoLab / bento-tools

A Python toolkit for subcellular analysis of spatial transcriptomics data
https://bento-tools.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License
70 stars 6 forks source link

bt.io.prep(sdata) error: operands could not be broadcast together with shapes (2,) (3,) #161

Open wgmao opened 3 weeks ago

wgmao commented 3 weeks ago

Thank you for developing this wonderful tool! I am working with a Xenium dataset but came across an error when running bt.io.prep following the tutorial. The zarr object was prepared using spatialdata_io(). I have attached the log and the object I am working on has the following summary. I think the problem is caused by the 3D coordinates of the transcripts. Is there an easy way to modify the object and only keep the x&y coordinates? Thank you! bento.log

PS: I have tried sdata_crop.points['transcripts'] = sdata_crop.points['transcripts'].drop('z', axis = 1). transcripts now becomes (2D points) but the error persists.

SpatialData object, with associated Zarr store: /path_to_data.zarr/data.zarr
├── Images
│     └── 'morphology_focus': DataTree[cyx] (1, 54533, 14990), (1, 27266, 7495), (1, 13633, 3747), (1, 6816, 1873), (1, 3408, 936)
├── Labels
│     ├── 'cell_labels': DataTree[yx] (54533, 14990), (27266, 7495), (13633, 3747), (6816, 1873), (3408, 936)
│     └── 'nucleus_labels': DataTree[yx] (54533, 14990), (27266, 7495), (13633, 3747), (6816, 1873), (3408, 936)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 11) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (90808, 1) (2D shapes)
│     └── 'nucleus_boundaries': GeoDataFrame shape: (90808, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (90808, 377)
with coordinate systems:
    ▸ 'global', with elements:
        morphology_focus (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes), nucleus_boundaries (Shapes)
ckmah commented 3 weeks ago

Hi @wgmao, thanks for reporting this it's a little tedious but here is the current solution. You are on the right track, we drop the 3rd dimension on the points like so (you may need to set feature_key/instance_key values to your dataset's):

# Get the scale vector for x and y coordinates from the transcripts points
xyz_scale_vector = sd.transformations.get_transformation(
    sdata.points["transcripts"]
).to_scale_vector(["x", "y"])

# Create a Scale transformation for x and y axes
xy_scale = sd.transformations.Scale(scale=xyz_scale_vector, axes=["x", "y"])

# Get the current transformation attributes
transform = sdata.points["transcripts"].attrs

# Update the global transformation with the new xy_scale
transform["transform"]["global"] = xy_scale

# Parse the transcripts data into a PointsModel
sdata.points["transcripts"] = sd.models.PointsModel.parse(
    sdata.points["transcripts"].compute().reset_index(drop=True),
    coordinates={"x": "x", "y": "y"},
    feature_key="feature_name",
    instance_key="cell_id",
)

# Set the updated transformation attributes back to the transcripts points
sdata.points["transcripts"].attrs = transform
wgmao commented 2 weeks ago

Thank you for your prompt response! I will try it out and see if the problem gets resolved. Meanwhile, I tried to bypass this error following the Data Prep Guide. The goal here is to construct a SpatialData manually.

from spatialdata.models import PointsModel, ShapesModel

shapes = dict(
    cell_boundaries = ShapesModel.parse(sdata_crop.shapes['cell_boundaries']),
    nucleus_boundaries = ShapesModel.parse(sdata_crop.shapes['nucleus_boundaries'])
)

points = dict(
    transcripts = PointsModel.parse(
        sdata_crop.points['transcripts'].compute(), coordinates = {"x": "x", "y":"y"}, feature_key = "feature_name"
    )
)

sdata_2 = sd.SpatialData(
    shapes = shapes,
    points = points,
    tables = sdata_crop.tables
)

Then I followed the main guide with sdata_2 as input. sdata_2 worked well with bt.io.prep(), but there were panels missing when using bt.pl.shape_stats() (see attached). Further when running bt.tl.flux(), there was an error stating that

File "python3.10/site-packages/bento/_utils.py", line 93, in get_points
    raise ValueError(f"Points key {points_key} not found in sdata.points")
ValueError: Points key cell_boundaries_raster not found in sdata.points

I looked into the _flux.py a little and noticed that cells, gene_names and cell_composition were all empty objects around line 148 (inside the flux function). My takeaway is that there are some formatting issues embedded somewhere.

output