ckmah / bento-tools

A Python toolkit for subcellular analysis of spatial transcriptomics data
https://bento-tools.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License
68 stars, 6 forks

bt.io.prep(sdata) error: Index ( ['index'], dtype='object') #159

Open dertrotl opened 4 days ago

dertrotl commented 4 days ago

Hey,

First of all, thank you very much for your great package! However, when trying it out, I got an error when executing the `bt.io.prep` function (see screenshot).

(screenshot of the error message)

Hope you can help me with my issue!

Session info:

GeoPandas 1.0.1, SpatialData 0.2.3, Python 3.10, bento-tools 2.1.3

sdata  # loaded with the `spatialdata_io.xenium` function

SpatialData object
├── Images
│     ├── 'morphology_focus': DataTree[cyx] (1, 20503, 22785), (1, 10251, 11392), (1, 5125, 5696), (1, 2562, 2848), (1, 1281, 1424)
│     └── 'morphology_mip': DataTree[cyx] (1, 20503, 22785), (1, 10251, 11392), (1, 5125, 5696), (1, 2562, 2848), (1, 1281, 1424)
├── Labels
│     ├── 'cell_labels': DataTree[yx] (20503, 22785), (10251, 11392), (5125, 5696), (2562, 2848), (1281, 1424)
│     └── 'nucleus_labels': DataTree[yx] (20503, 22785), (10251, 11392), (5125, 5696), (2562, 2848), (1281, 1424)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 10) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (30368, 1) (2D shapes)
│     ├── 'cell_circles': GeoDataFrame shape: (30368, 2) (2D shapes)
│     └── 'nucleus_boundaries': GeoDataFrame shape: (30368, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (30368, 372)
with coordinate systems:
    ▸ 'global', with elements:
        morphology_focus (Images), morphology_mip (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes), nucleus_boundaries (Shapes)
ckmah commented 4 days ago

Hi @dertrotl , thanks for reporting the issue. Here is something to try for now, let me know if it works. I will try to address this in the next update:

# remove index name from cell GeoDataFrame
del sdata['cell_boundaries'].index.name

sdata = bt.io.prep(sdata)
dertrotl commented 23 hours ago

Hey @ckmah,

thank you very much for your reply! I tested your suggestion, which unfortunately didn't work in my case.

del sdata['cell_boundaries'].index.name

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[9], line 1
----> 1 del sdata['cell_boundaries'].index.name

AttributeError: can't delete attribute 'name'
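
This `AttributeError` appears to be ordinary pandas behavior rather than anything specific to bento-tools: `Index.name` is a property that can be assigned but not deleted. A minimal sketch with a plain pandas frame standing in for `sdata['cell_boundaries']` (the index name `cell_id` is an assumption, not taken from the thread):

```python
import pandas as pd

# Stand-in for sdata['cell_boundaries']; the index name "cell_id" is assumed.
cells = pd.DataFrame(
    {"area": [1.0, 2.0]},
    index=pd.Index(["a", "b"], name="cell_id"),
)

try:
    del cells.index.name  # Index.name has no deleter, so this raises
    deleted = True
except AttributeError:
    deleted = False

# Two ways to clear the name instead of deleting it:
cells.index.name = None            # modifies the existing index in place
renamed = cells.rename_axis(None)  # returns a new frame with an unnamed index

print(deleted, cells.index.name, renamed.index.name)
```

Either form leaves the index unnamed; which one fits depends on whether the element should be modified in place.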

I also tried removing the index name with `sdata['cell_boundaries'].index.name = None`. That assignment succeeded, but `bt.io.prep(sdata)` then failed with the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ../micromamba/envs/bento/lib/python3.10/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'index_right'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[11], line 1
----> 1 sdata = bt.io.prep(sdata)

File ../micromamba/envs/bento/lib/python3.10/site-packages/bento/io/_io.py:87, in prep(sdata, points_key, feature_key, instance_key, shape_keys)
     85 if len(shape_sjoin) > 0:
     86     pbar.set_description("Mapping shapes")
---> 87     sdata = _sjoin_shapes(
     88         sdata=sdata, instance_key=instance_key, shape_keys=shape_sjoin
     89     )
     91 pbar.update()
     93 # Only keep points within instance_key shape

File ../micromamba/envs/bento/lib/python3.10/site-packages/bento/io/_index.py:111, in _sjoin_shapes(sdata, instance_key, shape_keys)
    107 child_shape = gpd.GeoDataFrame(geometry=child_shape.buffer(-10e-6))
    109 # Map child shape index to parent shape and process the result
    110 parent_shape = (
--> 111     parent_shape.sjoin(child_shape, how="left", predicate="covers")
    112     .reset_index()
    113     .drop_duplicates(subset="index", keep="last")
    114     .set_index("index")
    115     .assign(
    116         index_right=lambda df: df.loc[
    117             ~df["index_right"].duplicated(keep="first"), "index_right"
    118         ]
    119         .fillna("")
    120         .astype("category")
    121     )
    122     .rename(columns={"index_right": shape_key})
    123 )
    124 parent_shape[shape_key] = parent_shape[shape_key].fillna("")
    126 # Save shape index as column in instance_key shape

File ../micromamba/envs/bento/lib/python3.10/site-packages/pandas/core/frame.py:5239, in DataFrame.assign(self, **kwargs)
   5236 data = self.copy(deep=None)
   5238 for k, v in kwargs.items():
-> 5239     data[k] = com.apply_if_callable(v, data)
   5240 return data

File ../micromamba/envs/bento/lib/python3.10/site-packages/pandas/core/common.py:384, in apply_if_callable(maybe_callable, obj, **kwargs)
    373 """
    374 Evaluate possibly callable input using obj and kwargs if it is callable,
    375 otherwise return as it is.
   (...)
    381 **kwargs
    382 """
    383 if callable(maybe_callable):
--> 384     return maybe_callable(obj, **kwargs)
    386 return maybe_callable

File ../micromamba/envs/bento/lib/python3.10/site-packages/bento/io/_index.py:117, in _sjoin_shapes.<locals>.<lambda>(df)
    107 child_shape = gpd.GeoDataFrame(geometry=child_shape.buffer(-10e-6))
    109 # Map child shape index to parent shape and process the result
    110 parent_shape = (
    111     parent_shape.sjoin(child_shape, how="left", predicate="covers")
    112     .reset_index()
    113     .drop_duplicates(subset="index", keep="last")
    114     .set_index("index")
    115     .assign(
    116         index_right=lambda df: df.loc[
--> 117             ~df["index_right"].duplicated(keep="first"), "index_right"
    118         ]
    119         .fillna("")
    120         .astype("category")
    121     )
    122     .rename(columns={"index_right": shape_key})
    123 )
    124 parent_shape[shape_key] = parent_shape[shape_key].fillna("")
    126 # Save shape index as column in instance_key shape

File ../micromamba/envs/bento/lib/python3.10/site-packages/geopandas/geodataframe.py:1750, in GeoDataFrame.__getitem__(self, key)
   1744 def __getitem__(self, key):
   1745     """
   1746     If the result is a column containing only 'geometry', return a
   1747     GeoSeries. If it's a DataFrame with any columns of GeometryDtype,
   1748     return a GeoDataFrame.
   1749     """
-> 1750     result = super().__getitem__(key)
   1751     # Custom logic to avoid waiting for pandas GH51895
   1752     # result is not geometry dtype for multi-indexes
   1753     if (
   1754         pd.api.types.is_scalar(key)
   1755         and key == ""
   (...)
   1758         and not is_geometry_type(result)
   1759     ):

File ../micromamba/envs/bento/lib/python3.10/site-packages/pandas/core/frame.py:4102, in DataFrame.__getitem__(self, key)
   4100 if self.columns.nlevels > 1:
   4101     return self._getitem_multilevel(key)
-> 4102 indexer = self.columns.get_loc(key)
   4103 if is_integer(indexer):
   4104     indexer = [indexer]

File ../micromamba/envs/bento/lib/python3.10/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)

KeyError: 'index_right'
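
A possible reading of both errors, not confirmed by the maintainer in this thread: the chain in `_sjoin_shapes` calls `.reset_index()` and then refers to a column literally named `index`, which only exists when the index is unnamed. The sketch below reproduces that with plain pandas frames standing in for the shapes elements. The index name `cell_id` and the helper `clear_index_names` are hypothetical and used only for illustration.

```python
import pandas as pd

# Stand-ins for the shapes elements; the index name "cell_id" is assumed.
shapes = {
    "cell_boundaries": pd.DataFrame(
        {"area": [1.0, 2.0]}, index=pd.Index(["a", "b"], name="cell_id")
    ),
    "nucleus_boundaries": pd.DataFrame(
        {"area": [0.5, 0.7]}, index=pd.Index(["a", "b"], name="cell_id")
    ),
}
cells = shapes["cell_boundaries"]

# A named index becomes a column with that name, so there is no "index" column.
named_cols = list(cells.reset_index().columns)

try:
    cells.reset_index().drop_duplicates(subset="index")
    error = None
except KeyError as err:  # same kind of error as in the issue title
    error = err


def clear_index_names(frames):
    """Hypothetical helper: set every frame's index name to None in place."""
    for frame in frames.values():
        frame.index.name = None


clear_index_names(shapes)

# With an unnamed index, reset_index() yields the "index" column the chain expects.
unnamed_cols = list(cells.reset_index().columns)
deduped = (
    cells.reset_index()
    .drop_duplicates(subset="index", keep="last")
    .set_index("index")
)

print(named_cols, unnamed_cols, error)
```

The remaining `KeyError: 'index_right'` may have a related cause. As far as I know, GeoPandas 1.0 changed `sjoin` to keep the right frame's index name for the joined column instead of always calling it `index_right`, so a named index on `nucleus_boundaries` could produce a differently named column. Under that assumption, clearing the index name on both `sdata['cell_boundaries']` and `sdata['nucleus_boundaries']` before calling `bt.io.prep(sdata)` might be worth trying; this has not been tested against bento-tools 2.1.3.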