YeoLab / bento-tools

A Python toolkit for subcellular analysis of spatial transcriptomics data
https://bento-tools.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License

bt.io.prep(sdata) error: Index ( ['index'], dtype='object') #159

Open dertrotl opened 1 month ago

dertrotl commented 1 month ago

Hey,

First of all, thank you very much for this great package! However, when trying it out, I got an error as soon as I called the bt.io.prep function (see screenshot).

[Screenshot of the error message]

Hope you can help me with my issue!

Some session info:

GeoPandas 1.0.1
SpatialData 0.2.3
Python 3.10
bento-tools 2.1.3

sdata  # loaded with the `spatialdata_io.xenium` function

SpatialData object
├── Images
│     ├── 'morphology_focus': DataTree[cyx] (1, 20503, 22785), (1, 10251, 11392), (1, 5125, 5696), (1, 2562, 2848), (1, 1281, 1424)
│     └── 'morphology_mip': DataTree[cyx] (1, 20503, 22785), (1, 10251, 11392), (1, 5125, 5696), (1, 2562, 2848), (1, 1281, 1424)
├── Labels
│     ├── 'cell_labels': DataTree[yx] (20503, 22785), (10251, 11392), (5125, 5696), (2562, 2848), (1281, 1424)
│     └── 'nucleus_labels': DataTree[yx] (20503, 22785), (10251, 11392), (5125, 5696), (2562, 2848), (1281, 1424)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 10) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (30368, 1) (2D shapes)
│     ├── 'cell_circles': GeoDataFrame shape: (30368, 2) (2D shapes)
│     └── 'nucleus_boundaries': GeoDataFrame shape: (30368, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (30368, 372)
with coordinate systems:
    ▸ 'global', with elements:
        morphology_focus (Images), morphology_mip (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes), nucleus_boundaries (Shapes)

ckmah commented 1 month ago

Hi @dertrotl, thanks for reporting the issue. Here is something to try for now; let me know if it works. I will try to address this in the next update:

# remove index name from cell GeoDataFrame
del sdata['cell_boundaries'].index.name

sdata = bt.io.prep(sdata)

dertrotl commented 1 month ago

Hey @ckmah,

thank you very much for your reply! I tested your suggestion, which unfortunately didn't work in my case.

del sdata['cell_boundaries'].index.name

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[9], line 1
----> 1 del sdata['cell_boundaries'].index.name

AttributeError: can't delete attribute 'name'
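
For context, a minimal sketch of why the `del` statement above fails. It uses a plain pandas `Index` as a stand-in for the GeoDataFrame index (an assumption for illustration; the index labels and the name `cell_id` are made up). The index name is exposed as a property that can be assigned but not deleted, so `del` raises `AttributeError`, while assigning `None` clears the name.

```python
import pandas as pd

# Hypothetical named index, standing in for sdata['cell_boundaries'].index
idx = pd.Index(["a", "b"], name="cell_id")

try:
    del idx.name  # the name property defines no deleter
    delete_failed = False
except AttributeError:
    delete_failed = True

# Assignment is the supported way to clear the name
idx.name = None
```

Clearing the name this way only changes the index metadata; the index values stay the same.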

I also tried removing the index name like this:

sdata['cell_boundaries'].index.name = None

The assignment itself went through, but bt.io.prep then raised the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ../micromamba/envs/bento/lib/python3.10/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'index_right'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[11], line 1
----> 1 sdata = bt.io.prep(sdata)

File ../micromamba/envs/bento/lib/python3.10/site-packages/bento/io/_io.py:87, in prep(sdata, points_key, feature_key, instance_key, shape_keys)
     85 if len(shape_sjoin) > 0:
     86     pbar.set_description("Mapping shapes")
---> 87     sdata = _sjoin_shapes(
     88         sdata=sdata, instance_key=instance_key, shape_keys=shape_sjoin
     89     )
     91 pbar.update()
     93 # Only keep points within instance_key shape

File ../micromamba/envs/bento/lib/python3.10/site-packages/bento/io/_index.py:111, in _sjoin_shapes(sdata, instance_key, shape_keys)
    107 child_shape = gpd.GeoDataFrame(geometry=child_shape.buffer(-10e-6))
    109 # Map child shape index to parent shape and process the result
    110 parent_shape = (
--> 111     parent_shape.sjoin(child_shape, how="left", predicate="covers")
    112     .reset_index()
    113     .drop_duplicates(subset="index", keep="last")
    114     .set_index("index")
    115     .assign(
    116         index_right=lambda df: df.loc[
    117             ~df["index_right"].duplicated(keep="first"), "index_right"
    118         ]
    119         .fillna("")
    120         .astype("category")
    121     )
    122     .rename(columns={"index_right": shape_key})
    123 )
    124 parent_shape[shape_key] = parent_shape[shape_key].fillna("")
    126 # Save shape index as column in instance_key shape

File ../micromamba/envs/bento/lib/python3.10/site-packages/pandas/core/frame.py:5239, in DataFrame.assign(self, **kwargs)
   5236 data = self.copy(deep=None)
   5238 for k, v in kwargs.items():
-> 5239     data[k] = com.apply_if_callable(v, data)
   5240 return data

File ../micromamba/envs/bento/lib/python3.10/site-packages/pandas/core/common.py:384, in apply_if_callable(maybe_callable, obj, **kwargs)
    373 """
    374 Evaluate possibly callable input using obj and kwargs if it is callable,
    375 otherwise return as it is.
   (...)
    381 **kwargs
    382 """
    383 if callable(maybe_callable):
--> 384     return maybe_callable(obj, **kwargs)
    386 return maybe_callable

File ../micromamba/envs/bento/lib/python3.10/site-packages/bento/io/_index.py:117, in _sjoin_shapes.<locals>.<lambda>(df)
    107 child_shape = gpd.GeoDataFrame(geometry=child_shape.buffer(-10e-6))
    109 # Map child shape index to parent shape and process the result
    110 parent_shape = (
    111     parent_shape.sjoin(child_shape, how="left", predicate="covers")
    112     .reset_index()
    113     .drop_duplicates(subset="index", keep="last")
    114     .set_index("index")
    115     .assign(
    116         index_right=lambda df: df.loc[
--> 117             ~df["index_right"].duplicated(keep="first"), "index_right"
    118         ]
    119         .fillna("")
    120         .astype("category")
    121     )
    122     .rename(columns={"index_right": shape_key})
    123 )
    124 parent_shape[shape_key] = parent_shape[shape_key].fillna("")
    126 # Save shape index as column in instance_key shape

File ../micromamba/envs/bento/lib/python3.10/site-packages/geopandas/geodataframe.py:1750, in GeoDataFrame.__getitem__(self, key)
   1744 def __getitem__(self, key):
   1745     """
   1746     If the result is a column containing only 'geometry', return a
   1747     GeoSeries. If it's a DataFrame with any columns of GeometryDtype,
   1748     return a GeoDataFrame.
   1749     """
-> 1750     result = super().__getitem__(key)
   1751     # Custom logic to avoid waiting for pandas GH51895
   1752     # result is not geometry dtype for multi-indexes
   1753     if (
   1754         pd.api.types.is_scalar(key)
   1755         and key == ""
   (...)
   1758         and not is_geometry_type(result)
   1759     ):

File ../micromamba/envs/bento/lib/python3.10/site-packages/pandas/core/frame.py:4102, in DataFrame.__getitem__(self, key)
   4100 if self.columns.nlevels > 1:
   4101     return self._getitem_multilevel(key)
-> 4102 indexer = self.columns.get_loc(key)
   4103 if is_integer(indexer):
   4104     indexer = [indexer]

File ../micromamba/envs/bento/lib/python3.10/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)

KeyError: 'index_right'

nklkhlr commented 1 month ago

Hi @dertrotl and @ckmah,

I've run into the same issue. The fix for me was to make sure that the index name of every data frame in Shapes is None. Otherwise, the columns added during the sjoin step are named after the index instead of 'index' and 'index_right', so the later lookups of those column names fail.

In your particular case, @dertrotl, could it be that 'nucleus_boundaries' or 'cell_circles' have named indices too?

Happy to draft up a pull-request that does index name checks within _sjoin_shapes if that helps!
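
The fix described above could be sketched roughly as follows. `clear_index_names` is a hypothetical helper, not part of bento-tools, and the demo uses plain pandas DataFrames with made-up values as stand-ins for the GeoDataFrames in Shapes. It also shows the naming effect in question: `reset_index()` creates a column called 'index' only when the index is unnamed.

```python
import pandas as pd


def clear_index_names(frames):
    """Set index.name to None on every frame in a mapping.

    Hypothetical helper; returns the names that were cleared, keyed by element.
    """
    cleared = {}
    for key, frame in frames.items():
        if frame.index.name is not None:
            cleared[key] = frame.index.name
            frame.index.name = None
    return cleared


# Stand-in data: one frame with a named index, one without (assumed values)
shapes = {
    "cell_boundaries": pd.DataFrame(
        {"area": [1.0, 2.0]}, index=pd.Index(["a", "b"], name="cell_id")
    ),
    "nucleus_boundaries": pd.DataFrame(
        {"area": [0.5, 0.7]}, index=pd.Index(["a", "b"])
    ),
}

# With a named index, reset_index() names the new column after the index
before = list(shapes["cell_boundaries"].reset_index().columns)  # ['cell_id', 'area']

cleared = clear_index_names(shapes)

# With the name cleared, the new column falls back to 'index'
after = list(shapes["cell_boundaries"].reset_index().columns)  # ['index', 'area']
```

With a real SpatialData object, calling something like `clear_index_names(sdata.shapes)` before `bt.io.prep(sdata)` may have the same effect, assuming `sdata.shapes` behaves as a mapping of GeoDataFrames; that call is untested here.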

dertrotl commented 1 month ago

Hi @nklkhlr,

Thank you for your reply. I can confirm that your solution fixed the issue. Thanks a lot!