Support for pandas >=2.0.0, spatialdata>=0.1.0 #137

Closed: easlinger closed this 2 months ago

easlinger commented 2 months ago

Awesome package -- Thanks for all your work on it! Are there any plans to support pandas 2? This would make it possible to integrate into our other workflows more fully. For instance, recent versions of squidpy require pandas 2.1.0.

ckmah commented 2 months ago

Hi @easlinger, thanks for bringing this up! Yes, I want to support pandas>=2 and test compatibility with pandas API changes. I'll prioritize this in the next release.

easlinger commented 2 months ago

Thanks so much! I appreciate it.

Supporting the newest spatialdata and spatialdata-io versions would also help -- I'm having some issues prepping multi-modally-segmented Xenium data.

Thanks for all your work!

ckmah commented 2 months ago

For sure! The spatialdata update to >=0.1.0 also had some breaking API changes, e.g., moving from single to multi-table support. If you could reproduce any errors you run into with your Xenium data, that would be great; I will reference them when refactoring. Glad you brought all of these up 👍

easlinger commented 2 months ago

This is an error I get with old-version data (Xenium 1.0/single-channel segmentation). I can't share the data, unfortunately, but I'll try to give as much context as possible to reproduce it (environment listed in the last section).

In a future comment, I'll share what happens with our multi-modal segmentation data.

Summary/Additional Context

Calls to Error (in bento/io/ >

_sjoin_points() (in bento/io/ >

set_points_metadata() (in bento/io/ >

points.loc[:, columns] = metadata (line 256; in set_points_metadata()) >

ValueError: Must have equal len keys and value when setting with an iterable

I did some lazy man's debugging with a statement printing some of the objects involved in the error.

columns = ['cell_boundaries']

Here's a print-out of part of the metadata variable (which appears to hold cell IDs): ['eofgbmjc-1' 'enmongjp-1' 'enmolhlp-1' ... '' '' '']

And here are the first two rows of the points dataframe:

                   x             y          z feature_name     cell_id  \
0          37.685432  10092.416016  11.131549      IRF2BP2  UNASSIGNED   
1         147.899933  10138.629883  11.069950      SLC26A6  UNASSIGNED   

           transcript_id fov_name  nucleus_distance         qv  \
0        282170761412612       Q3        392.351013  40.000000   
1        282170761412639       Q3        283.523773  40.000000   

0                       0  
1                       0  

It looks to me like the code is trying to store cell IDs in the transcripts dataframe, and the two have incompatible dimensions (the number of cell IDs does not match the number of transcript rows)?
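
For reference, the same message can be reproduced in plain pandas whenever a 1-D array is written into a single column through `.loc` and its length differs from the number of rows. The snippet below is only a minimal sketch with made-up values (the frame, the IDs, and the row counts are all assumptions, not the real Xenium data), but it mirrors the `points.loc[:, columns] = metadata` call from the traceback:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the transcripts frame: 3 rows, mixed dtypes.
points = pd.DataFrame(
    {
        "x": [37.7, 147.9, 210.3],
        "cell_boundaries": ["", "", ""],
    }
)

# Toy stand-in for the metadata array: 4 values for 3 rows (assumed mismatch).
metadata = np.array(["eofgbmjc-1", "enmongjp-1", "", ""])
columns = ["cell_boundaries"]

print(len(points), len(metadata))  # the two lengths differ

try:
    # Same assignment pattern as in set_points_metadata().
    points.loc[:, columns] = metadata
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```

If this is what happens inside Bento, comparing `len(points)` with `len(metadata)` just before the assignment should confirm it.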


%load_ext autoreload
%autoreload 2

import os
import matplotlib.pyplot
import seaborn as sns
import scanpy as sc
import bento as bt
import spatialdata_io as sdio
import spatialdata as sd
import pandas as pd
import numpy as np

sdata = sdio.xenium(directory_path)
kwargs = dict(points_key="transcripts", feature_key="feature_name",
              shape_keys=["cell_boundaries", "nucleus_boundaries"])
sdata_p =, **kwargs)  # for Bento compatibility

Error Traceback

ValueError                                Traceback (most recent call last)
Cell In[5], line 4
      1 kwargs = dict(points_key="transcripts", feature_key="feature_name",
      2               instance_key="cell_boundaries",
      3               shape_keys=["cell_boundaries", "nucleus_boundaries"])
----> 4 sdata =, **kwargs)  # for Bento compatibility
      5 sdata

File ~/bento-tools/bento/io/, in prep(sdata, points_key, feature_key, instance_key, shape_keys)
     73 if len(point_sjoin) > 0:
     74     pbar.set_description("Mapping points")
---> 75     sdata = _sjoin_points(
     76         sdata=sdata,
     77         points_key=points_key,
     78         shape_keys=point_sjoin,
     79     )
     81 pbar.update()
     83 if len(shape_sjoin) > 0:

File ~/bento-tools/bento/io/, in _sjoin_points(sdata, points_key, shape_keys)
     54     points["index_right"].fillna("", inplace=True)
     55     points.rename(columns={"index_right": shape_key}, inplace=True)
---> 57     set_points_metadata(sdata, points_key, points[shape_key], columns=shape_key)
     59 return sdata

File ~/bento-tools/bento/, in set_points_metadata(sdata, points_key, metadata, columns)
    249 transform = sdata.points[points_key].attrs
    250 points = sdata.points[points_key].compute()
--> 251 points.loc[:, columns] = metadata
    252 points = PointsModel.parse(
    253     dd.from_pandas(points, npartitions=1), coordinates={"x": "x", "y": "y"}
    254 )
    255 points.attrs = transform

File ~/miniconda3/envs/bento/lib/python3.10/site-packages/pandas/core/, in _LocationIndexer.__setitem__(self, key, value)
    815 self._has_valid_setitem_indexer(key)
    817 iloc = self if == "iloc" else self.obj.iloc
--> 818 iloc._setitem_with_indexer(indexer, value,

File ~/miniconda3/envs/bento/lib/python3.10/site-packages/pandas/core/, in _iLocIndexer._setitem_with_indexer(self, indexer, value, name)
   1792 # align and set the values
   1793 if take_split_path:
   1794     # We have to operate column-wise
-> 1795     self._setitem_with_indexer_split_path(indexer, value, name)
   1796 else:
   1797     self._setitem_single_block(indexer, value, name)

File ~/miniconda3/envs/bento/lib/python3.10/site-packages/pandas/core/, in _iLocIndexer._setitem_with_indexer_split_path(self, indexer, value, name)
   1845     if len(value) == 1 and not is_integer(info_axis):
   1846         # This is a case like df.iloc[:3, [1]] = [0]
   1847         #  where we treat as df.iloc[:3, 1] = 0
   1848         return self._setitem_with_indexer((pi, info_axis[0]), value[0])
-> 1850     raise ValueError(
   1851         "Must have equal len keys and value "
   1852         "when setting with an iterable"
   1853     )
   1855 elif lplane_indexer == 0 and len(value) == len(self.obj.index):
   1856     # We get here in one case via .loc with a all-False mask
   1857     pass

ValueError: Must have equal len keys and value when setting with an iterable
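
One possible explanation, which is only a guess since the data can't be shared, is that the spatial join returns more rows than there are points (for example, when a point falls inside two overlapping shapes), so the joined column no longer lines up with the original frame. The sketch below shows one way such a result could be collapsed back to one value per point before assignment. `assign_shape_ids` is a hypothetical helper, not part of Bento, and the `index_right` column name is taken from the traceback above; it also uses a non-inplace `fillna`, which tends to be the safer pattern under pandas 2.

```python
import numpy as np
import pandas as pd


def assign_shape_ids(
    points: pd.DataFrame, joined: pd.DataFrame, shape_key: str
) -> pd.DataFrame:
    """Hypothetical helper (not Bento API): attach one shape ID per point."""
    ids = joined["index_right"]
    # Keep only the first match for points that joined to several shapes.
    ids = ids[~ids.index.duplicated(keep="first")]
    # Align to the original rows; unmatched points become NaN, then "".
    ids = ids.reindex(points.index).fillna("")
    out = points.copy()
    out[shape_key] = ids
    return out


# Toy data (assumed): point 1 matches two shapes, point 2 matches none.
points = pd.DataFrame({"x": [0.0, 1.0, 2.0], "feature_name": ["A", "B", "C"]})
joined = pd.DataFrame(
    {"index_right": ["cell-1", "cell-2", "cell-3", np.nan]},
    index=[0, 1, 1, 2],
)

result = assign_shape_ids(points, joined, "cell_boundaries")
print(result)
```

Assigning an index-aligned Series rather than a bare array also means pandas matches values to rows by label, so a length mismatch cannot silently shift IDs onto the wrong transcripts.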


easlinger commented 2 months ago

I've finalized the info in the comment above -- Later I'll try to send a similar report using multi-channel segmentation data.

ckmah commented 2 months ago

I've opened a separate issue specifically for spatialdata-io, since the bugs are due to data format changes rather than API changes, @easlinger. If you have more relevant info, please add it there, thanks!