Open SGIlabes opened 22 hours ago
re: 2.2.1 ipaddress: 1.0 ipykernel._version: 6.29.5 json: 2.0.9 jupyter_client._version: 8.6.2 platform: 1.0.8 zmq.sugar.version: 26.2.0 zmq.sugar: 26.2.0 zmq: 26.2.0 logging: 0.5.1.2 traitlets._version: 5.14.3 traitlets: 5.14.3 jupyter_core.version: 5.7.2 jupyter_core: 5.7.2 zlib: 1.0 _curses: b'2.2' socketserver: 0.4 argparse: 1.1 dateutil: 2.8.2 six: 1.16.0 _decimal: 1.70 decimal: 1.70 platformdirs.version: 4.3.2 platformdirs: 4.3.2 _csv: 1.0 csv: 1.0 jupyter_client: 8.6.2 ipykernel: 6.29.5 IPython.core.release: 8.27.0 executing.version: 2.1.0 executing: 2.1.0 pure_eval.version: 0.2.3 pure_eval: 0.2.3 stack_data.version: 0.6.3 stack_data: 0.6.3 pygments: 2.18.0 decorator: 5.1.1 wcwidth: 0.2.13 prompt_toolkit: 3.0.47 parso: 0.8.4 jedi: 0.19.1 urllib.request: 3.11 IPython: 8.27.0 comm: 0.2.2 psutil: 6.0.0 packaging: 23.2 _ctypes: 1.1.0 ctypes: 1.1.0 debugpy.public_api: 1.8.5 debugpy: 1.8.5 xmlrpc.client: 3.11 http.server: 0.6 _pydevd_frame_eval.vendored.bytecode: 0.13.0.dev _pydev_bundle.fsnotify: 0.1.5 pydevd: 2.9.5 ctypes.macholib: 1.0 appnope: 0.1.4 numpy.version: 1.26.2 numpy.core._multiarray_umath: 3.1 numpy.core: 1.26.2 numpy.linalg._umath_linalg: 0.1.5 numpy: 1.26.2 pytz: 2023.3.post1 pyarrow._generated_version: 17.0.0 numpy._core._multiarray_umath: 3.1 cloudpickle: 3.0.0 pyarrow: 17.0.0 pandas._version_meson: 2.2.2 pandas: 2.2.2 shapely: 2.0.6 certifi: 2024.08.30 pyproj: 3.6.1 geopandas: 1.0.1 yaml: 6.0.2 toolz: 0.12.1 tlz: 0.12.1 markupsafe: 2.1.3 jinja2: 3.1.2 tblib: 3.0.0 dask: 2024.9.1 scipy: 1.10.1 llvmlite: 0.43.0 numba.cloudpickle: 3.0.0 numba.misc.appdirs: 1.4.1 numba: 0.60.0 scipy.sparse.linalg._isolve._iterative: 1.23.2 scipy._lib.decorator: 4.0.5 scipy.linalg._fblas: 1.23.2 scipy.linalg._flapack: 1.23.2 scipy.linalg._flinalg: 1.23.2 scipy.sparse.linalg._eigen.arpack._arpack: 1.23.2 sparse._version: 0.15.4 sparse: 0.15.4 scipy._lib._uarray: 0.8.8.dev0+aa94c5a4.scipy scipy.special._specfun: 1.23.2 fsspec: 2023.6.0 anndata._version: 0.10.9 h5py: 
3.11.0 natsort: 8.4.0 numcodecs.version: 0.13.0 numcodecs.blosc: 1.21.6.dev numcodecs.zstd: 1.5.5 numcodecs.lz4: 1.9.4 msgpack: 1.1.0 numcodecs: 0.13.0 zarr.version: 2.18.3 zarr: 2.18.3 torch.version: 2.1.2 torch.torch_version: 2.1.2 tqdm._dist_ver: 4.66.5 tqdm.version: 4.66.5 tqdm.cli: 4.66.5 tqdm: 4.66.5 mpmath: 1.3.0 sympy.release: 1.12 sympy.multipledispatch: 0.4.9 sympy: 1.12 torch: 2.1.2 anndata: 0.10.9 xarray: 2024.9.0 datatree._version: 0.0.14 datatree: 0.0.14 attr: 24.2.0 networkx: 3.2.1 lazy_loader: 0.4 pooch._version: 1.8.2 pooch: v1.8.2 skimage.data._fetchers: 0.24.0 skimage: 0.24.0 multiscale_spatial_image.about: 1.0.1 xarray_dataclasses: 1.8.0 spatial_image: 1.1.0 multiscale_spatial_image: 1.0.1 pkg_resources._vendor.more_itertools: 10.2.0 pkg_resources.extern.more_itertools: 10.2.0 pkg_resources._vendor.packaging: 24.0 pkg_resources.extern.packaging: 24.0 pkg_resources._vendor.platformdirs.version: 2.6.2 pkg_resources._vendor.platformdirs: 2.6.2 pkg_resources.extern.platformdirs: 2.6.2 xarray_schema: 0.0.3 numba.types.np: 1.26.2 numba.types.logging: 0.5.1.2 param._version: 2.1.1 param: 2.1.1 multipledispatch: 0.6.0 PIL._version: 10.1.0 PIL: 10.1.0 defusedxml: 0.7.1 cffi: 1.17.1 PIL.Image: 10.1.0 dask_expr: 1.1.15 pyct: 0.5.0 setuptools._distutils: 3.11.9 distutils._vendor.packaging: 24.0 setuptools.version: 70.3.0 setuptools._vendor.packaging: 24.0 setuptools.extern.packaging: 24.0 setuptools._vendor.more_itertools: 8.8.0 setuptools.extern.more_itertools: 8.8.0 setuptools._vendor.ordered_set: 3.1 setuptools.extern.ordered_set: 3.1 setuptools: 70.3.0 distutils: 3.11.9 urllib3.packages.six: 1.16.0 urllib3._version: 1.26.20 urllib3.util.ssl_match_hostname: 3.5.0.1 urllib3.connection: 1.26.20 urllib3: 1.26.20 chardet.version: 5.2.0 chardet: 5.2.0 simplejson: 3.19.3 charset_normalizer.version: 3.3.2 charset_normalizer: 3.3.2 requests.packages.urllib3.packages.six: 1.16.0 requests.packages.urllib3._version: 1.26.20 
requests.packages.urllib3.util.ssl_match_hostname: 3.5.0.1 requests.packages.urllib3.connection: 1.26.20 requests.packages.urllib3: 1.26.20 idna.package_data: 3.8 idna.idnadata: 15.1.0 idna: 3.8 requests.packages.idna.package_data: 3.8 requests.packages.idna.idnadata: 15.1.0 requests.packages.idna: 3.8 requests.packages.chardet.version: 5.2.0 requests.packages.chardet: 5.2.0 requests.version: 2.32.3 requests.utils: 2.32.3 requests: 2.32.3 pyct.cmd: 0.5.0 datashader: 0.16.3 xrspatial._version: 0.4.0 xrspatial: 0.4.0 scipy.interpolate.dfitpack: 1.23.2 scipy.optimize._minpack2: 1.23.2 scipy.optimize._lbfgsb: 1.23.2 scipy.optimize._cobyla: 1.23.2 scipy.optimize._slsqp: 1.23.2 scipy.optimize.__nnls: 1.23.2 scipy.linalg._interpolative: 1.23.2 matplotlib._version: 3.8.2 pyparsing: 3.1.1 cycler: 0.12.1 kiwisolver._cext: 1.4.5 kiwisolver: 1.4.5 matplotlib: 3.8.2 spatialdata: 0.2.3 emoji: 2.13.2 ipywidgets._version: 8.1.5 ipywidgets: 8.1.5 seaborn.external.husl: 2.1.0 seaborn.external.appdirs: 1.4.4 scipy.integrate._vode: 1.23.2 scipy.integrate._dop: 1.23.2 scipy.integrate._lsoda: 1.23.2 scipy.stats._statlib: 1.23.2 scipy.stats._mvn: 1.23.2 patsy.version: 0.5.6 patsy: 0.5.6 statsmodels._version: 0.14.2 statsmodels: 0.14.2 seaborn: 0.13.2 adjustText._version: 1.2.0 adjustText: 1.2.0 joblib.externals.cloudpickle: 2.2.0 joblib.externals.loky: 3.4.1 joblib: 1.3.2 sklearn.utils._joblib: 1.3.2 threadpoolctl: 3.2.0 sklearn.utils._estimator_html_repr: 1.5.2 sklearn.base: 1.5.2 sklearn.utils._show_versions: 1.5.2 sklearn: 1.5.2 matplotlib_scalebar: 0.8.1 upsetplot: 0.9.0 kneed._version: 0.8.5 kneed: 0.8.5 tensorly: 0.8.1 click: 8.1.7 affine: 2.4.0 rasterio: 1.4.0 _cffi_backend: 1.17.1 pycparser.ply: 3.9 pycparser.ply.yacc: 3.10 pycparser.ply.lex: 3.10 pycparser: 2.22 decoupler: 1.8.0 astropy._version: 6.1.4 astropy.extern.configobj.validate: 1.0.1 astropy: 6.1.4 erfa._version: 2.0.1.4 erfa: 2.0.1.4 astropy.extern.ply: 3.11 astropy.extern.ply.yacc: 3.11 astropy.extern.ply.lex: 3.11 
fcsparser.version: 0.2.8 fcsparser: 0.2.8 readfcs: 1.1.8 slicerator: 1.1.0 imagecodecs.imagecodecs: 2024.9.22 imagecodecs: 2024.9.22 tifffile.tifffile: 2024.9.20 tifffile: 2024.9.20 imageio: 2.35.1 pims: 0.7 scanpy._version: 1.10.2 scanpy: 1.10.2 spatialdata_io: 0.1.5 zict: 3.0.0 sortedcontainers: 2.4.0 distributed: 2024.9.1 matplotlib_inline: 0.1.7 ptyprocess: 0.7.0 pexpect: 4.9.0
Hi @SGIlabes, thanks for reporting this issue. I suspect this is related to #153. Can you share a snippet of the points data by calling sdata['transcripts'].head(), and tell us which version of the package you are using? Thanks!
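One way to collect the requested version information is with the standard-library `importlib.metadata` module. This is only a sketch: the helper name `installed_versions` is hypothetical, and the distribution names in the list (in particular `bento-tools`) are assumptions about how the packages were installed, so adjust them to match your environment.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed under this name.
            versions[name] = None
    return versions


# Assumed distribution names; change them if your install uses different ones.
report = installed_versions(["bento-tools", "spatialdata", "spatialdata-io", "pandas"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting the printed lines next to the output of sdata['transcripts'].head() should cover both parts of the request.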
What version of the package are you using?
Thank you so much for creating such a great tool! Bento is exactly what I’ve been needing, and I’m really excited to get started with it. I’ve been trying to use it with my own Xenium data, but I keep running into a bit of an issue. Here’s what I’m doing:

sdata = spatialdata_io.xenium('/path/Xenium/analysis/J01AD1[0001(2)D]', n_jobs=7, cells_as_circles=True)
sdata = bt.io.prep(sdata)
sdata

SpatialData object
├── Images
│ ├── 'morphology_focus': DataTree[cyx] (1, 17133, 25621), (1, 8566, 12810), (1, 4283, 6405), (1, 2141, 3202), (1, 1070, 1601)
│ └── 'morphology_mip': DataTree[cyx] (1, 17133, 25621), (1, 8566, 12810), (1, 4283, 6405), (1, 2141, 3202), (1, 1070, 1601)
├── Labels
│ ├── 'cell_labels': DataTree[yx] (17133, 25621), (8566, 12810), (4283, 6405), (2141, 3202), (1070, 1601)
│ └── 'nucleus_labels': DataTree[yx] (17133, 25621), (8566, 12810), (4283, 6405), (2141, 3202), (1070, 1601)
├── Points
│ └── 'transcripts': DataFrame with shape: (<Delayed>, 10) (3D points)
├── Shapes
│ ├── 'cell_boundaries': GeoDataFrame shape: (31067, 2) (2D shapes)
│ ├── 'cell_circles': GeoDataFrame shape: (31067, 2) (2D shapes)
│ └── 'nucleus_boundaries': GeoDataFrame shape: (31067, 1) (2D shapes)
└── Tables
  └── 'table': AnnData (31067, 300)
with coordinate systems:
▸ 'global', with elements:
  morphology_focus (Images), morphology_mip (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes), nucleus_boundaries (Shapes)

LossySetitemError                         Traceback (most recent call last)
File /opt/homebrew/lib/python3.11/site-packages/pandas/core/indexing.py:2133, in _iLocIndexer._setitem_single_column(self, loc, value, plane_indexer)
   2132 try:
-> 2133     self.obj._mgr.column_setitem(
   2134         loc, plane_indexer, value, inplace_only=True
   2135     )
   2136 except (ValueError, TypeError, LossySetitemError):
   2137     # If we're setting an entire column and we can't do it inplace,
   2138     #  then we can use value's dtype (or inferred dtype)
   2139     #  instead of object

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/internals/managers.py:1335, in BlockManager.column_setitem(self, loc, idx, value, inplace_only)
   1334 if inplace_only:
-> 1335     col_mgr.setitem_inplace(idx, value)
   1336 else:

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/internals/managers.py:2044, in SingleBlockManager.setitem_inplace(self, indexer, value, warn)
   2038     warnings.warn(
   2039         COW_WARNING_SETITEM_MSG,
   2040         FutureWarning,
   2041         stacklevel=find_stack_level(),
   2042     )
-> 2044 super().setitem_inplace(indexer, value)

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/internals/base.py:357, in SingleDataManager.setitem_inplace(self, indexer, value, warn)
    354 if isinstance(arr, np.ndarray):
    355     # Note: checking for ndarray instead of np.dtype means we exclude
    356     #  dt64/td64, which do their own validation.
--> 357     value = np_can_hold_element(arr.dtype, value)
    359 if isinstance(value, np.ndarray) and value.ndim == 1 and len(value) == 1:
    360     # NumPy 1.25 deprecation: https://github.com/numpy/numpy/pull/10615

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/dtypes/cast.py:1939, in np_can_hold_element(dtype, element)
   1937 if dtype.kind == "V":
   1938     # i.e. np.void, which cannot hold anything
-> 1939     raise LossySetitemError
   1941 raise NotImplementedError(dtype)

LossySetitemError:

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[102], line 1
----> 1 sdata = bt.io.prep(sdata)

File /opt/homebrew/lib/python3.11/site-packages/bento/io/_io.py:77, in prep(sdata, points_key, feature_key, instance_key, shape_keys)
     75 if len(point_sjoin) > 0:
     76     pbar.set_description("Mapping points")
---> 77     sdata = _sjoin_points(
     78         sdata=sdata,
     79         points_key=points_key,
     80         shape_keys=point_sjoin,
     81     )
     83 pbar.update()
     85 if len(shape_sjoin) > 0:

File /opt/homebrew/lib/python3.11/site-packages/bento/io/_index.py:64, in _sjoin_points(sdata, points_key, shape_keys)
     55     indexed_points[shape_key] = (
     56         points.sjoin(shape, how="left", predicate="intersects")
     57         .reset_index()
   (...)
     60         .values.flatten()
     61     )
     63 index_points = pd.DataFrame(indexed_points)
---> 64 set_points_metadata(
     65     sdata, points_key, index_points, columns=list(indexed_points.keys())
     66 )
     68 return sdata

File /opt/homebrew/lib/python3.11/site-packages/bento/_utils.py:256, in set_points_metadata(sdata, points_key, metadata, columns)
    254 transform = sdata.points[points_key].attrs
    255 points = sdata.points[points_key].compute()
--> 256 points.loc[:, columns] = metadata
    257 points = PointsModel.parse(
    258     dd.from_pandas(points, npartitions=1), coordinates={"x": "x", "y": "y"}
    259 )
    260 points.attrs = transform

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/indexing.py:911, in _LocationIndexer.__setitem__(self, key, value)
    908 self._has_valid_setitem_indexer(key)
    910 iloc = self if self.name == "iloc" else self.obj.iloc
--> 911 iloc._setitem_with_indexer(indexer, value, self.name)

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/indexing.py:1942, in _iLocIndexer._setitem_with_indexer(self, indexer, value, name)
   1939 # align and set the values
   1940 if take_split_path:
   1941     # We have to operate column-wise
-> 1942     self._setitem_with_indexer_split_path(indexer, value, name)
   1943 else:
   1944     self._setitem_single_block(indexer, value, name)

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/indexing.py:1982, in _iLocIndexer._setitem_with_indexer_split_path(self, indexer, value, name)
   1977     self._setitem_with_indexer_frame_value(indexer, value, name)
   1979 elif np.ndim(value) == 2:
   1980     # TODO: avoid np.ndim call in case it isn't an ndarray, since
   1981     #  that will construct an ndarray, which will be wasteful
-> 1982     self._setitem_with_indexer_2d_value(indexer, value)
   1984 elif len(ilocs) == 1 and lplane_indexer == len(value) and not is_scalar(pi):
   1985     # We are setting multiple rows in a single column.
   1986     self._setitem_single_column(ilocs[0], value, pi)

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/indexing.py:2057, in _iLocIndexer._setitem_with_indexer_2d_value(self, indexer, value)
   2054 if is_object_dtype(value_col.dtype):
   2055     # casting to list so that we do type inference in setitem_single_column
   2056     value_col = value_col.tolist()
-> 2057 self._setitem_single_column(loc, value_col, pi)

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/indexing.py:2160, in _iLocIndexer._setitem_single_column(self, loc, value, plane_indexer)
   2141 if dtype not in (np.void, object) and not self.obj.empty:
   2142     # - Exclude np.void, as that is a special case for expansion.
   2143     #   We want to warn for
   (...)
   2150     # - Exclude empty initial object with enlargement,
   2151     #   as then there's nothing to be inconsistent with.
   2152     warnings.warn(
   2153         f"Setting an item of incompatible dtype is deprecated "
   2154         "and will raise in a future error of pandas. "
   (...)
   2158         stacklevel=find_stack_level(),
   2159     )
-> 2160 self.obj.isetitem(loc, value)
   2161 else:
   2162     # set value into the column (first attempting to operate inplace, then
   2163     #  falling back to casting if necessary)
   2164     dtype = self.obj.dtypes.iloc[loc]

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/frame.py:4268, in DataFrame.isetitem(self, loc, value)
   4265         self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)
   4266     return
-> 4268 arraylike, refs = self._sanitize_column(value)
   4269 self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/frame.py:5266, in DataFrame._sanitize_column(self, value)
   5263     return _reindex_for_setitem(value, self.index)
   5265 if is_list_like(value):
-> 5266     com.require_length_match(value, self.index)
   5267 arr = sanitize_array(value, self.index, copy=True, allow_2d=True)
   5268 if (
   5269     isinstance(value, Index)
   5270     and value.dtype == "object"
   (...)
   5273     # TODO: Remove kludge in sanitize_array for string mode when enforcing
   5274     #  this deprecation

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/common.py:573, in require_length_match(data, index)
    569 """
    570 Check the length of data matches the length of the index.
    571 """
    572 if len(data) != len(index):
--> 573     raise ValueError(
    574         "Length of values "
    575         f"({len(data)}) "
    576         "does not match length of index "
    577         f"({len(index)})"
    578     )

ValueError: Length of values (4000000) does not match length of index (10611340)

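The final error says that the per-point mapping being written back has fewer rows than the points table. A small check like the one below can show where the two row counts diverge before the assignment is attempted. It is only a sketch on toy data: the helper `summarize_points_vs_mapping` and both toy frames are hypothetical stand-ins, not Bento code, and the repeated index labels are just one thing that might be worth checking, not a confirmed cause.

```python
import pandas as pd


def summarize_points_vs_mapping(points: pd.DataFrame, mapping: pd.DataFrame) -> dict:
    """Compare the row count of a points table with a per-point mapping table."""
    return {
        "n_points": len(points),
        "n_mapping": len(mapping),
        "lengths_match": len(points) == len(mapping),
        "points_index_unique": bool(points.index.is_unique),
    }


# Toy stand-in for the computed points table: two chunks concatenated without
# resetting the index, so the labels repeat (0, 1, 2, 0, 1).
points = pd.concat(
    [
        pd.DataFrame({"x": [0.0, 1.0, 2.0], "y": [0.0, 1.0, 2.0]}),
        pd.DataFrame({"x": [3.0, 4.0], "y": [3.0, 4.0]}),
    ]
)

# Toy stand-in for the mapping produced by the spatial join (too short on purpose).
mapping = pd.DataFrame({"cell_boundaries": ["c1", "c2", "c3"]})

report = summarize_points_vs_mapping(points, mapping)
print(report)
```

On real data, the same comparison could be run on the computed points table and on whatever table is about to be assigned to it; a mismatch there would reproduce the condition that `require_length_match` rejects.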
I’ve been scratching my head trying to figure it out, but I just can’t seem to find the cause. Any idea what might be going wrong here?
Also, I had one more quick question: Is it possible to use Bento for multi-sample and group-based analysis? For example, I’m interested in checking whether the location of a specific gene varies between different groups. Is that something Bento can handle?
Thanks again for all your hard work! Looking forward to hearing from you.