ckmah / bento-tools

A Python toolkit for subcellular analysis of spatial transcriptomics data
https://bento-tools.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License
66 stars, 6 forks

Question #155

Open SGIlabes opened 22 hours ago

SGIlabes commented 22 hours ago

Thank you so much for creating such a great tool! Bento is exactly what I’ve been needing, and I’m really excited to get started with it. I’ve been trying to use it with my own Xenium data, but I keep running into a bit of an issue. Here’s what I’m doing:

sdata = spatialdata_io.xenium('/path/Xenium/analysis/J01AD1[0001(2)D]', n_jobs=7, cells_as_circles=True)
sdata = bt.io.prep(sdata)
sdata

SpatialData object
├── Images
│     ├── 'morphology_focus': DataTree[cyx] (1, 17133, 25621), (1, 8566, 12810), (1, 4283, 6405), (1, 2141, 3202), (1, 1070, 1601)
│     └── 'morphology_mip': DataTree[cyx] (1, 17133, 25621), (1, 8566, 12810), (1, 4283, 6405), (1, 2141, 3202), (1, 1070, 1601)
├── Labels
│     ├── 'cell_labels': DataTree[yx] (17133, 25621), (8566, 12810), (4283, 6405), (2141, 3202), (1070, 1601)
│     └── 'nucleus_labels': DataTree[yx] (17133, 25621), (8566, 12810), (4283, 6405), (2141, 3202), (1070, 1601)
├── Points
│     └── 'transcripts': DataFrame with shape: (, 10) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (31067, 2) (2D shapes)
│     ├── 'cell_circles': GeoDataFrame shape: (31067, 2) (2D shapes)
│     └── 'nucleus_boundaries': GeoDataFrame shape: (31067, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (31067, 300)
with coordinate systems:
    ▸ 'global', with elements:
        morphology_focus (Images), morphology_mip (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes), nucleus_boundaries (Shapes)


LossySetitemError                         Traceback (most recent call last)
File /opt/homebrew/lib/python3.11/site-packages/pandas/core/indexing.py:2133, in _iLocIndexer._setitem_single_column(self, loc, value, plane_indexer)
   2132 try:
-> 2133     self.obj._mgr.column_setitem(
   2134         loc, plane_indexer, value, inplace_only=True
   2135     )
   2136 except (ValueError, TypeError, LossySetitemError):
   2137     # If we're setting an entire column and we can't do it inplace,
   2138     #  then we can use value's dtype (or inferred dtype)
   2139     #  instead of object

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/internals/managers.py:1335, in BlockManager.column_setitem(self, loc, idx, value, inplace_only)
   1334 if inplace_only:
-> 1335     col_mgr.setitem_inplace(idx, value)
   1336 else:

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/internals/managers.py:2044, in SingleBlockManager.setitem_inplace(self, indexer, value, warn)
   2038     warnings.warn(
   2039         COW_WARNING_SETITEM_MSG,
   2040         FutureWarning,
   2041         stacklevel=find_stack_level(),
   2042     )
-> 2044 super().setitem_inplace(indexer, value)

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/internals/base.py:357, in SingleDataManager.setitem_inplace(self, indexer, value, warn)
    354 if isinstance(arr, np.ndarray):
    355     # Note: checking for ndarray instead of np.dtype means we exclude
    356     #  dt64/td64, which do their own validation.
--> 357     value = np_can_hold_element(arr.dtype, value)
    359 if isinstance(value, np.ndarray) and value.ndim == 1 and len(value) == 1:
    360     # NumPy 1.25 deprecation: https://github.com/numpy/numpy/pull/10615

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/dtypes/cast.py:1939, in np_can_hold_element(dtype, element)
   1937 if dtype.kind == "V":
   1938     # i.e. np.void, which cannot hold anything
-> 1939     raise LossySetitemError
   1941 raise NotImplementedError(dtype)

LossySetitemError:

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[102], line 1
----> 1 sdata = bt.io.prep(sdata)

File /opt/homebrew/lib/python3.11/site-packages/bento/io/_io.py:77, in prep(sdata, points_key, feature_key, instance_key, shape_keys)
     75 if len(point_sjoin) > 0:
     76     pbar.set_description("Mapping points")
---> 77     sdata = _sjoin_points(
     78         sdata=sdata,
     79         points_key=points_key,
     80         shape_keys=point_sjoin,
     81     )
     83 pbar.update()
     85 if len(shape_sjoin) > 0:

File /opt/homebrew/lib/python3.11/site-packages/bento/io/_index.py:64, in _sjoin_points(sdata, points_key, shape_keys)
     55     indexed_points[shape_key] = (
     56         points.sjoin(shape, how="left", predicate="intersects")
     57         .reset_index()
   (...)
     60         .values.flatten()
     61     )
     63 index_points = pd.DataFrame(indexed_points)
---> 64 set_points_metadata(
     65     sdata, points_key, index_points, columns=list(indexed_points.keys())
     66 )
     68 return sdata

File /opt/homebrew/lib/python3.11/site-packages/bento/_utils.py:256, in set_points_metadata(sdata, points_key, metadata, columns)
    254 transform = sdata.points[points_key].attrs
    255 points = sdata.points[points_key].compute()
--> 256 points.loc[:, columns] = metadata
    257 points = PointsModel.parse(
    258     dd.from_pandas(points, npartitions=1), coordinates={"x": "x", "y": "y"}
    259 )
    260 points.attrs = transform

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/indexing.py:911, in _LocationIndexer.__setitem__(self, key, value)
    908 self._has_valid_setitem_indexer(key)
    910 iloc = self if self.name == "iloc" else self.obj.iloc
--> 911 iloc._setitem_with_indexer(indexer, value, self.name)

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/indexing.py:1942, in _iLocIndexer._setitem_with_indexer(self, indexer, value, name)
   1939 # align and set the values
   1940 if take_split_path:
   1941     # We have to operate column-wise
-> 1942     self._setitem_with_indexer_split_path(indexer, value, name)
   1943 else:
   1944     self._setitem_single_block(indexer, value, name)

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/indexing.py:1982, in _iLocIndexer._setitem_with_indexer_split_path(self, indexer, value, name)
   1977     self._setitem_with_indexer_frame_value(indexer, value, name)
   1979 elif np.ndim(value) == 2:
   1980     # TODO: avoid np.ndim call in case it isn't an ndarray, since
   1981     #  that will construct an ndarray, which will be wasteful
-> 1982     self._setitem_with_indexer_2d_value(indexer, value)
   1984 elif len(ilocs) == 1 and lplane_indexer == len(value) and not is_scalar(pi):
   1985     # We are setting multiple rows in a single column.
   1986     self._setitem_single_column(ilocs[0], value, pi)

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/indexing.py:2057, in _iLocIndexer._setitem_with_indexer_2d_value(self, indexer, value)
   2054 if is_object_dtype(value_col.dtype):
   2055     # casting to list so that we do type inference in setitem_single_column
   2056     value_col = value_col.tolist()
-> 2057 self._setitem_single_column(loc, value_col, pi)

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/indexing.py:2160, in _iLocIndexer._setitem_single_column(self, loc, value, plane_indexer)
   2141     if dtype not in (np.void, object) and not self.obj.empty:
   2142         # - Exclude np.void, as that is a special case for expansion.
   2143         #   We want to warn for
   (...)
   2150         # - Exclude empty initial object with enlargement,
   2151         #   as then there's nothing to be inconsistent with.
   2152         warnings.warn(
   2153             f"Setting an item of incompatible dtype is deprecated "
   2154             "and will raise in a future error of pandas. "
   (...)
   2158             stacklevel=find_stack_level(),
   2159         )
-> 2160     self.obj.isetitem(loc, value)
   2161 else:
   2162     # set value into the column (first attempting to operate inplace, then
   2163     #  falling back to casting if necessary)
   2164     dtype = self.obj.dtypes.iloc[loc]

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/frame.py:4268, in DataFrame.isetitem(self, loc, value)
   4265     self._iset_item_mgr(idx, arraylike, inplace=False, refs=refs)
   4266     return
-> 4268 arraylike, refs = self._sanitize_column(value)
   4269 self._iset_item_mgr(loc, arraylike, inplace=False, refs=refs)

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/frame.py:5266, in DataFrame._sanitize_column(self, value)
   5263     return _reindex_for_setitem(value, self.index)
   5265 if is_list_like(value):
-> 5266     com.require_length_match(value, self.index)
   5267 arr = sanitize_array(value, self.index, copy=True, allow_2d=True)
   5268 if (
   5269     isinstance(value, Index)
   5270     and value.dtype == "object"
   (...)
   5273     # TODO: Remove kludge in sanitize_array for string mode when enforcing
   5274     #  this deprecation

File /opt/homebrew/lib/python3.11/site-packages/pandas/core/common.py:573, in require_length_match(data, index)
    569 """
    570 Check the length of data matches the length of the index.
    571 """
    572 if len(data) != len(index):
--> 573     raise ValueError(
    574         "Length of values "
    575         f"({len(data)}) "
    576         "does not match length of index "
    577         f"({len(index)})"
    578     )

ValueError: Length of values (4000000) does not match length of index (10611340)
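
The last frame reduces to a plain length comparison between the values being assigned in `points.loc[:, columns] = metadata` and the index of the computed points table. The following is a minimal standard-library sketch of that check, mirroring the `require_length_match` source quoted in the traceback. It only restates the failing condition and does not diagnose why the two lengths differ; `describe_mismatch` is a hypothetical helper introduced here for illustration, and the `range` objects are stand-ins for the real data.

```python
def require_length_match(values, index):
    """Same comparison as the pandas helper quoted in the last traceback frame."""
    if len(values) != len(index):
        raise ValueError(
            f"Length of values ({len(values)}) "
            f"does not match length of index ({len(index)})"
        )


def describe_mismatch(n_values, n_index):
    """Hypothetical helper: summarize how far apart the two lengths are."""
    return {
        "missing_rows": n_index - n_values,
        "fraction_covered": n_values / n_index,
    }


# range objects report their length without allocating millions of items.
mapped = range(4_000_000)        # stand-in for the values being assigned
transcripts = range(10_611_340)  # stand-in for the computed points table

message = None
try:
    require_length_match(mapped, transcripts)
except ValueError as err:
    message = str(err)
    print(message)

summary = describe_mismatch(len(mapped), len(transcripts))
print(summary["missing_rows"], "rows have no assigned value")
```

With the sizes from the report, the assigned values cover fewer than half of the transcript rows, so the mismatch is not an off-by-one at a boundary.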

I’ve been scratching my head trying to figure it out, but I just can’t seem to find the cause. Any idea what might be going wrong here?

Also, I had one more quick question: Is it possible to use Bento for multi-sample and group-based analysis? For example, I’m interested in checking whether the location of a specific gene varies between different groups. Is that something Bento can handle?

Thanks again for all your hard work! Looking forward to hearing from you.

SGIlabes commented 22 hours ago

re: 2.2.1 ipaddress: 1.0 ipykernel._version: 6.29.5 json: 2.0.9 jupyter_client._version: 8.6.2 platform: 1.0.8 zmq.sugar.version: 26.2.0 zmq.sugar: 26.2.0 zmq: 26.2.0 logging: 0.5.1.2 traitlets._version: 5.14.3 traitlets: 5.14.3 jupyter_core.version: 5.7.2 jupyter_core: 5.7.2 zlib: 1.0 _curses: b'2.2' socketserver: 0.4 argparse: 1.1 dateutil: 2.8.2 six: 1.16.0 _decimal: 1.70 decimal: 1.70 platformdirs.version: 4.3.2 platformdirs: 4.3.2 _csv: 1.0 csv: 1.0 jupyter_client: 8.6.2 ipykernel: 6.29.5 IPython.core.release: 8.27.0 executing.version: 2.1.0 executing: 2.1.0 pure_eval.version: 0.2.3 pure_eval: 0.2.3 stack_data.version: 0.6.3 stack_data: 0.6.3 pygments: 2.18.0 decorator: 5.1.1 wcwidth: 0.2.13 prompt_toolkit: 3.0.47 parso: 0.8.4 jedi: 0.19.1 urllib.request: 3.11 IPython: 8.27.0 comm: 0.2.2 psutil: 6.0.0 packaging: 23.2 _ctypes: 1.1.0 ctypes: 1.1.0 debugpy.public_api: 1.8.5 debugpy: 1.8.5 xmlrpc.client: 3.11 http.server: 0.6 _pydevd_frame_eval.vendored.bytecode: 0.13.0.dev _pydev_bundle.fsnotify: 0.1.5 pydevd: 2.9.5 ctypes.macholib: 1.0 appnope: 0.1.4 numpy.version: 1.26.2 numpy.core._multiarray_umath: 3.1 numpy.core: 1.26.2 numpy.linalg._umath_linalg: 0.1.5 numpy: 1.26.2 pytz: 2023.3.post1 pyarrow._generated_version: 17.0.0 numpy._core._multiarray_umath: 3.1 cloudpickle: 3.0.0 pyarrow: 17.0.0 pandas._version_meson: 2.2.2 pandas: 2.2.2 shapely: 2.0.6 certifi: 2024.08.30 pyproj: 3.6.1 geopandas: 1.0.1 yaml: 6.0.2 toolz: 0.12.1 tlz: 0.12.1 markupsafe: 2.1.3 jinja2: 3.1.2 tblib: 3.0.0 dask: 2024.9.1 scipy: 1.10.1 llvmlite: 0.43.0 numba.cloudpickle: 3.0.0 numba.misc.appdirs: 1.4.1 numba: 0.60.0 scipy.sparse.linalg._isolve._iterative: 1.23.2 scipy._lib.decorator: 4.0.5 scipy.linalg._fblas: 1.23.2 scipy.linalg._flapack: 1.23.2 scipy.linalg._flinalg: 1.23.2 scipy.sparse.linalg._eigen.arpack._arpack: 1.23.2 sparse._version: 0.15.4 sparse: 0.15.4 scipy._lib._uarray: 0.8.8.dev0+aa94c5a4.scipy scipy.special._specfun: 1.23.2 fsspec: 2023.6.0 anndata._version: 0.10.9 h5py: 
3.11.0 natsort: 8.4.0 numcodecs.version: 0.13.0 numcodecs.blosc: 1.21.6.dev numcodecs.zstd: 1.5.5 numcodecs.lz4: 1.9.4 msgpack: 1.1.0 numcodecs: 0.13.0 zarr.version: 2.18.3 zarr: 2.18.3 torch.version: 2.1.2 torch.torch_version: 2.1.2 tqdm._dist_ver: 4.66.5 tqdm.version: 4.66.5 tqdm.cli: 4.66.5 tqdm: 4.66.5 mpmath: 1.3.0 sympy.release: 1.12 sympy.multipledispatch: 0.4.9 sympy: 1.12 torch: 2.1.2 anndata: 0.10.9 xarray: 2024.9.0 datatree._version: 0.0.14 datatree: 0.0.14 attr: 24.2.0 networkx: 3.2.1 lazy_loader: 0.4 pooch._version: 1.8.2 pooch: v1.8.2 skimage.data._fetchers: 0.24.0 skimage: 0.24.0 multiscale_spatial_image.about: 1.0.1 xarray_dataclasses: 1.8.0 spatial_image: 1.1.0 multiscale_spatial_image: 1.0.1 pkg_resources._vendor.more_itertools: 10.2.0 pkg_resources.extern.more_itertools: 10.2.0 pkg_resources._vendor.packaging: 24.0 pkg_resources.extern.packaging: 24.0 pkg_resources._vendor.platformdirs.version: 2.6.2 pkg_resources._vendor.platformdirs: 2.6.2 pkg_resources.extern.platformdirs: 2.6.2 xarray_schema: 0.0.3 numba.types.np: 1.26.2 numba.types.logging: 0.5.1.2 param._version: 2.1.1 param: 2.1.1 multipledispatch: 0.6.0 PIL._version: 10.1.0 PIL: 10.1.0 defusedxml: 0.7.1 cffi: 1.17.1 PIL.Image: 10.1.0 dask_expr: 1.1.15 pyct: 0.5.0 setuptools._distutils: 3.11.9 distutils._vendor.packaging: 24.0 setuptools.version: 70.3.0 setuptools._vendor.packaging: 24.0 setuptools.extern.packaging: 24.0 setuptools._vendor.more_itertools: 8.8.0 setuptools.extern.more_itertools: 8.8.0 setuptools._vendor.ordered_set: 3.1 setuptools.extern.ordered_set: 3.1 setuptools: 70.3.0 distutils: 3.11.9 urllib3.packages.six: 1.16.0 urllib3._version: 1.26.20 urllib3.util.ssl_match_hostname: 3.5.0.1 urllib3.connection: 1.26.20 urllib3: 1.26.20 chardet.version: 5.2.0 chardet: 5.2.0 simplejson: 3.19.3 charset_normalizer.version: 3.3.2 charset_normalizer: 3.3.2 requests.packages.urllib3.packages.six: 1.16.0 requests.packages.urllib3._version: 1.26.20 
requests.packages.urllib3.util.ssl_match_hostname: 3.5.0.1 requests.packages.urllib3.connection: 1.26.20 requests.packages.urllib3: 1.26.20 idna.package_data: 3.8 idna.idnadata: 15.1.0 idna: 3.8 requests.packages.idna.package_data: 3.8 requests.packages.idna.idnadata: 15.1.0 requests.packages.idna: 3.8 requests.packages.chardet.version: 5.2.0 requests.packages.chardet: 5.2.0 requests.version: 2.32.3 requests.utils: 2.32.3 requests: 2.32.3 pyct.cmd: 0.5.0 datashader: 0.16.3 xrspatial._version: 0.4.0 xrspatial: 0.4.0 scipy.interpolate.dfitpack: 1.23.2 scipy.optimize._minpack2: 1.23.2 scipy.optimize._lbfgsb: 1.23.2 scipy.optimize._cobyla: 1.23.2 scipy.optimize._slsqp: 1.23.2 scipy.optimize.__nnls: 1.23.2 scipy.linalg._interpolative: 1.23.2 matplotlib._version: 3.8.2 pyparsing: 3.1.1 cycler: 0.12.1 kiwisolver._cext: 1.4.5 kiwisolver: 1.4.5 matplotlib: 3.8.2 spatialdata: 0.2.3 emoji: 2.13.2 ipywidgets._version: 8.1.5 ipywidgets: 8.1.5 seaborn.external.husl: 2.1.0 seaborn.external.appdirs: 1.4.4 scipy.integrate._vode: 1.23.2 scipy.integrate._dop: 1.23.2 scipy.integrate._lsoda: 1.23.2 scipy.stats._statlib: 1.23.2 scipy.stats._mvn: 1.23.2 patsy.version: 0.5.6 patsy: 0.5.6 statsmodels._version: 0.14.2 statsmodels: 0.14.2 seaborn: 0.13.2 adjustText._version: 1.2.0 adjustText: 1.2.0 joblib.externals.cloudpickle: 2.2.0 joblib.externals.loky: 3.4.1 joblib: 1.3.2 sklearn.utils._joblib: 1.3.2 threadpoolctl: 3.2.0 sklearn.utils._estimator_html_repr: 1.5.2 sklearn.base: 1.5.2 sklearn.utils._show_versions: 1.5.2 sklearn: 1.5.2 matplotlib_scalebar: 0.8.1 upsetplot: 0.9.0 kneed._version: 0.8.5 kneed: 0.8.5 tensorly: 0.8.1 click: 8.1.7 affine: 2.4.0 rasterio: 1.4.0 _cffi_backend: 1.17.1 pycparser.ply: 3.9 pycparser.ply.yacc: 3.10 pycparser.ply.lex: 3.10 pycparser: 2.22 decoupler: 1.8.0 astropy._version: 6.1.4 astropy.extern.configobj.validate: 1.0.1 astropy: 6.1.4 erfa._version: 2.0.1.4 erfa: 2.0.1.4 astropy.extern.ply: 3.11 astropy.extern.ply.yacc: 3.11 astropy.extern.ply.lex: 3.11 
fcsparser.version: 0.2.8 fcsparser: 0.2.8 readfcs: 1.1.8 slicerator: 1.1.0 imagecodecs.imagecodecs: 2024.9.22 imagecodecs: 2024.9.22 tifffile.tifffile: 2024.9.20 tifffile: 2024.9.20 imageio: 2.35.1 pims: 0.7 scanpy._version: 1.10.2 scanpy: 1.10.2 spatialdata_io: 0.1.5 zict: 3.0.0 sortedcontainers: 2.4.0 distributed: 2024.9.1 matplotlib_inline: 0.1.7 ptyprocess: 0.7.0 pexpect: 4.9.0

ckmah commented 16 hours ago

Hi @SGIlabes, thanks for reporting this issue. I suspect this is related to #153. Can you provide a snippet of the points data by calling sdata['transcripts'].head(), and let me know which version of the package you are using? Thanks!
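
One way to answer the version question is to read installed versions from package metadata. This is a minimal sketch using only the standard library; the distribution names in `PACKAGES` are assumptions about how the packages were installed (for example, that Bento was installed as `bento-tools`), and `installed_versions` is a hypothetical helper, not part of Bento.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names; adjust if the packages were installed differently.
PACKAGES = ["bento-tools", "spatialdata", "spatialdata-io", "dask", "pandas", "geopandas"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Posting this output together with the result of `sdata['transcripts'].head()` covers both requests in the reply above.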
