STOmics / Stereopy

A toolkit of spatial transcriptomic analysis.

KeyError in single_r #150

Closed: fumi-github closed this issue 1 year ago

fumi-github commented 1 year ago

Thank you for the awesome program! I encountered an error when running the SingleR tutorial with my own data. Test data with filtered genes are available here: ref_small.h5ad and data_small.h5ad. I would appreciate any help or suggestions. Fumi

My code:

import stereo as st
from stereo.core.stereo_exp_data import AnnBasedStereoExpData
import warnings
warnings.filterwarnings('ignore')

ref = AnnBasedStereoExpData('./ref_small.h5ad')
data = st.io.read_stereo_h5ad(file_path='./data_small.h5ad')

# preprocessing
ref.tl.log1p()
ref.tl.normalize_total()
data.tl.log1p()
data.tl.normalize_total()

# do it!
data.tl.single_r(
    ref_exp_data=ref,
    ref_use_col='celltype',
    res_key='annotation',
)

Error message:

[2023-07-31 16:07:32][Stereo][518184][MainThread][139855852484416][st_pipeline][71][INFO]: register algorithm single_r to <stereo.core.st_pipeline.StPipeline object at 0x7f3257192ee0>
[2023-07-31 16:08:11][Stereo][518184][MainThread][139855852484416][single_r][106][INFO]: start single-r with n_jobs=1 fine_tune_times=0
100%|██████████| 8/8 [00:00<00:00, 65.81it/s]
[2023-07-31 16:08:11][Stereo][518184][MainThread][139855852484416][single_r][142][INFO]: scoring test_data finished, cost 0.15860295295715332 seconds
[2023-07-31 16:08:11][Stereo][518184][MainThread][139855852484416][single_r][231][INFO]: fine-tuning with test_data(shape=(214, 15078))
0it [00:00, ?it/s]INFO:numba.core.transforms:finding looplift candidates

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[7], line 2
      1 # do it!
----> 2 data.tl.single_r(
      3                 ref_exp_data=ref,
      4                 ref_use_col='celltype',
      5                 res_key='annotation'
      6                 )

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/stereo/algorithm/single_r/single_r.py:146, in SingleR.main(self, ref_exp_data, ref_use_col, cluster_res_key, quantile, fine_tune_threshold, fine_tune_times, n_jobs, res_key)
    144 logger.debug('start fine-tuning...')
    145 start_time = time.time()
--> 146 ret_labels = self._fine_tune(test_data, output, trained_data)
    147 logger.debug(f'fine-tuning finished, cost {time.time() - start_time} seconds')
    149 res = pd.DataFrame(columns=['bins', 'group', 'first_labels'])

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/stereo/algorithm/single_r/single_r.py:233, in SingleR._fine_tune(self, test_data, output, trained_data)
    225 ref = pd.DataFrame(
    226     self.ref_exp_data.exp_matrix.toarray(),
    227     index=self.ref_exp_data.cell_names,
    228     columns=self.ref_exp_data.gene_names
    229 )
    231 logger.info(f'fine-tuning with test_data(shape={test_data.exp_matrix.shape})')
--> 233 ret_labels = Parallel(n_jobs=self.n_jobs, backend="threading")(
    234     delayed(self._fine_tune_parallel)(
    235         ref,
    236         output.columns[tmp[x].astype(bool)].values,
    237         y[1].to_frame().T,
    238         trained_data
    239     )
    240     for x, y in tqdm(enumerate(test_data.to_df().iterrows()))
    241 )
    242 return ret_labels

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/joblib/parallel.py:1085, in Parallel.__call__(self, iterable)
   1076 try:
   1077     # Only set self._iterating to True if at least a batch
   1078     # was dispatched. In particular this covers the edge
   (...)
   1082     # was very quick and its callback already dispatched all the
   1083     # remaining jobs.
   1084     self._iterating = False
-> 1085     if self.dispatch_one_batch(iterator):
   1086         self._iterating = self._original_iterator is not None
   1088     while self.dispatch_one_batch(iterator):

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/joblib/parallel.py:901, in Parallel.dispatch_one_batch(self, iterator)
    899     return False
    900 else:
--> 901     self._dispatch(tasks)
    902     return True

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/joblib/parallel.py:819, in Parallel._dispatch(self, batch)
    817 with self._lock:
    818     job_idx = len(self._jobs)
--> 819     job = self._backend.apply_async(batch, callback=cb)
    820     # A job can complete so quickly than its callback is
    821     # called before we get here, causing self._jobs to
    822     # grow. To ensure correct results ordering, .insert is
    823     # used (rather than .append) in the following line
    824     self._jobs.insert(job_idx, job)

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/joblib/_parallel_backends.py:208, in SequentialBackend.apply_async(self, func, callback)
    206 def apply_async(self, func, callback=None):
    207     """Schedule a func to be run"""
--> 208     result = ImmediateResult(func)
    209     if callback:
    210         callback(result)

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/joblib/_parallel_backends.py:597, in ImmediateResult.__init__(self, batch)
    594 def __init__(self, batch):
    595     # Don't delay the application, to avoid keeping the input
    596     # arguments in memory
--> 597     self.results = batch()

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/joblib/parallel.py:288, in BatchedCalls.__call__(self)
    284 def __call__(self):
    285     # Set the default nested backend to self._backend but do not set the
    286     # change the default number of processes to -1
    287     with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 288         return [func(*args, **kwargs)
    289                 for func, args, kwargs in self.items]

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/joblib/parallel.py:288, in <listcomp>(.0)
    284 def __call__(self):
    285     # Set the default nested backend to self._backend but do not set the
    286     # change the default number of processes to -1
    287     with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 288         return [func(*args, **kwargs)
    289                 for func, args, kwargs in self.items]

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/stereo/algorithm/single_r/single_r.py:254, in SingleR._fine_tune_parallel(self, ref, labels, y, trained_data)
    252 else:
    253     while len(labels) > 1:
--> 254         labels = self._fine_tune_one_time(labels, ref, y, trained_data)
    255 return labels[0]

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/pandas/core/indexing.py:1067, in _LocationIndexer.__getitem__(self, key)
   1065     if self._is_scalar_access(key):
   1066         return self.obj._get_value(*key, takeable=self._takeable)
-> 1067     return self._getitem_tuple(key)
   1068 else:
   1069     # we by definition only have the 0th axis
   1070     axis = self.axis or 0

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/pandas/core/indexing.py:1256, in _LocIndexer._getitem_tuple(self, tup)
   1253 if self._multi_take_opportunity(tup):
   1254     return self._multi_take(tup)
-> 1256 return self._getitem_tuple_same_dim(tup)

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/pandas/core/indexing.py:924, in _LocationIndexer._getitem_tuple_same_dim(self, tup)
    921 if com.is_null_slice(key):
    922     continue
--> 924 retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
    925 # We should never have retval.ndim < self.ndim, as that should
    926 #  be handled by the _getitem_lowerdim call above.
    927 assert retval.ndim == self.ndim

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/pandas/core/indexing.py:1301, in _LocIndexer._getitem_axis(self, key, axis)
   1298     if hasattr(key, "ndim") and key.ndim > 1:
   1299         raise ValueError("Cannot index with multidimensional key")
-> 1301     return self._getitem_iterable(key, axis=axis)
   1303 # nested tuple slicing
   1304 if is_nested_tuple(key, labels):

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/pandas/core/indexing.py:1239, in _LocIndexer._getitem_iterable(self, key, axis)
   1236 self._validate_key(key, axis)
   1238 # A collection of keys
-> 1239 keyarr, indexer = self._get_listlike_indexer(key, axis)
   1240 return self.obj._reindex_with_indexers(
   1241     {axis: [keyarr, indexer]}, copy=True, allow_dups=True
   1242 )

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/pandas/core/indexing.py:1432, in _LocIndexer._get_listlike_indexer(self, key, axis)
   1429 ax = self.obj._get_axis(axis)
   1430 axis_name = self.obj._get_axis_name(axis)
-> 1432 keyarr, indexer = ax._get_indexer_strict(key, axis_name)
   1434 return keyarr, indexer

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/pandas/core/indexes/base.py:6070, in Index._get_indexer_strict(self, key, axis_name)
   6067 else:
   6068     keyarr, indexer, new_indexer = self._reindex_non_unique(keyarr)
-> 6070 self._raise_if_missing(keyarr, indexer, axis_name)
   6072 keyarr = self.take(indexer)
   6073 if isinstance(key, Index):
   6074     # GH 42790 - Preserve name from an Index

File ~/mambaforge/envs/stereopy2/lib/python3.8/site-packages/pandas/core/indexes/base.py:6133, in Index._raise_if_missing(self, key, indexer, axis_name)
   6130     raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   6132 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
-> 6133 raise KeyError(f"{not_found} not in index")

KeyError: '[16385, 16384, 16394, 16397, 16421, 16424, 16425, 16427, 16423, 16430, 16431, 16432, 16433, 16434, 16439, 16444, 16445, 16447, 16468, 16488, 16494, 16498, 16506, 16525, 16530, 16532, 16541, 16543, 16544, 16547, 16550, 16570, 16572, 16577, 16589, 16600, 16615, 16622, 16623, 16629, 16631, 16634, 16636, 16644, 16650, 16670, 16674, 16676, 16691, 16700, 16708, 16711, 16719, 16728, 16741, 16743, 16751, 16773, 16798, 16803, 16805, 16816, 16824, 16825, 16853, 16892, 16893, 16927, 16930, 16940, 16939, 16955, 16963, 16983, 16984, 17005, 17014, 17019, 17050, 17053, 17052, 17075, 17083, 17092, 17095, 17090, 17115, 17121, 17132, 17134, 17142, 17141, 17192, 17200, 17206, 17223, 17235, 17256, 17266, 17267, 17270, 17271, 17274, 17308, 17312, 17314, 17315, 17335, 17339, 17342, 17344, 17343, 17347, 17385, 17387, 17409, 17417, 17418, 17456, 17503, 17544, 17566, 17570, 17572, 17587, 17612, 17626, 17649, 17664, 17696, 17699, 17712, 17739, 17741, 17758, 17759, 17770, 17787, 17789, 17793, 17802, 17803, 17823, 17824, 17835, 17844, 17857, 17887, 17922, 17923, 17933, 17935, 17932, 17950, 17951, 17962, 17964, 17985, 17989, 18018, 18024, 18026, 18059, 18097, 18125, 18144, 18147, 18155, 18164, 18175, 18178, 18180, 18200, 18205, 18213, 18224, 18233, 18253, 18263, 18270, 18276, 18283, 18301, 18347, 18356, 18362, 18365, 18367, 18370, 18374, 18380, 18390, 18395, 18406, 18409, 18423, 18434, 18437, 18439, 18444, 18449, 18451, 18454, 18489, 18491, 18517, 18518, 18538, 18548, 18556, 18566, 18568, 18571, 18575, 18576, 18577, 18588, 18591, 18595, 18621, 18659, 18660, 18677, 18684, 18685, 18687, 18688, 18690, 18714, 18721, 18725, 18741, 18787, 18791, 18831, 18849, 18865, 18891, 18896, 18914, 18974, 18988, 18990, 18998, 19034, 19040, 19065, 19072, 19076, 19095, 19128, 19139, 19161, 19171, 19178, 19189, 15097, 19195, 15102, 15101, 19201, 15112, 15120, 15122, 15123, 19223, 15131, 19249, 15250, 15262, 15275, 15312, 15324, 15330, 15338, 15343, 15392, 15397, 15401, 15407, 15426, 15431, 15434, 15437, 
15455, 15464, 15468, 15480, 15498, 15500, 15521, 15539, 15554, 15555, 15571, 15585, 15592, 15595, 15605, 15607, 15610, 15631, 15636, 15672, 15674, 15677, 15693, 15702, 15718, 15724, 15727, 15728, 15732, 15759, 15762, 15766, 15769, 15773, 15776, 15777, 15779, 15805, 15812, 15813, 15827, 15835, 15853, 15854, 15855, 15876, 15878, 15881, 15883, 15885, 15887, 15892, 15897, 15899, 15901, 15902, 15905, 15907, 15909, 15908, 15911, 15922, 15930, 15931, 15937, 15939, 15941, 15942, 15940, 15945, 15946, 15948, 15949, 15960, 15961, 15970, 15974, 15985, 16004, 16007, 16010, 16023, 16043, 16081, 16082, 16100, 16107, 16109, 16120, 16125, 16131, 16133, 16135, 16142, 16146, 16158, 16167, 16170, 16169, 16180, 16179, 16186, 16198, 16199, 16202, 16207, 16223, 16234, 16235, 16236, 16242, 16246, 16266, 16269, 16271, 16322, 16324, 16335, 16340, 16351, 16373] not in index'

Zhenbin24 commented 1 year ago

This problem has been solved in another issue (https://github.com/STOmics/Stereopy/issues/140), and the fix will be included in the next release. Hope this helps!
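
For readers who land here with the same traceback: the final frames show pandas `.loc` being given a list of integer labels, some of which do not exist on the indexed axis. Every label reported as missing is larger than 15078, the number of columns in the logged test_data shape (214, 15078). That would be consistent with positions computed against a larger gene set than the one being indexed, but this is only an inference from the log, not a statement about what the fix in issue #140 changed. A minimal sketch of the pandas behaviour, using a toy frame (the names `frame`, `wanted`, and `subset` are made up for illustration):

```python
import pandas as pd

# Toy frame with 5 integer-labelled columns (a stand-in for the 15078 genes).
frame = pd.DataFrame([[0.1, 0.2, 0.3, 0.4, 0.5]], columns=range(5))

# Labels 5 and 7 do not exist on the column axis, as if they had been
# computed against a larger table.
wanted = [1, 3, 5, 7]

try:
    frame.loc[:, wanted]
    message = None
except KeyError as err:
    # pandas reports only the labels it could not find.
    message = str(err)

print(message)

# Keeping only the labels that are present avoids the exception.
present = [c for c in wanted if c in frame.columns]
subset = frame.loc[:, present]
print(list(subset.columns))
```

Filtering labels like this only illustrates the failure mode. It is not a replacement for the upstream fix.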

fumi-github commented 1 year ago

Thank you for your prompt reply. It really helps. Is the fix in the branch zhenbin? Specifically, the commit 3b263f291566f86c16283a4a8771041e71ca7a45?

fumi-github commented 1 year ago

The commit did resolve my error. Thank you very much!
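
If you cannot upgrade right away, one inexpensive check before running a reference-based annotation is to compare the gene names of the reference and the query. The helper below is a hypothetical sketch built on pandas only; `gene_overlap` is not part of Stereopy, and whether a gene mismatch is what triggers this particular error is an assumption, not something confirmed in this thread.

```python
import pandas as pd


def gene_overlap(ref_genes, test_genes):
    """Return (shared, only_in_ref, only_in_test) as pandas Index objects."""
    ref_idx = pd.Index(ref_genes)
    test_idx = pd.Index(test_genes)
    shared = ref_idx.intersection(test_idx)    # genes present in both
    only_ref = ref_idx.difference(test_idx)    # genes missing from the query
    only_test = test_idx.difference(ref_idx)   # genes missing from the reference
    return shared, only_ref, only_test


# Toy example with made-up gene names.
shared, only_ref, only_test = gene_overlap(["A", "B", "C"], ["C", "A", "D"])
print(len(shared), list(only_ref), list(only_test))
```

With the objects from the code above, the call would presumably look like `gene_overlap(ref.gene_names, data.gene_names)`, since the traceback shows `gene_names` being read from the reference object. Large `only_ref` or `only_test` sets would be worth investigating before annotation.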