scf.tl.pseudotime error

pchiang5 commented 1 year ago

Hello,

Thank you for the tool.

When I tried to convert a CellRank object following your protocol, the initial steps ran without problems:

scf.tl.cellrank_to_tree(WA, time="latent_time", Nodes=500, seed=1, ptt_steps=100)
scf.pl.graph(WA)
scf.tl.root(WA,301)

However, the pseudotime step failed with a NaN error:

In [90]: scf.tl.pseudotime(WA,n_jobs=20,n_map=100,seed=42)
projecting cells onto the principal graph
mappings: 100%|███████████████████████████████████████████████████████████████████| 100/100 [01:19<00:00, 1.26it/s]

ValueError                                Traceback (most recent call last)
Cell In[90], line 1
----> 1 scf.tl.pseudotime(WA,n_jobs=20,n_map=100,seed=42)

File /home/pc/miniconda3/envs/scvi/lib/python3.10/site-packages/scFates/tools/pseudotime.py:232, in pseudotime(adata, n_jobs, n_map, seed, copy)
    225     milestones[
    226         cell_seg.index[
    227             (cell_seg - min(cell_seg) - (max(cell_seg - min(cell_seg)) / 2) > 0)
    228         ]
    229     ] = pp_seg.loc[int(seg), "to"]
    230 adata.obs["milestones"] = milestones
    231 adata.obs.milestones = (
--> 232     adata.obs.milestones.astype(int).astype("str").astype("category")
    233 )
    235 adata.uns["graph"]["milestones"] = dict(
    236     zip(
    237         adata.obs.milestones.cat.categories,
    238         adata.obs.milestones.cat.categories.astype(int),
    239     )
    240 )
    242 # setting consistent color palettes

File ~/.local/lib/python3.10/site-packages/pandas/core/generic.py:5815, in NDFrame.astype(self, dtype, copy, errors)
   5808     results = [
   5809         self.iloc[:, i].astype(dtype, copy=copy)
   5810         for i in range(len(self.columns))
   5811     ]
   5813 else:
   5814     # else, only a single dtype is given
-> 5815     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   5816     return self._constructor(new_data).__finalize__(self, method="astype")
   5818 # GH 33113: handle empty frame or series

File ~/.local/lib/python3.10/site-packages/pandas/core/internals/managers.py:418, in BaseBlockManager.astype(self, dtype, copy, errors)
    417 def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
--> 418     return self.apply("astype", dtype=dtype, copy=copy, errors=errors)

File ~/.local/lib/python3.10/site-packages/pandas/core/internals/managers.py:327, in BaseBlockManager.apply(self, f, align_keys, ignore_failures, **kwargs)
    325         applied = b.apply(f, **kwargs)
    326     else:
--> 327         applied = getattr(b, f)(**kwargs)
    328 except (TypeError, NotImplementedError):
    329     if not ignore_failures:

File ~/.local/lib/python3.10/site-packages/pandas/core/internals/blocks.py:592, in Block.astype(self, dtype, copy, errors)
    574 """
    575 Coerce to the new dtype.
    576 
   (...)
    588 Block
    589 """
    590 values = self.values
--> 592 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    594 new_values = maybe_coerce_values(new_values)
    595 newb = self.make_block(new_values)

File ~/.local/lib/python3.10/site-packages/pandas/core/dtypes/cast.py:1309, in astype_array_safe(values, dtype, copy, errors)
   1306 dtype = pandas_dtype(dtype)
   1308 try:
-> 1309     new_values = astype_array(values, dtype, copy=copy)
   1310 except (ValueError, TypeError):
   1311     # e.g. astype_nansafe can fail on object-dtype of strings
   1312     #  trying to convert to float
   1313     if errors == "ignore":

File ~/.local/lib/python3.10/site-packages/pandas/core/dtypes/cast.py:1257, in astype_array(values, dtype, copy)
   1254     values = values.astype(dtype, copy=copy)
   1256 else:
-> 1257     values = astype_nansafe(values, dtype, copy=copy)
   1259 # in pandas we don't store numpy str dtypes, so convert to object
   1260 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):

File ~/.local/lib/python3.10/site-packages/pandas/core/dtypes/cast.py:1174, in astype_nansafe(arr, dtype, copy, skipna)
   1170 elif is_object_dtype(arr):
   1171 
   1172     # work around NumPy brokenness, #1987
   1173     if np.issubdtype(dtype.type, np.integer):
-> 1174         return lib.astype_intsafe(arr, dtype)
   1176 # if we have a datetime/timedelta array of objects
   1177 # then coerce to a proper dtype and recall astype_nansafe
   1179 elif is_datetime64_dtype(dtype):

File ~/.local/lib/python3.10/site-packages/pandas/_libs/lib.pyx:679, in pandas._libs.lib.astype_intsafe()

ValueError: cannot convert float NaN to integer

After digging into the object, I found that the seg assignment in 'pp_info' was all 0s and the milestones were all NaN. The pancreas dataset from your website runs without any issues.
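The last frame of the traceback is pandas refusing to cast NaN to an integer: line 232 of pseudotime.py runs .astype(int).astype("str").astype("category") on adata.obs.milestones, so a single cell without a milestone is enough to fail. A minimal pandas-only sketch of that failure, with made-up values standing in for the real milestones (cast_milestones is a hypothetical helper, not part of scFates):

```python
import numpy as np
import pandas as pd


def cast_milestones(milestones: pd.Series) -> pd.Series:
    """Same int -> str -> category chain as line 232 in the traceback."""
    return milestones.astype(int).astype(str).astype("category")


# Hypothetical stand-in for adata.obs["milestones"] when every cell sat on
# segment 0, so no milestone could be assigned (all NaN).
all_nan = pd.Series([np.nan, np.nan, np.nan], index=["c1", "c2", "c3"])
# Hypothetical healthy case: milestone node ids stored as floats.
ok = pd.Series([116.0, 301.0, 116.0], index=["c1", "c2", "c3"])

try:
    cast_milestones(all_nan)
except ValueError as err:  # the exact subclass/message depends on the pandas version
    print("cast failed:", err)

print(cast_milestones(ok).cat.categories.tolist())  # ['116', '301']
```

In other words, the ValueError is only the symptom; the NaN milestones (and the all-zero seg column) are what need explaining.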

Could you help resolve the issue? Thanks again.

Out[89]:
{'B': array([[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]]),
 'F': array([[ 0.5954733288, -0.0146302073,  0.1622698497, ..., -0.185073663 , -0.0153151324, -0.7113653811],
        [ 0.1221112134, -0.7174254793,  0.4631041318, ..., -0.0173679111, -0.7060736924, -0.0547345371],
        [ 0.6855247876,  0.7496417954,  0.4089581897, ...,  0.3097938614,  0.738704421 ,  0.6263444361]]),
 'tips': array([197, 261, 271, 324]),
 'forks': array([116, 345]),
 'metrics': 'euclidean',
 'use_rep': 'X_fates',
 'ndims_rep': None,
 'method': 'ppt',
 'pp_info':       PP      time  seg
 0      0  0.836408    0
 1      1  1.103287    0
 2      2  0.424225    0
 3      3  0.836513    0
 4      4  0.536580    0
 ..   ...       ...  ...
 495  495  0.165975    0
 496  496  0.246738    0
 497  497  0.245144    0
 498  498  1.087492    0
 499  499  0.866917    0

 [500 rows x 3 columns],
 'pp_seg':    n  from   to         d
 1  1   116  261  0.868953
 2  2   301  116  0.247469
 3  3   116  324  0.636655
 4  4   345  197  0.587749
 5  5   345  271  0.562103
 6  6   301  345  0.274438,
 'root': 301}

LouisFaure commented 1 year ago

Hi, I would explore these three questions to figure out what's going on:

1. Does the tree displayed by pl.graph make sense?
2. Does pp_info already show 0 as the assigned seg after running tl.root and before running tl.pseudotime?
3. Do you still see this error if you run tl.pseudotime with only one mapping (parameter n_map=1)?
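Question 2 can be checked with plain pandas. A sketch assuming only the pp_info (PP, time, seg) and pp_seg (n, from, to, d) layouts visible in the dump above; check_segment_assignment and the toy tables are hypothetical, not scFates code:

```python
import pandas as pd


def check_segment_assignment(pp_info: pd.DataFrame, pp_seg: pd.DataFrame) -> list[str]:
    """Compare the seg ids used in pp_info with the segment ids listed in pp_seg."""
    expected = set(pp_seg["n"].astype(int))
    assigned = set(pp_info["seg"].astype(int))
    problems = []
    missing = sorted(expected - assigned)
    if missing:
        problems.append(f"segments without any principal point: {missing}")
    unknown = sorted(assigned - expected)
    if unknown:
        problems.append(f"seg ids in pp_info not present in pp_seg: {unknown}")
    return problems


# Hypothetical toy tree: root 0 -> fork 1 -> tips 2 and 3.
pp_seg = pd.DataFrame(
    {"n": [1, 2, 3], "from": [0, 1, 1], "to": [1, 2, 3], "d": [0.3, 0.5, 0.4]},
    index=[1, 2, 3],
)
pp_info_ok = pd.DataFrame({"PP": range(6), "time": [0.0] * 6, "seg": [1, 1, 2, 2, 3, 3]})
pp_info_broken = pp_info_ok.assign(seg=0)  # every principal point on seg 0

print(check_segment_assignment(pp_info_ok, pp_seg))      # []
print(check_segment_assignment(pp_info_broken, pp_seg))
```

On the real object the call would be check_segment_assignment(WA.uns['graph']['pp_info'], WA.uns['graph']['pp_seg']); with the dump above it should report segments 1 to 6 as empty and seg 0 as unknown.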

pchiang5 commented 1 year ago

Hello

1. Yes, the plot makes sense.
2. Yes, the 0s were already there before running tl.pseudotime.
3. Yes, the same error showed up with n_map=1.

I found that the problem was caused by cr.tl.lineages(WA, backward=False) from the CellRank package:

In [33]: cr.tl.lineages(WA, backward= False)
:1: DeprecationWarning: `cellrank.tl.lineages` will be removed in version `2.0`. Please use the `cellrank.kernels` or `cellrank.estimators` interface instead.
  cr.tl.lineages(WA, backward= False)
100%|███████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 7.70/s]
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=59
:
system msg for write_line failure : Bad file descriptor

When I switched from IPython to plain Python, the error was gone and the segs were assigned correctly. Presumably some information failed to be added to WA because of this crash.
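Because the PETSc crash happened inside cr.tl.lineages while the session kept running, a cheap sanity check on the lineage output before calling scf.tl.cellrank_to_tree might catch this earlier. A rough sketch, assuming the fate probabilities are available as a cells x lineages array whose rows should each sum to 1 (as absorption probabilities over all terminal states do); check_fate_probabilities is hypothetical, and where CellRank stores the array depends on its version:

```python
import numpy as np


def check_fate_probabilities(fates, atol: float = 1e-3) -> dict:
    """Count suspicious rows in a cells x lineages fate-probability matrix."""
    fates = np.asarray(fates, dtype=float)
    finite = np.isfinite(fates).all(axis=1)
    row_sums = fates.sum(axis=1)
    return {
        "n_cells": int(fates.shape[0]),
        # rows containing NaN or inf
        "n_nonfinite_rows": int((~finite).sum()),
        # finite rows whose probabilities do not add up to 1
        "n_bad_row_sums": int((finite & ~np.isclose(row_sums, 1.0, atol=atol)).sum()),
        # rows that are entirely 0 (nothing assigned to any lineage)
        "n_all_zero_rows": int((fates == 0).all(axis=1).sum()),
    }


good = np.array([[0.7, 0.3], [0.1, 0.9]])
bad = np.array([[np.nan, np.nan], [0.0, 0.0]])
print(check_fate_probabilities(good))
print(check_fate_probabilities(bad))
```

On the real object it would be called on whichever WA.obsm entry your CellRank version wrote; any non-zero count is a hint to rerun the lineage step (for example outside IPython) before building the tree.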

Millions of thanks for your prompt response!

WA.uns['graph']['pp_info']
      PP      time  seg
0      0  1.598479    3
1      1  1.592943    2
2      2  1.187386    3
3      3  1.598616    3
4      4  0.375064    1
..   ...       ...  ...
295  295  0.007861    1
296  296  1.598503    3
297  297  1.332531    2
298  298  1.592014    3
299  299  0.662050    3

[300 rows x 3 columns]

LouisFaure / scFates, issue #14