dpeerlab / Palantir

Single cell trajectory detection
https://palantir.readthedocs.io
GNU General Public License v2.0

d = sce.tl.palantir(adata) returns "None" #34

Closed zmokhtari closed 4 years ago

zmokhtari commented 4 years ago

Hi, it's me again :) Thanks for providing an amazing tool for single-cell data analysis. I was trying to use Palantir in Scanpy and reran the example data. However, d = sce.tl.palantir(adata) always returns None, for any kind of data. Could you please comment on this? I am using scanpy 1.5 and have updated Palantir to 0.2.6. Thanks in advance, Zeinab

awnimo commented 4 years ago

We are in the process of updating the documentation on Scanpy's wrapper for Palantir. In the meantime, call sce.tl.palantir(adata) without assigning its return value; the outputs are stored directly in adata. For visualization, I recommend using Palantir's own plotting methods. Stay tuned for the updated Scanpy wrapper, to be PR'd soon.
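To illustrate why d ends up as None: a function that writes its results into the object it receives typically has nothing to return. The sketch below is a toy stand-in, not the Scanpy or Palantir API; ToyData, toy_inplace_tool and the key "X_result" are hypothetical names chosen for illustration.

```python
# Toy illustration of the "modify in place, return None" convention.
# ToyData and toy_inplace_tool are hypothetical; they are NOT the real API.

class ToyData:
    """Minimal stand-in for an annotated data container."""

    def __init__(self):
        self.obsm = {}  # per-cell results, keyed by name


def toy_inplace_tool(data):
    """Store a result on `data` and return nothing."""
    data.obsm["X_result"] = [[0.0, 1.0], [1.0, 0.0]]
    # No return statement: Python returns None implicitly.


adata = ToyData()
d = toy_inplace_tool(adata)

print(d)                   # None
print(sorted(adata.obsm))  # ['X_result']
```

With this pattern the results are read from the container after the call rather than from the return value.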

zmokhtari commented 4 years ago

Thank you for your fast feedback. Now I face another issue: I cannot set the number of components to a different value in sce.tl.palantir(adata). ValueError: n_components=300 must be between 1 and min(n_samples, n_features)=27 with svd_solver='randomized'
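The error message states the constraint directly: the number of components cannot exceed the smaller dimension of the data matrix. A minimal sketch of that bound follows, assuming a hypothetical helper named clamp_n_components and a made-up matrix shape of 27 by 2000; the thread only reports that the minimum of the two dimensions is 27.

```python
def clamp_n_components(requested, n_samples, n_features):
    """Return the largest valid component count not exceeding `requested`.

    Hypothetical helper: valid values lie in [1, min(n_samples, n_features)].
    """
    upper = min(n_samples, n_features)
    if upper < 1:
        raise ValueError("data matrix must have at least one row and one column")
    return max(1, min(requested, upper))


# Assumed shape for illustration only; the smaller dimension is 27.
print(clamp_n_components(300, n_samples=27, n_features=2000))  # 27
print(clamp_n_components(5, n_samples=27, n_features=2000))    # 5
```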

zmokhtari commented 4 years ago

And the main reason I moved to Scanpy is performance.

awnimo commented 4 years ago

I'm sorry for the inconvenience you're having. The new updated wrapper of Palantir in Scanpy will expose many parameters including n_components. Once I have a working wrapper I will share it with you promptly. This is in the making and should be ready very soon.

zmokhtari commented 4 years ago

I appreciate it.

awnimo commented 4 years ago

@zmokhtari A new Scanpy wrapper was just PR'd. To get you moving, you can clone the branch update_palantir_external, check it out, and run pip install . to get access to the new feature. Please let us know should you have any issues. The docstring includes an example of how to use the integrated method, which is entirely different from the old one.

zmokhtari commented 4 years ago

Thanks for the improvement. Is it possible to turn MAGIC imputation on and off? I faced a memory error:

MemoryError                               Traceback (most recent call last)
 in
----> 1 sce.tl.palantir(adata, n_components=5, knn=26)

c:\users\user\documents\scanpy\scanpy\external\tl\_palantir.py in palantir(adata, n_components, knn, alpha, use_adjacency_matrix, distances_key, n_eigs, n_steps, copy)
    211
    212     # MAGIC imputation
--> 213     imp_df = run_magic_imputation(data=adata.to_df(), dm_res=dm_res, n_steps=n_steps)
    214
    215     (

~\Anaconda3\envs\Scanpy_dev\lib\site-packages\palantir\utils.py in run_magic_imputation(data, dm_res, n_steps)
    110     T_steps = dm_res["T"] ** n_steps
    111     imputed_data = pd.DataFrame(
--> 112         np.dot(T_steps.todense(), data), index=data.index, columns=data.columns
    113     )
    114

~\AppData\Roaming\Python\Python37\site-packages\scipy\sparse\base.py in todense(self, order, out)
    849             `numpy.matrix` object that shares the same memory.
    850         """
--> 851         return asmatrix(self.toarray(order=order, out=out))
    852
    853     def toarray(self, order=None, out=None):

~\AppData\Roaming\Python\Python37\site-packages\scipy\sparse\compressed.py in toarray(self, order, out)
   1023         if out is None and order is None:
   1024             order = self._swap('cf')[0]
-> 1025         out = self._process_toarray_args(order, out)
   1026         if not (out.flags.c_contiguous or out.flags.f_contiguous):
   1027             raise ValueError('Output array must be C or F contiguous')

~\AppData\Roaming\Python\Python37\site-packages\scipy\sparse\base.py in _process_toarray_args(self, order, out)
   1187             return out
   1188         else:
-> 1189             return np.zeros(self.shape, dtype=self.dtype, order=order)
   1190
   1191

MemoryError: Unable to allocate 118. GiB for an array with shape (126007, 126007) and data type float64

awnimo commented 4 years ago

MAGIC imputation is now optional and can be turned off by setting impute_data=False.
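For context on the size of the failed allocation: the traceback shows a sparse cell-by-cell matrix being converted to a dense float64 array of shape (126007, 126007). The arithmetic can be checked with a short sketch; dense_gib is a hypothetical helper name, and 8 bytes per element is the size of a float64.

```python
def dense_gib(n_rows, n_cols, bytes_per_item=8):
    """Memory needed for a dense array, in GiB (1 GiB = 2**30 bytes)."""
    return n_rows * n_cols * bytes_per_item / 2**30


n_cells = 126007  # from the shape reported in the MemoryError
print(f"{dense_gib(n_cells, n_cells):.1f} GiB")  # 118.3 GiB
```

The requirement grows with the square of the number of cells, so halving the cell count cuts it to roughly a quarter.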

zmokhtari commented 4 years ago

Thank you very much. May I ask what should replace tsne.loc in the following command: mapping['DC'] = tsne.loc[pr_res.branch_probs.columns, 'x'].idxmax()

awnimo commented 4 years ago

First you have to build the t-SNE on the diffusion maps, which is critical for visualizing Palantir results on t-SNE:

sc.tl.tsne(adata, n_pcs=2, use_rep='palantir_multiscale', perplexity=150)

You must use the parameter use_rep='palantir_multiscale'.

Then, you can assign your tsne results:

import pandas as pd
tsne = pd.DataFrame(adata.obsm['X_tsne'], columns=['x', 'y'], index=adata.obs_names)
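As a sketch of how such a DataFrame then works with the tsne.loc[...].idxmax() lookup from the question: the example below uses made-up coordinates and cell names in place of adata.obsm['X_tsne'] and adata.obs_names, and a hypothetical list terminal_cells in place of pr_res.branch_probs.columns.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for adata.obs_names and adata.obsm['X_tsne'].
obs_names = ["cell_a", "cell_b", "cell_c", "cell_d"]
X_tsne = np.array([[0.5, 1.0], [3.2, -0.4], [-1.1, 2.0], [2.0, 0.3]])

tsne = pd.DataFrame(X_tsne, columns=["x", "y"], index=obs_names)

# Hypothetical stand-in for pr_res.branch_probs.columns (terminal-state cells).
terminal_cells = ["cell_a", "cell_c", "cell_d"]

# Among the listed cells, pick the one with the largest t-SNE x coordinate.
mapping = {}
mapping["DC"] = tsne.loc[terminal_cells, "x"].idxmax()
print(mapping["DC"])  # cell_d
```

Note that cell_b has the largest x overall but is not in terminal_cells, so it is not considered.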
zmokhtari commented 4 years ago

Many thanks for your help!