dpeerlab / Palantir

Single cell trajectory detection
https://palantir.readthedocs.io
GNU General Public License v2.0

cell cycle correction #28

Closed ShobiStassen closed 4 years ago

ShobiStassen commented 4 years ago

Hi,

Firstly, Palantir is a really great tool. Thanks for making it! I was wondering whether the input to Palantir should be cell-cycle-corrected data. Also, in the comparison of Palantir to other methods in your Jupyter notebooks, does the input AnnData file of human CD34+ bone marrow cells (where X is the filtered, normalized matrix) include cell cycle correction? If Palantir does use cell-cycle-corrected data, how much does this impact the results?

Also, on a separate note, for some reason I cannot get Palantir to run when k (in the NN graph) is around 10-15. Palantir completes the run even with knn as low as 5, but something goes wrong in the knn=10-15 range. Is this something you have experienced? I can share some error messages after I test it out a bit more myself. Thanks a ton, Shobi

ManuSetty commented 4 years ago

Hello - Glad you find the tool useful!

Whether cell cycle is an issue depends on the dataset you are using. If you believe cell cycle significantly impacts your data, we do recommend correcting for cell cycle effects before using Palantir.

The AnnData of human CD34+ bone marrow cells is prior to cell cycle correction. In the manuscript, we used fscLVM to correct for these effects and found that correcting for cell cycle helped us align the start and precursors better.
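To make the idea of "correcting for cell cycle" concrete, here is a minimal sketch of one simple approach: regress each gene's expression on a per-cell cell-cycle score and keep the residuals. This is a plain linear regression used as a stand-in for illustration; it is not fscLVM and not what the manuscript did. The function name `regress_out` and the toy values are hypothetical.

```python
from statistics import mean


def regress_out(values, covariate):
    """Return residuals of `values` after a least-squares fit on `covariate`.

    Hypothetical helper for illustration: `values` is one gene's expression
    across cells, `covariate` is a per-cell cell-cycle score.
    """
    x_bar = mean(covariate)
    y_bar = mean(values)
    sxx = sum((x - x_bar) ** 2 for x in covariate)
    if sxx == 0:
        # Constant covariate: nothing to regress on, only centre the values.
        return [y - y_bar for y in values]
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(covariate, values)]


# Toy data (assumed, not from the thread): four cells, one score per cell.
cycle_score = [0.0, 1.0, 2.0, 3.0]
gene_expr = [1.0, 3.0, 5.0, 7.0]  # entirely explained by the score
residuals = regress_out(gene_expr, cycle_score)
```

Note that the residuals are also mean-centred, so a gene driven only by the cell-cycle score ends up flat across cells. The corrected matrix, not the raw one, would then be passed on to PCA and diffusion maps.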

Regarding kNN, this is a bit unusual. Could you please share the error if you are still facing the issue?

ShobiStassen commented 4 years ago

Hello,

Thanks for the explanation. Regarding kNN: about 75% of the time, when I try to use k=10, I get an error like the one below. I haven't looked into it yet, but it seems more like a bookkeeping problem than anything very serious.

[5780 rows x 14651 columns]
Determing nearest neighbor graph...
ms data (5780, 148)
Sampling and flocking waypoints...
Time for determining waypoints: 0.003299887975056966 minutes
Determining pseudotime...
Shortest path distances using 10-nearest neighbor graph...
Pseudtime Function multi-scale input, KNN has 1 components
after connecting disconnected. adj 1 components
Time for shortest paths: 0.10192433198293051 minutes
Iteratively refining the pseudotime...
Correlation at iteration 1: 0.9795
Correlation at iteration 2: 0.9997
Correlation at iteration 3: 1.0000
Entropy and branch probabilities...
Markov chain construction...
Markov Construction using multi-scale input, KNN has 1 components
Identification of terminal states...
identified ['c1459', 'c1003', 'c997', 'c1947', 'c4197', 'c951', 'c2157', 'c951', 'c4168', 'c4117', 'c3613', 'c2041', 'c2463'] terminal states
Computing fundamental matrix and absorption probabilities...
/home/shobi/anaconda3/envs/ViaEnv/lib/python3.7/site-packages/pandas/core/frame.py:7123: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  sort=sort,
Traceback (most recent call last):
  File "/home/shobi/PycharmProjects/Via/Viav018.py", line 2898, in <module>
    main()
  File "/home/shobi/PycharmProjects/Via/Viav018.py", line 2550, in main
    pr_res = palantir.core.run_palantir(ms_data, early_cell=start_cell, num_waypoints=1200, knn=knn)
  File "/home/shobi/anaconda3/envs/ViaEnv/lib/python3.7/site-packages/palantir-0.2.2-py3.7.egg/palantir/core.py", line 86, in run_palantir
    knn, n_jobs, pseudotime)
  File "/home/shobi/anaconda3/envs/ViaEnv/lib/python3.7/site-packages/palantir-0.2.2-py3.7.egg/palantir/core.py", line 406, in _differentiation_entropy
    branch_probs = branch_probs.append(bp.loc[:, branch_probs.columns])
  File "/home/shobi/anaconda3/envs/ViaEnv/lib/python3.7/site-packages/pandas/core/frame.py", line 7123, in append
    sort=sort,
  File "/home/shobi/anaconda3/envs/ViaEnv/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 258, in concat
    return op.get_result()
  File "/home/shobi/anaconda3/envs/ViaEnv/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 473, in get_result
    mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy
  File "/home/shobi/anaconda3/envs/ViaEnv/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 2038, in concatenate_block_managers
    for placement, join_units in concat_plan:
  File "/home/shobi/anaconda3/envs/ViaEnv/lib/python3.7/site-packages/pandas/core/internals/concat.py", line 474, in combine_concat_plans
    raise ValueError("Plan shapes are not aligned")
ValueError: Plan shapes are not aligned

Process finished with exit code 1

ShobiStassen commented 4 years ago

Oh, and I should mention that I also set n_components in run_pca and run_diffusion_maps. I let ms_data be computed from the eigengap, as Palantir does automatically:

pca_projections, _ = palantir.utils.run_pca(norm_df_pal, n_components=ncomps)
sc.tl.pca(ad, svd_solver='arpack')
dm_res = palantir.utils.run_diffusion_maps(pca_projections, n_components=ncomps, knn=knn)

ms_data = palantir.utils.determine_multiscale_space(dm_res)  # n_eigs is determined using eigengap

pr_res = palantir.core.run_palantir(ms_data, early_cell=start_cell, num_waypoints=1200, knn=knn)
palantir.plot.plot_palantir_results(pr_res, tsne)

ManuSetty commented 4 years ago

Sorry for the late response on this. I unfortunately cannot reproduce this error on my side. Do you by any chance have duplicate cell names? If not, do you mind anonymizing the data and sending the count matrix to us to debug?
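A quick way to answer the duplicate-names question is to count how often each cell name occurs. The sketch below uses only the standard library; `find_duplicates` and the sample names are hypothetical and not part of Palantir. With a pandas DataFrame, `df.index[df.index.duplicated()]` gives a similar answer.

```python
from collections import Counter


def find_duplicates(names):
    """Return a dict mapping each repeated name to its occurrence count."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical cell names for illustration only.
cell_names = ["c0", "c1", "c2", "c1"]
dupes = find_duplicates(cell_names)
```

An empty result means every cell name is unique, which rules out duplicate index labels in the input as the source of the problem.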

ShobiStassen commented 4 years ago

Hi again, I run into this issue when following the notebook on the example Human CD34 Replicate 1 data.

This is the code I run (sometimes it works; other times it fails and produces the error shown at the end of this message):

import palantir
import scanpy as sc
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

ncomps = 50
knn = 10

ad = sc.read('/home/shobi/Trajectory/Datasets/HumanCD34/human_cd34_bm_rep1.h5ad')

ad.uns['iroot'] = np.flatnonzero(ad.obs_names == ad.obs['palantir_pseudotime'].idxmin())[0]
print('iroot', np.flatnonzero(ad.obs_names == ad.obs['palantir_pseudotime'].idxmin())[0])

norm_df_pal = pd.DataFrame(ad.X)

new = ['c' + str(i) for i in norm_df_pal.index]
norm_df_pal.index = new

tsne = pd.DataFrame(ad.obsm['tsne'], index=ad.obs_names, columns=['x', 'y'])
tsne.index = new

pca_projections, _ = palantir.utils.run_pca(norm_df_pal, n_components=ncomps)

sc.tl.pca(ad, svd_solver='arpack', n_comps=ncomps)
dm_res = palantir.utils.run_diffusion_maps(pca_projections, n_components=ncomps, knn=knn)

ms_data = palantir.utils.determine_multiscale_space(dm_res)  # n_eigs is determined using eigengap

pr_res = palantir.core.run_palantir(ms_data, early_cell='c4823', num_waypoints=1200, knn=knn)
palantir.plot.plot_palantir_results(pr_res, tsne)
plt.show()

Error:

/home/shobi/anaconda3/envs/ViaEnv/bin/python /home/shobi/PycharmProjects/Via/testing_apr27.py
iroot 4823
Determing nearest neighbor graph...
Sampling and flocking waypoints...
Time for determining waypoints: 0.007724754015604655 minutes
Determining pseudotime...
Shortest path distances using 10-nearest neighbor graph...
Pseudtime Function multi-scale input, KNN has 1 components
after connecting disconnected. adj 1 components
Time for shortest paths: 0.0793550173441569 minutes
Iteratively refining the pseudotime...
Correlation at iteration 1: 0.9994
Correlation at iteration 2: 1.0000
Entropy and branch probabilities...
Markov chain construction...
Markov Construction using multi-scale input, KNN has 1 components
Identification of terminal states...
identified ['c4937', 'c85', 'c1101', 'c4583', 'c3688', 'c4832', 'c2455', 'c2156', 'c261', 'c2', 'c85', 'c2156', 'c4583', 'c261'] terminal states
Computing fundamental matrix and absorption probabilities...
/home/shobi/anaconda3/envs/ViaEnv/lib/python3.7/site-packages/pandas/core/frame.py:7123: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  sort=sort,
Traceback (most recent call last):
  File "/home/shobi/PycharmProjects/Via/testing_apr27.py", line 30, in <module>
    pr_res = palantir.core.run_palantir(ms_data, early_cell='c4823', num_waypoints=1200, knn=knn)
  File "/home/shobi/anaconda3/envs/ViaEnv/lib/python3.7/site-packages/palantir-0.2.2-py3.7.egg/palantir/core.py", line 86, in run_palantir
    knn, n_jobs, pseudotime)
  File "/home/shobi/anaconda3/envs/ViaEnv/lib/python3.7/site-packages/palantir-0.2.2-py3.7.egg/palantir/core.py", line 406, in _differentiation_entropy
    branch_probs = branch_probs.append(bp.loc[:, branch_probs.columns])
  File "/home/shobi/anaconda3/envs/ViaEnv/lib/python3.7/site-packages/pandas/core/frame.py", line 7123, in append
    sort=sort,
  File "/home/shobi/anaconda3/envs/ViaEnv/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 258, in concat
    return op.get_result()
  File "/home/shobi/anaconda3/envs/ViaEnv/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 473, in get_result
    mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy
  File "/home/shobi/anaconda3/envs/ViaEnv/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 2038, in concatenate_block_managers
    for placement, join_units in concat_plan:
  File "/home/shobi/anaconda3/envs/ViaEnv/lib/python3.7/site-packages/pandas/core/internals/concat.py", line 474, in combine_concat_plans
    raise ValueError("Plan shapes are not aligned")
ValueError: Plan shapes are not aligned

Process finished with exit code 1
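
One detail in the log above stands out: the "identified [...] terminal states" line lists some cells more than once. Whether those repeats are what triggers "Plan shapes are not aligned" is an assumption, not something confirmed in this thread, but repeated labels in `branch_probs.columns` would be one plausible way for the `append` call to fail. A minimal sketch that finds the repeats and builds an order-preserving unique list from the values printed in the log:

```python
# Terminal states copied from the log output above.
terminal_states = [
    'c4937', 'c85', 'c1101', 'c4583', 'c3688', 'c4832', 'c2455',
    'c2156', 'c261', 'c2', 'c85', 'c2156', 'c4583', 'c261',
]

# dict.fromkeys keeps the first occurrence of each label, in order.
unique_states = list(dict.fromkeys(terminal_states))

# Collect labels that show up a second time, in order of reappearance.
seen = set()
repeated = []
for state in terminal_states:
    if state in seen and state not in repeated:
        repeated.append(state)
    seen.add(state)

print(len(terminal_states), "listed,", len(unique_states), "unique")
print("repeated:", repeated)
```

Here 14 labels are listed but only 10 are distinct. The earlier failing run shows the same pattern, with 'c951' appearing twice.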

ManuSetty commented 4 years ago

Thanks for the update. We will get back shortly!

awnimo commented 4 years ago

@ShobiStassen Please update your Palantir to the latest release here, and let us know if you have any further issues.

Thank you for reporting this.